AI in agriculture offers numerous benefits and plays a crucial role in addressing the challenges of feeding a growing global population while minimizing environmental impact. Joining me today is Praveen Pankajakshan, Vice President of Data Science and AI at Cropin, to talk about intelligent agriculture and how Cropin is paving the way forward for sustainable agricultural practices. Cropin is a technology company that offers services and solutions for the agriculture industry, including AI/ML models, data processing, and applications to digitize farm operations and enable data-driven decision-making.
In our conversation, Praveen discusses various aspects of how machine learning and AI are being applied to agriculture to improve farming practices, sustainability, and climate resilience. Discover how Cropin employs AI to identify crops, monitor crop health, and provide timely advice to farmers on planting and harvest timings. He highlights the importance of combining satellite data with ground-level insights and the rigorous data annotation process, emphasizing the significance of field visits. We also delve into crop-cutting experiments for machine learning, overcoming out-of-distribution (OOD) problems, how climate change makes training models difficult, and much more! Tune in and discover how Cropin is revolutionizing farming and sustainable agriculture with Praveen Pankajakshan!
- Praveen's background and how he got into agriculture and machine learning.
- Cropin's mission and its digitization and monitoring services for farmers.
- Discover the role of machine learning in enhancing agricultural tasks.
- Learn about the types of data Cropin leverages for crop digitization.
- Why ground data and field visits are essential for the validation process.
- Insights into the challenges of working with agriculture data.
- Developing and deploying machine learning products for agriculture.
- Maintaining machine learning advancements around seasons.
- Agritech innovations that Praveen finds the most interesting.
- Words of advice for leaders of AI-powered startups: stay grounded.
- The future impact of Cropin on sustainable agricultural practices.
“There are many areas where machine learning has actually worked wonders. And I would say that because we have been digitizing farmlands now for over a decade.” — Praveen Pankajakshan
“One of the major challenges of working with satellite data is it definitely needs ground data [for validation].” — Praveen Pankajakshan
“Agriculture is very complex, and it's also very nice to work with because it's also profoundly impactful.” — Praveen Pankajakshan
“[In terms of development], we have to ensure that first we have some baseline models ready for deployment, for inferencing. And development happens almost simultaneously.” — Praveen Pankajakshan
“[I] insist more on data quality rather than the quantity of the data.” — Praveen Pankajakshan
Praveen Pankajakshan on LinkedIn
Praveen Pankajakshan Email
LinkedIn – Connect with Heather.
Computer Vision Insights Newsletter – A biweekly newsletter to help bring the latest machine learning and computer vision research to applications in people and planetary health.
Computer Vision Strategy Session – Not sure how to advance your computer vision project? Get unstuck with a clear set of next steps. Schedule a 1 hour strategy session now to advance your project.
Computer Vision Advisory Services – Monthly advisory services to help you strategically plan your CV/ML capabilities, reduce the trial-and-error of model development, and get to market faster.
[0:00:03] HC: Welcome to Impact AI, brought to you by Pixel Scientia Labs. I’m your host, Heather Couture. On this podcast, I interview innovators and entrepreneurs about building a mission-driven, machine-learning-powered company. If you like what you hear, please subscribe to my newsletter to be notified about new episodes. Plus, follow the latest research in computer vision for people and planetary health. You can sign up at pixelscientia.com/newsletter.
[0:00:35] HC: Today I’m joined by guest Praveen Pankajakshan, Vice President of Data Science and AI at Cropin, to talk about intelligent agriculture. Praveen, welcome to the show.
[0:00:45] PP: Thank you, Heather. Thanks for hosting us.
[0:00:48] HC: Praveen, could you share a bit about your background and how that led you to Cropin?
[0:00:52] PP: Sure. I have a background in electrical engineering. Both my undergraduate degree and my master’s were in electrical engineering. Later on, I decided to do my Ph.D. in computer science and applied mathematics at INRIA in France.
Over a period of many years, I’ve been dabbling in multiple applications of computer vision and machine learning in utilities, the energy sector, healthcare, and biosciences. But a few years back, I was working with some family members. They had a farm, and they had been working in organic farming for close to 30 years.
Slowly, I got interested in agriculture. And moving to India, I actually bought my own farm. It started as a hobby, organic farming, with myself being a farmer. Then I started to think about how I could also use some of the recent advancements in technology in agriculture. I was especially interested in sustainable agriculture. A few years back, I got the opportunity to lead a machine learning team based out of Hyderabad in an organization called Corteva Agriscience, which is the DowDuPont agriculture division.
At that time I was working mostly on drones and their applications, and also a little bit around genetics. But then I thought that I wanted much more focus on areas which are directly linked to farmers. And that’s when I got the opportunity to work with Cropin. I started as the VP of data science and AI. I started my lab here working on the application of satellite imaging. And that led me to where I am today.
[0:03:02] HC: What does Cropin do? And why is this important for agriculture and fighting climate change?
[0:03:08] PP: Cropin has been in existence now for more than a decade. And it started originally as a farm digitization application. This was even before agritech was actually a popular word. And we started off with small farmers. Working with them. But very quickly, we realized that agriculture not only in this part of the world but in many other regions, is very unstructured and it’s fragmented also. And the whole idea of digitization became a big piece, and a central piece as well, and rightly so.
We were one of the first to actually enable farmers to digitize farmlands. Farmers could actually walk around their fields and geotag and geofence their farms. Sometimes they don’t have that directly available, and we enable them to do it. Once this geotag comes into our system, it’s possible for farmers to monitor these farms, right? Not only farmers, but those who support them: villages, corporates or enterprises who are into contract farming, or development agencies who are working with many, many farmers and ensuring that they are climate resilient or working towards sustainability, right? In all of these cases, farm digitization becomes important. That’s the first thing that Cropin enables.
The other thing is once it’s digitized, the geo boundaries come into our system and we are able to download the satellite images corresponding to that location, overlaid on top of these geo boundaries, and then monitor those farms. You can monitor the crop health, the right time of sowing, and the harvesting, as well as both biotic and abiotic stresses. For anything weather-related, you can overlay weather intelligence on top of it and also provide information on what kind of pests and diseases can happen in that location, right?
Now, as you can imagine, Heather, what is happening in the current scenario is that many of these farmers are directly affected by climate change, right? And this is something which is really happening even as we speak. For example, in India, this season, the rainfall was delayed. And even though the rain picked up, we are now again seeing a loss in rainfall.
This is a realistic scenario that farmers are seeing in front of them. And so, it becomes very crucial for them to have rightly timed advice. Not only that, but sometimes farmers are overloaded with information. What is the right information that we have to provide at the right time? It’s very critical, you know?
Should they wait for the next rain? Should they sow at this period of time? These are very important, critical decisions for farmers, and they really affect their yield and outcome, right?
Even in mid-season, right? Last season, for example, which was the winter season for us, we saw that suddenly the temperatures were increasing in certain regions, much higher than the long-term normals, as we call them, the 20-year or 30-year normals. There were sudden peaks in temperature: high day temperatures, and a very small difference between the day and night temperatures.
Under those circumstances, the soil heats up. We saw that potato farmers were having a very severe problem: the potatoes that were coming out had heat necrosis, you know? It could have been avoided, for example, if timely spraying of water had been done to cool down the soil. These are very important pieces of advice which we can provide, right? Very simple, but also very effective.
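The heat-stress condition described in this answer (day temperatures well above the long-term normal, combined with a small day-night difference) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Cropin's advisory logic: the function name, the 3-degree anomaly threshold, and the 8-degree diurnal range limit are all hypothetical values chosen for the example.

```python
def heat_stress_flags(t_max, t_min, normal_t_max,
                      anomaly_threshold=3.0, max_diurnal_range=8.0):
    """Return one boolean per day: True when the day is much hotter than the
    long-term normal AND the day-night temperature difference is small.
    Both thresholds are illustrative assumptions, in degrees Celsius."""
    flags = []
    for day_max, day_min, normal in zip(t_max, t_min, normal_t_max):
        anomaly = day_max - normal          # departure from the long-term normal
        diurnal_range = day_max - day_min   # day-night temperature difference
        flags.append(anomaly >= anomaly_threshold
                     and diurnal_range <= max_diurnal_range)
    return flags


# Hypothetical three-day record, in degrees Celsius.
daily_max = [24.0, 29.5, 30.0]
daily_min = [12.0, 23.0, 24.5]
normal_max = [24.5, 25.0, 25.0]
print(heat_stress_flags(daily_max, daily_min, normal_max))  # [False, True, True]
```

A flagged day would be the point at which a simple advisory, such as spraying water to cool the soil, could be sent.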
[0:07:19] HC: What role does machine learning play in this technology?
[0:07:22] PP: There are many areas where machine learning has actually worked wonders. And I would say that because we have been digitizing farmlands now for over a decade. We have crop information. We have geo boundaries of many of these farms. And we can actually anonymize it. And the diversity is there, from across different countries: many different varieties of crops and varieties of seed genetics also, right?
Now, all of this information is there. And so, we can use it to train our models, to calibrate them, to fine-tune them. As for the major areas, firstly, when we look at satellite images, right? If you look at large swaths of land, one of the most challenging problems that we found is that it’s very difficult to quantify where exactly agricultural lands are. Sometimes, during a particular season, the farm might be kept barren, right? It’s very important to detect regions where the sowing has happened and where the crops are growing, right?
Now it’s very challenging. It’s not as easy as it seems. Because sometimes you might have shrubs growing. You might have grass growing. Some trees are horticultural, and you can classify those as agriculture as well.
We identify that. And that is actually done using machine learning as well. We identify those regions. And then on top of it, we identify regions where crops are growing, and specific type of crops, right? There we can actually take a decision. And crop identification is also a challenging problem. It comes from computer vision as well. Because you now are looking at time series data. You’re looking at both spectral and spatial data, right?
These are very challenging and open problems, which we are solving for over large regions. And we have models which are in production. I think it’s always end-to-end, right? Right now, it’s not just one particular area that we are looking at.
If we are looking at satellite images, we have clouds. Sometimes during the monsoon season, during the wet season, you can see that between the satellite and the Earth, there are clouds. The spectral signature of your crops is not visible, right? How do you account for that? How do you take care of that?
At plot level, we are able to handle it, so that we can actually see the vegetation indices which are coming from the satellite to see how the crop is doing, right? We combine multiple different satellites. We combine two different satellite modalities, both radar and optical data, to bring that clarity, right? There are many other areas. I think the examples I’ve cited are just scratching the surface, if I may say so.
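To make the idea of a plot-level vegetation index with cloud gaps concrete, here is a minimal Python sketch. It computes NDVI, a standard index defined as (NIR - Red) / (NIR + Red), for each satellite pass and fills cloudy passes by linear interpolation between the neighboring clear passes. The interpolation is a deliberately simple stand-in chosen for this example; it is not the radar-optical fusion described in the conversation. The function names, the input layout, and the assumption of evenly spaced passes are all hypothetical.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance.
    Returns None when both bands are zero."""
    total = nir + red
    return (nir - red) / total if total else None


def plot_ndvi_series(observations):
    """observations: list of (nir, red, is_cloudy), one tuple per satellite pass.
    Cloudy passes become None so they can be filled afterwards."""
    return [None if cloudy else ndvi(nir, red)
            for nir, red, cloudy in observations]


def fill_gaps(series):
    """Linearly interpolate None values that lie between two valid values,
    assuming evenly spaced passes. Leading and trailing gaps stay None."""
    filled = list(series)
    valid = [i for i, value in enumerate(series) if value is not None]
    for left, right in zip(valid, valid[1:]):
        for i in range(left + 1, right):
            weight = (i - left) / (right - left)
            filled[i] = series[left] * (1 - weight) + series[right] * weight
    return filled


# Hypothetical plot-averaged reflectances; the middle pass is cloudy.
passes = [(0.5, 0.1, False), (0.9, 0.8, True), (0.6, 0.2, False)]
series = fill_gaps(plot_ndvi_series(passes))
print([round(value, 3) for value in series])
```

Interpolation only hides short gaps. During a long monsoon stretch there may be no clear optical pass to interpolate from, which is why a cloud-penetrating radar signal becomes useful.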
[0:10:24] HC: You mentioned satellite imagery. Do you work with other types of data? Maybe data that’s collected on the farm that’s combined with it? Or is satellite imagery really the core of what you do?
[0:10:34] PP: Yeah. We do have ground information as well. When we geofence and geotag these farms, we have some custom farms which are generated. And from those farms, we have further information, like: When did the sowing happen? When is the harvesting date, if it is past historical data that we are looking at? What type of crop was sown?
If they had any diseases? They take photographs of these crops. And if there are any diseases, that’s uploaded, right? Then we have information regarding any management practices. Inputs which they had, like fertilizers, pesticide applications, if any. These are all also uploaded.
Many other farms, for example, who are interested in water conservation, right? They do provide an idea of how much water inputs have gone into. We do the irrigation scheduling for them, you know?
All these inputs come in to us. Sometimes, if there are any IoT devices like soil moisture sensors, that sensor information comes in. For many potato growers, for example, soil moisture is a big thing, as I mentioned before. Then that data is there. It’s a combination of all of this, Heather.
[0:11:55] HC: Do you need to annotate all this data that’s coming in? Not just the data from the ground, but the satellite images? Or does it depend on the model?
[0:12:01] PP: Yeah, that’s a great question. Actually, one of the major challenges of working with satellite data is it definitely needs ground data. And when we actually talk about annotation, it’s not strictly in the sense of computer vision. It’s not like somebody segments a particular area or labels it as a particular class, right?
In this particular case, there needs to be a field visit. That’s a major challenge. And that’s why the amount of data that’s available in agriculture is very limited, even globally, right?
When we talk about annotation, it’s actually field visits. And these field visits have to be done within the season, sometimes multiple times within the season. Sometimes what happens with past data is that, if you ask a farmer in a particular location, they might say, “Yeah, I remember. A few years back, I grew corn in this area or I grew wheat in this area.” But there is no proof that he or she did that, right? You actually need to visit during the season to collect all that data. That’s one of the challenges of it.
Here, when we talk about annotation, we literally mean going into the field and collecting that: being in the field and, at the particular location you’re tagging, confirming that it is indeed maize, and taking a photograph of it to confirm it.
Not only that, we also do something called crop-cutting experiments. I talked about machine learning. Within the season or towards the end of the season, we do yield estimates. How much yield is coming from the farm?
That is where crop-cutting experiments come in. Going into the field, somebody actually marks out a particular region of the field, and then they take samples of it and weigh the dry and the wet biomass. That gives you an idea of how much yield you can expect. We also correlate that with the yield estimates that we get from satellite data. You see, it’s quite intense in terms of annotation, if I may say so.
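The arithmetic behind a crop-cutting experiment can be sketched as follows. The example scales the weight harvested from a marked-out plot to a per-hectare yield, derives a moisture fraction from the wet and dry weights, and computes a Pearson correlation between the field yields and satellite-based estimates. All names and numbers here are hypothetical, and the protocol Cropin actually follows is not specified in the conversation.

```python
from math import sqrt


def yield_kg_per_ha(sample_weight_kg, plot_area_m2):
    """Scale the weight harvested from a marked-out plot to kg per hectare
    (1 hectare = 10,000 square meters)."""
    return sample_weight_kg * 10_000 / plot_area_m2


def moisture_fraction(wet_kg, dry_kg):
    """Share of the fresh (wet) sample weight that is water."""
    return (wet_kg - dry_kg) / wet_kg


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical dry weights from three 5 m x 5 m crop cuts (25 square meters each).
dry_weights_kg = [10.0, 12.5, 15.0]
field_yields = [yield_kg_per_ha(weight, 25.0) for weight in dry_weights_kg]

# Hypothetical satellite-derived yield estimates for the same three farms.
satellite_yields = [4200.0, 4900.0, 6100.0]

print(field_yields)
print(round(pearson(field_yields, satellite_yields), 3))
```

A high correlation on held-out farms is one indication that the satellite estimate tracks what was measured on the ground. With only a handful of crop cuts per season, the coefficient is noisy, which is part of why each field visit is so valuable.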
[0:14:18] HC: What are some of the challenges you encounter in working with this data and with imagery? You mentioned a few things related to the clouds and the satellite imagery and the way you need to go about getting ground truth with a site visit. Are there other common challenges that you deal with?
[0:14:34] PP: Right. Agriculture is very complex, and it’s also very nice to work with it because it’s also profoundly impactful as well. And there are a lot of problems which we have solved, like I mentioned before. But there are still some open problems.
Among the problems which we have solved, partially, not completely, are the cloud-related issues, right? There is a way for us, and I’ll talk about that later. But one of the major challenges of agriculture data is what we call in computer vision and in machine learning out-of-distribution (OOD), right?
If we train our models on one particular set of data in one particular region, scaling up to other regions is a big challenge, you know? There would be classes which you have not encountered. There would be samples of a particular class which you have never seen before. These are things which have to be taken into account. And the diversity of the crops.
You’re always going to find that there are many crops which you’ve never seen before, right? In a new region we move to, there are new crops. How do we deal with that kind of ambiguity? And even if you have seen that particular crop, there will be only a minimal number of samples, right?
The diversity is humongous when working with agriculture data. And the amount of data that’s available in the public domain is very limited, right? Knowledge also: if you look at what is available in the public domain, there is a lot of ambiguity in it, or there is conflict, right?
Somebody says that the sowing starts at this period of time. But somebody else says that the sowing starts at a later period of time. What do you take as ground truth, right?
And there are reasons for it. Because, as you can imagine, with climate change, with the global warming that is happening, the seasons are changing. One season, which came earlier before, is now getting delayed. The next season either starts early or is also delayed, right? All of this means that the knowledge we accumulated earlier is also no longer valid, if I may say so, or it needs improvements or modifications, right? That is something that we have to accept and work with while we are working with agriculture data. And knowledge also, if I may say so.
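The out-of-distribution problem raised in this answer, where a model trained in one region meets crops it has never seen, can be illustrated with a very simple rejection rule. The sketch below assigns a sample to the nearest class centroid in feature space but returns "unknown" when even the nearest centroid is too far away. This is a generic textbook-style baseline offered as an assumption-laden example, not the approach Cropin uses; the two-dimensional features, class names, and distance threshold are hypothetical.

```python
from math import dist


def class_centroids(features_by_class):
    """Average the training feature vectors of each known crop class."""
    centroids = {}
    for label, rows in features_by_class.items():
        centroids[label] = tuple(sum(column) / len(rows) for column in zip(*rows))
    return centroids


def classify_or_flag(sample, centroids, max_distance):
    """Return (label, distance) for the nearest centroid, or ("unknown", distance)
    when the sample is farther than max_distance from every known class."""
    label, distance = min(
        ((name, dist(sample, center)) for name, center in centroids.items()),
        key=lambda pair: pair[1],
    )
    return (label if distance <= max_distance else "unknown", distance)


# Hypothetical 2-D features (for example, two seasonal index statistics).
training = {
    "wheat": [(0.2, 0.6), (0.3, 0.7)],
    "maize": [(0.7, 0.3), (0.8, 0.2)],
}
centroids = class_centroids(training)

print(classify_or_flag((0.28, 0.66), centroids, max_distance=0.2)[0])  # wheat
print(classify_or_flag((0.50, 0.95), centroids, max_distance=0.2)[0])  # unknown
```

Samples flagged as "unknown" are natural candidates for a field visit, which ties the rejection rule back to the ground-data collection described earlier in the conversation.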
[0:17:09] HC: How does your team plan and develop a new machine learning product or feature? In particular, what kinds of steps do you go through at the beginning of the process?
[0:17:18] PP: Right. That’s a great question. We have adopted something called the technology readiness level (TRL). And we also have the product readiness level. The technology readiness levels follow through five or six steps in total. And beyond that, they go into product.
Each of these steps is very clearly defined. Initially, when either a new client or a new idea comes in, it starts at TRL zero. We do the initial screening and we look at the public literature that’s available on that subject. Then we build our initial proof of concept (POC) and we test it out with a few clients.
The way that we extend that is that we try to see if more clients are interested and we try to scale up. There are two ways in which we scale. One is in terms of the number of farmers, or the number of customers, that we cater to, right? The other way of scaling is moving into new crops and new geographies.
For one particular machine learning model, it’s not enough if it caters to, let’s say, India or Africa. We should have the capability of testing it in other locations as well.
The yield model, for example, has been tested in over 20 countries now, maybe more. We have made it very, very robust and it is now in production. It’s been scaled up. And all of it works in the cloud right now. Here, I’m talking about the computing cloud, you know?
Everything works on the cloud. We are able to bring it to a certain level of automation so that once any farmer is registered, the monitoring happens at all levels, for whatever they would like to see from their farm.
Those are the different levels that we take each product or feature through before it’s transferred to production. And then, it scales up from there. Then it’s just a question of the farmer registering and the features being readily available for them as soon as the satellite images are available, right?
It’s only contingent on the availability of the satellite image. The next time there is a satellite overpass, and once the satellite image is downloaded, then it’s ready for them at their desktop or directly on their application.
[0:19:55] HC: How does the seasonal nature of agriculture affect machine learning development? For example, do you focus on certain activities during the growing season and others in the off-season?
[0:20:05] PP: Yeah. That’s also a very good question. Because it’s very seasonal in nature, as I mentioned, we have to plan it absolutely well. Especially data collection, right? If we miss the season, then it’s very difficult to capture it.
All our in-field data collection, for example, has to be absolutely synced up with the field team. We have to pre-train them on how to use our own systems for collection and then ensure that we do the quality checks on the data that is collected. All of that is very, very stringent, rigorous, and very well-planned.
There have been instances where, for one reason or another, for example during the COVID time, it was a challenging period for us to actually collect data. Data collection, for example, has to be really well planned out.
And if you miss that, you have to understand that we have to wait until the wet season of the next year to get wet season data. Dry season data, you get immediately after the wet season, right? A year is your wait.
Now, the other thing is the geographies that you’re interested in, right? All of this has to be pre-planned. Not only that. Because we also focus globally, if it is the growing season in one region, in some other regions harvesting has already started, right? Or some people are preparing for sowing. In some other locations, it might be mid-season, you know? So it’s actually throughout the year that we have to pay attention in different geographies. It’s very, very country-specific, very local and region-specific.
Now that’s mostly about data collection and planning: field activity planning, field data collection, and field staff resourcing. As for development, it actually happens throughout the year. But deployment is something where we have to be on our toes all the time. Because sometimes a requirement from a particular region, or a farmer, or a set of farmers, or clients, or development agencies can come very quickly. We have to keep our models ready.
If there is a version one of the models, we keep a baseline model ready for deployment. And we are working on version two of it and already trying to improve that. The only way to handle that is by ensuring that we have these running in the cloud, so that they’re ready as soon as the season starts. Sometimes the request comes just before the season is about to start, and we have to be ready for that. In terms of development, we have to ensure that first we have some baseline models ready for deployment, for inferencing. And development happens almost simultaneously.
[0:23:07] HC: Machine learning is advancing quite rapidly right now. There are new advancements hitting the headlines more frequently than ever before. Are there any new developments in computer vision or AI more broadly that you’re particularly excited about and perhaps can see a potential use case for Cropin?
[0:23:24] PP: Having been in this field for two decades, I still feel that there is a lot for me to learn. And it’s exciting to see all these new models, new ideas which are coming out. There are some trends, there is excitement, and there is a lot of focus on, you know, certain models. And I’m specifically talking about either vision foundation models or language foundation models, right?
We have been more focused and centered around what is fundamentally important for actually bringing a difference to farmers and to the world also. One of the major areas that we are focused on, and which kind of happened incidentally, is, like I was mentioning, these problems with satellite data: cloudy time periods.
Within a time series of images, you see that there are clouds appearing. Now, we can actually use radar. Radar can penetrate the clouds and you can get some visibility. But you are not looking at the reflectance; you’re looking at the backscatter coefficients, right?
How do you combine these? That was one of the problems that we were trying to solve. And we use conditional GANs to solve that, right? Over a period of time, we realized that solving that and generating cloud-free images meant that we were sitting on top of one essential model, which could be called a foundation model, which could also be used for solving downstream tasks. Like identifying, as I mentioned, which areas are agricultural lands. Or detecting changes due to forest fires: how much of a burnt area is there?
These are actually like really extreme and exciting downstream tasks, which not only [inaudible 0:25:19] trying to reconstruct images, but even downstream tasks could also be accomplished with that. Now that’s one.
I would say it was not the original intent, but it just came out as a side potential for Cropin. And this is one area which we are really interested in exploring and diving deep into as well. As these vision foundation models become broader and broader, they kind of become independent and represent the broad space of satellite images and the downstream tasks they can solve for. I think we are also excited to work on that.
The other thing, which is not AI per se, is that more and more consumers are moving towards ensuring that their food is sustainable, right? This is one thing which is really close to my heart: with the current land systems and soil that we have, we can’t keep putting stresses on our soil and also on our water systems. It is very important for us to shift towards sustainable and regenerative agriculture practices.
I’m also excited we are actually launching a center of excellence within Cropin for that and for ensuring that we produce food also sustainably and help people do that. That’s one exciting area which we foresee, which will also really push the boundaries on what we can do with agriculture for a growing population, but also for a population which is becoming conscious, climate-conscious and also conscious on what they eat also and what they consume.
[0:27:13] HC: That sounds like a great new development. Is there any advice you could offer to other leaders of AI-powered startups?
[0:27:21] PP: Yeah, this is much more about looking at the space. Obviously, it can be quite intimidating for a lot of people who are starting at this period of time. I would say that what has helped us, and maybe this might help everyone, is to really stay grounded and start with a focus on the business models that you are trying to build, and not be taken in too much by, for example, the trends that are happening in this space.
Not all of them might actually have an immediate business value. Focus on the fundamentals, build simple models and demonstrate them, get the buy-in from the customers, get their confidence, get their trust. That is critically more important than building complex models, right?
I would say that in many of the cases that we have solved, it’s very simple ideas which have taken us very far in our relationship with our clients and customers. The other thing is to have a very good strategy for data. That’s primarily important. Because at the end of the day, while AI systems are going to get more generic, fine-tuning of them will definitely be imperative.
And even if you do have a small data set, but if that data set is very well-curated, if it’s clean and you have complete trust in that data, it’s fair. Then I think that goes a long way rather than building a huge database, and huge data platforms, which consumes not only like a lot of computational storage. But it can also be very difficult to manage, right? I would say start small and scale it up, and insist more on data quality rather than the quantity of the data, I would say.
[0:29:20] HC: And finally, where do you see the impact of Cropin in three to five years?
[0:29:24] PP: It’s a tough question and I’ll try to answer it as best as I can. What we have done as Cropin is launch the first Cropin cloud platform. And this is for, you know, different players in the market to come and use the services of Cropin at this moment. This was our first launch. And it’s the first agriculture cloud platform.
But as I mentioned, one of the things that we are excited about is enabling as many farmers as we can. Wherever there is cultivable land on the planet, we would like to enable monitoring of those lands. And what is closer to my heart is to serve these small land-holding farmers and not inundate them with information, but provide them with information which is very, very relevant.
At this period of time, they are all struggling with quality information. And it’s very, very important for us to curate, internalize and make that ready. And what we would like to do within Cropin to do that is ensure that not only data, but the knowledge and everything that we curate is absolutely accurate, so that we can actually cater to the growing farmer population and also ensure food production is actually sustainable.
And the other important thing is, of course, an extension of that, I mean, no farmer would like to actually add chemicals or things, which can actually deteriorate not only their soil but also the kind of produce that they are generating. It’s very critical for us to also enable farmers to make that transition into sustainable production, right? And they are really struggling with it. And at the same time, make it profitable for them.
It is a win-win for everyone. And we would like to enable that for the farmer, for the planet, and also for the consumers who are involved in this. That’s something which is closer to my heart that we can achieve in the three-to-five-year time frame, Heather.
[0:31:36] HC: This has been great. Praveen, you and your team at Cropin are doing some great work for agriculture. I expect that the insights you’ve shared will be valuable to other AI companies. Where can people find out more about you online?
[0:31:48] PP: We are available at cropin.com. You can visit our website or write directly to me at [email protected] and I’ll be happy to answer any questions that people have with respect to agriculture, or help with anything in this domain. After all, we can’t do this alone. And we are always happy to collaborate and also help those who want to really bring a difference to both agriculture and the planet.
[0:32:17] HC: Perfect. Thanks for joining me today.
[0:32:19] PP: Thank you so much, Heather, for inviting us. Thank you so much.
[0:32:22] HC: All right. Everyone, thanks for listening. I’m Heather Couture. And I hope you join me again next time for Impact AI.
[0:32:33] HC: Thank you for listening to Impact AI. If you enjoyed this episode, please subscribe and share with a friend, and if you’d like to learn more about computer vision applications for people and planetary health, you can sign up for my newsletter at pixelscientia.com/newsletter.