Marie Coffin is the Vice President of Science and Modeling at CIBO Technologies, and she is with me today to discuss regenerative agriculture. Join us as we explore CIBO’s work to influence company carbon footprints across industries, and how machine learning supports this process through remote sensing. Delving deeper, Marie unpacks how satellite imagery integrates with their computer vision system for a more scalable solution. Next, we discuss obtaining and categorizing data in the US, exploring some of the obstacles that stem from privacy and data protection concerns. We touch on data quality and discuss the reason behind the geographical parameters they have applied to the work before Marie shares her approach to collaborating with external experts and agronomists. She offers her advice for startups in the tech space, emphasizing creating value for your clients over keeping up with trends, predicts the future endeavors that CIBO will focus on, and more. Thanks for listening! 


Key Points:
  • Introducing Marie Coffin and her background leading up to her role at CIBO Technologies.
  • CIBO’s work to influence company carbon footprints to improve agricultural sustainability.
  • The role of machine learning in this process: remote sensing.
  • What remote sensing is used for at CIBO.
  • How satellite imagery interacts with their computer vision system.
  • Gathering, labeling, and annotating data with a focus on the boundary of the field.
  • Obtaining this information through a farmer’s recording process.
  • Why their work is largely limited to the US at the moment.
  • Challenges related to privacy and data protection while working with training models.
  • Managing data quality issues.
  • Validating models within a geographical context.
  • Collaborating with domain experts and external agronomists to understand and validate their approaches.
  • How the seasonal nature of agriculture impacts the timing of reports and outputs.
  • Advice for tech startups; addressing trends, who to hire, and more.
  • Qualities Marie seeks in new hires.
  • Her prediction for CIBO’s growing impact in the next three to five years.

Quotes:

“It’s pretty straightforward to estimate the carbon footprint of a single farmer’s field or even the carbon footprint of a whole farm, but, to make an impact, we need to be able to scale that across the landscape.” — Marie Coffin

“That is really the biggest challenge; it’s just getting enough data.” — Marie Coffin

“When you’re working in a really cutting-edge area, it’s tempting to sort of get caught up in the buzz of the new technology and lose sight of what the customer needs.” — Marie Coffin

“We need to not always be following the latest, greatest advance. We need to be going in a direction that’s going to really provide value.” — Marie Coffin


Links:

CIBO Technologies
Marie Coffin on LinkedIn


Resources for Computer Vision Teams:

LinkedIn – Connect with Heather.
Computer Vision Insights Newsletter – A biweekly newsletter to help bring the latest machine learning and computer vision research to applications in people and planetary health.
Computer Vision Strategy Session – Not sure how to advance your computer vision project? Get unstuck with a clear set of next steps. Schedule a 1-hour strategy session now to advance your project.


Transcript:

[INTRODUCTION]

[00:00:03] HC: Welcome to Impact AI, brought to you by Pixel Scientia Labs. I’m your host, Heather Couture. On this podcast, I interview innovators and entrepreneurs about building a mission-driven, machine-learning-powered company. If you like what you hear, please subscribe to my newsletter to be notified about new episodes. Plus, follow the latest research in computer vision for people and planetary health. You can sign up at pixelscientia.com/newsletter.

[INTERVIEW]

[0:00:33.8] HC: Today, I’m joined by guest Marie Coffin, Vice President of Science and Modeling at CIBO Technologies, to talk about regenerative agriculture. Marie, welcome to the show.

[0:00:44.0] MC: Thank you, thank you for inviting me, I’m excited to be here.

[0:00:45.9] HC: Marie, could you share a bit about your background and how that led you to CIBO Technologies?

[0:00:49.9] MC: Sure, starting way back, my degrees are in mathematics and statistics and I’ve worked for a long time in agriculture, mostly doing modeling and data science and statistics and I started with CIBO in 2016. I was working for a large agribusiness company at that time and I’d been in the same role for quite a while and I was looking for something new and different and CIBO really excited me, really ticked a lot of boxes for me. The work was really cutting-edge.

I got to work with really great people, I still work with really great people, really mission-driven people not only good at their jobs but people who were really driven to bring agriculture fully into the digital age, and so that was our original mission and it still is.

[0:01:29.8] HC: So, what does CIBO do and why is it important for agriculture?

[0:01:33.6] MC: Yeah. So, CIBO provides modeling and accounting and verification services to improve agricultural sustainability, that’s a mouthful, and at the present time, there’s a lot of focus on agricultural carbon, and we enable farmers to earn carbon credits for adopting more sustainable practices and we also enable companies that have agriculture in their supply chain to account for and influence the carbon footprint of that part of their supply chain.

So, that could be a consumer goods company, you know, making food or clothing, it could be a biofuels manufacturer, it could even be a financial institution that wants to look at the carbon footprint of their investment.

[0:02:11.9] HC: And what role does machine learning play in this technology?

[0:02:14.0] MC: The short answer to that is remote sensing and the longer answer is that one of the biggest challenges in this space is providing scalable solutions. It’s pretty straightforward to estimate the carbon footprint of a single farmer’s field or even the carbon footprint of a whole farm but to make an impact, we need to be able to scale that across the landscape. We need to be able to do carbon accounting across the state, a country.

You know, ideally, we’d be doing this across the whole world. So, we use a lot of remote sensing information to enable us to scale what we’re doing and we make extensive use of machine learning models to get the most out of that remote sensing information.

[0:02:48.9] HC: What types of things are you trying to predict? You take the remote sensing images as input and what is the output you’re trying to achieve?

[0:02:55.5] MC: The output that we’re trying to achieve is the actual practices that farmers are doing on their fields. So, that’s what we’re trying to influence to make those practices more sustainable and the practices that we’re most concerned with at the moment are things like what kind of crops are they planting, what kind of crop rotation are they achieving year on year, whether or not they’re planting cover crops, what kind of tillage and what kind of tillage implements they’re using.

Those, plus nitrogen fertilizer are the main, sort of, sustainable practices that we’re concerned with and all of those, except for the nitrogen fertilizer are things that we can detect through remote sensing through what we call computer vision, which is interpretation of remote sensing data.

[0:03:33.7] HC: So, all of these are based on remote sensing images. What type of remote sensing? You use satellite imagery; what types of satellite imagery?

[0:03:41.1] MC: Yes. So, our computer vision system works with satellite imagery from earth-orbiting satellites like Landsat and Sentinel. There are a lot of other forms of imagery available, like drone imagery but they tend not to be scalable, at least, at the present time and so, because we’re concerned with scalability, we work with satellite imagery almost exclusively.

[0:04:01.8] HC: How do you go about gathering the data, the images, and the labels, in order to train this and how do you get annotations?

[0:04:08.6] MC: Yeah, that is a really good question, this is a big deal for us. So, the satellite imagery itself is freely available, you know, it’s publicly available. Although, freely, may be the wrong word because storing and annotating it can be expensive, but in order to train a machine learning model, we also need labeled training data, and we obtain that from various sources. In the end, it always comes down to a farmer who has agreed to share their data with us.

And the data that we use most are the boundary of the field, which is an electronic file, and the associated management practices that the farmer has recorded on that field. So that means, data like what crop was planted, when the crop was planted and harvested, whether there was a cover crop, what kind of tillage implements they used. So, kind of the same things that I ran down before.

And as I said, you know, that’s coming sort of directly, or indirectly from a farmer, either something that they have recorded or something that has been recorded through, if they have the fancy kind of farm equipment that records that sort of thing as it’s happening.

[0:05:05.6] HC: Are you able to gather this type of data from farms around the world or is it fairly restricted to, you know, maybe farms that you have a partnership with and are able to obtain this data?

[0:05:15.5] MC: Yeah. So, we have – we’re working mostly in the US at the moment. In fact, we’re working entirely in the US at the moment, although, we have done a few pilot projects in other countries and the annotation data that we have is obtained from farmers in the US. Some of it comes directly from the farmers and some of it comes through third parties that we have a collaboration agreement with that might be a third party that has access to a lot of farmer data.

[0:05:37.8] HC: What kinds of challenges do you encounter in working with and training models based off of this agricultural data and the satellite imagery associated with it?

[0:05:47.1] MC: The biggest challenge is really just obtaining the labeled training data and getting enough of it. I mean, like everyone these days, farmers are sensitive to privacy and data protection issues and they may be reluctant to share their data. Sometimes, they’re concerned about data leakage and whether our systems are secure enough. Sometimes they’re worried that, you know, we’re using this data to build a product that we sell and they want to be sure that they’re compensated fairly for providing data for that.

That is really the biggest challenge, it’s just getting enough data. There are ongoing challenges related to data quality. Very frequently, the data that we obtain from farmers needs to be sort of cleaned up or curated and validated before it can be used to train a model, and that curation process is time-consuming and kind of difficult to automate. There are occasionally issues with the data quality of the satellite data.

But because it’s publicly available data and a lot of people are relying on it, I feel like that side of things is less challenging. Every once in a while, there are challenges related to something really mundane, like there was an entire month of, let’s say, cloudy days and so, we don’t get much satellite imagery that’s helpful during that time period but again, that’s sort of the exception and not the rule.

[0:06:55.4] HC: How do you go about validating these models? You said you’ve got some training data based on farms in the US. How do you make sure that your models work on farms in other areas?

[0:07:06.1] MC: Really good question. So, first of all, because we have done all of our training in the US, I would not claim today that our models are completely valid in other areas. As we think about geographical expansion into other parts of the world, one of the things that we think about is obtaining additional validation data against which to test, and potentially, against which to retrain our models because we know that there’s inherent biases in the ground truth data that we collect.

You know, no matter how well we curate our training datasets, there’s always going to be bias in a different year or a different crop, or a different country. The imagery is going to look a little bit different, so it’s a constant challenge trying to make our models as generalizable as possible but in terms of how we validate our models, you know, we use a lot of classical sort of cross-validation techniques.

When possible, we like to have a separate holdout test set that we don’t do any training on. I’ll be honest, due to the difficulties of getting enough training data, it’s not always possible for us to do that but that’s always our goal and then we also work directly with customers to validate our models on their data.

So, a customer or a collaborator, I should say, may have a collection of fields that they own if they’re the farmer or a collection of fields that they have access to if the customer is some sort of commercial entity and we can work with them to test and validate our models on their fields while keeping as much information in their hands as possible and that’s to avoid any inappropriate data sharing.

[0:08:30.0] HC: When you are limited on the amount of labeled data you have, are there any specific strategies you have for cross-validation to make sure that your models generalize to the types of variations that you might want them to?

[0:08:40.9] MC: I don’t know that we have like specific cross-validation techniques so much as we have sort of specific requirements when we are looking for sets of data. So, for example, one of the things that we’re very concerned about in labeled training data is that it be geographic, as geographically dispersed as possible. So, we don’t want all of our training data to be concentrated in let’s say, one state or a couple of states.

We’d like it to be widely spread throughout the agricultural portion of the US. Again, we would – we like the training data rather than all being concentrated in one year or a few years to span quite a number of years, as many as possible because every year looks different. You know, one year, there may be a lot of snow on the ground and another year, may be very dry and those are things that can creep in too and create unintended biases in your model if you’re not really careful about that.
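One way to check for the geographic and year-to-year biases described above is group-wise cross-validation, where an entire state or an entire year is held out at a time. The following is a minimal sketch of that idea, not CIBO's actual procedure; the `fields` records, their keys, and the `leave_one_group_out` helper are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


def leave_one_group_out(records, key):
    """Yield (held_out_group, train_idx, test_idx) tuples.

    Every distinct value of records[i][key] (for example a state or a
    year) is held out exactly once, so a model is always evaluated on a
    group it never saw during training.
    """
    groups = defaultdict(list)
    for i, rec in enumerate(records):
        groups[rec[key]].append(i)
    for held_out in sorted(groups):
        test_idx = groups[held_out]
        train_idx = sorted(
            i for group, idxs in groups.items() if group != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


# Hypothetical labeled fields (assumed structure, not real data).
fields = [
    {"field_id": "a", "state": "IA", "year": 2019, "crop": "corn"},
    {"field_id": "b", "state": "IA", "year": 2020, "crop": "soybean"},
    {"field_id": "c", "state": "NE", "year": 2019, "crop": "corn"},
    {"field_id": "d", "state": "KS", "year": 2021, "crop": "wheat"},
]

for key in ("state", "year"):
    for held_out, train_idx, test_idx in leave_one_group_out(fields, key):
        print(f"hold out {key}={held_out}: train={train_idx} test={test_idx}")
```

If accuracy drops sharply whenever a particular state or year is held out, that suggests the training data is too concentrated to generalize to that kind of variation.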

[0:09:28.5] HC: In developing these models, how do you collaborate with domain experts? You know, it might be somebody who understands better how things can vary from year to year, from crop to crop, or different aspects of the crop that are important in training models. How do they play into the model development?

[0:09:45.4] MC: Yeah, we work with a lot of agronomists who are domain experts in how crops grow, so we have agronomists on staff and we work with external agronomists at other companies. We would like to be working with the USDA, although, they are very busy people and that can be sort of challenging as well but those are really important collaborations, both to understand and validate, you know, our approaches and the results that we’re getting to sort of do a reality check on whether the results that we’re getting seem reasonable.

And also, those collaborations can really help us to stay grounded in, I guess, that’s a pun, grounded in reality because it’s really important for us to keep in front of our minds the day-to-day concerns of farmers and how we can best support them.

[0:10:25.2] HC: How does the seasonal nature of agriculture affect your model development or the things you do in certain times of the year based on the seasonal cycle in the US or does it really not make a difference?

[0:10:36.4] MC: In terms of development, it doesn’t make very much difference. You know, it’s true that every year there’s a sort of a new set of data regarding what farmers did this year but the development that we’re doing relies on kind of a rich history of past data. So, we’re working all year round to develop new products. However, the seasonal nature of agriculture does affect the timing of some of the reports and outputs that we produce.

So, for example, every year, we run a cash crop assessment of the US estimating the cash crop that was growing on every satellite pixel, basically, in the agricultural portion of the country, and it makes sense to run that assessment, that report at a time of year when crops are well-established for the year. So, that report will be coming out in the next few months. Similarly, we run an assessment each year on how many and which acres were planted to cover crops.

And it makes sense to do that in the off-season, which is when cover crops are growing. So, many of the outputs that we create have kind of a seasonality to them but in terms of development, we’re developing all year round, all the time.
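To make the idea of a per-pixel assessment concrete, here is a small sketch of how per-pixel class labels could be turned into acreage totals. It is an illustration under stated assumptions, not CIBO's reporting pipeline: the `acres_by_class` function and the sample labels are hypothetical, and the default pixel size of 30 m assumes Landsat-style imagery (Sentinel-2's visible bands are 10 m).

```python
from collections import Counter

SQ_M_PER_ACRE = 4046.8564224  # exact definition of the international acre


def acres_by_class(pixel_labels, pixel_size_m=30.0):
    """Convert a flat sequence of per-pixel class labels into acres per class.

    pixel_size_m is the side length of one square pixel in meters
    (assumed 30 m here, as for Landsat).
    """
    pixel_area_sq_m = pixel_size_m ** 2
    counts = Counter(pixel_labels)
    return {
        label: n * pixel_area_sq_m / SQ_M_PER_ACRE for label, n in counts.items()
    }


# Hypothetical classifier output for a handful of pixels.
labels = ["corn", "corn", "soybean", "cover_crop", "corn"]
for label, acres in sorted(acres_by_class(labels).items()):
    print(f"{label}: {acres:.2f} acres")
```

The same counting works for a cover crop map produced in the off-season; only the label set and the imagery dates change.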

[0:11:35.1] HC: Is there any advice you could offer to other leaders of AI-powered startups?

[0:11:38.7] MC: Oh boy, yeah. One thing I think I already alluded to and that is the importance of staying focused on the customers. I feel like when you’re working in a really cutting-edge area, it’s tempting to sort of get caught up in the buzz of the new technology and lose sight of what the customer needs, and so we need to keep that in mind. We need to not always be following the latest, greatest advance.

We need to be going in a direction that’s going to really provide value and I think that’s really important, and I guess the other piece of advice that I think is really important is just to hire the very best people that you can find. This is an area where new technology is crucial to the products we’re producing. So, you want to make sure that your technical people are as experienced and knowledgeable and just you know, absolutely top-of-the-line people. There’s just no substitute for that.

[0:12:26.3] HC: Are there any specific requirements for hiring? Is it really about their background in machine learning or would it be more about their breadth across different applications and perhaps, even specific experience in agriculture that’s more important to you?

[0:12:41.0] MC: It’s almost more like, there are some important personality traits. So, definitely, you know, being technically top-notch in the field of AI and machine learning is important and beyond that, all of the work that we do, and really, I think most of the important work that’s being done in this area is being done by teams, not by individuals and so, we really look for people that work well with others that have a team orientation and you know, are willing to share their experience and their knowledge pretty freely with other people, people who collaborate well.

It helps if people are good communicators and can sort of, you know, this is – as you know, this is a really technological field. There’s a lot of buzz words and if people can communicate outside of their own area of expertise, that’s very helpful as well, and then really people who have a hunger for learning and want to learn new things and are here to explore. That’s almost more important than any specific experience that they might have in agriculture.

For example, if they’re eager to learn about agriculture and eager to understand the farmer’s point of view, that’s going to go a really long way in our field.

[0:13:42.6] HC: These sound like pretty valuable characteristics.

[0:13:44.7] MC: I mean, it’s hard to argue with things like that in almost any area. So, maybe what I’m saying is not specific to AI but these are really, really important qualities and I think that if I were giving advice to somebody new in the field, I would say cultivating those qualities is one of the best things you can do.

[0:14:00.4] HC: And finally, where do you see the impact of CIBO in three to five years?

[0:14:04.1] MC: Three to five years is a – I would say, it feels like a really long time. When I look back three to five years, a lot has changed and I think a lot is going to continue to change. I don’t think that we’re going to be selling the same products in a few years that we’re selling now but we are focused on sustainability solutions, on things that are globally applicable, on solutions that are powered by real data and motivated by real farmer concerns. That will stay the same.

So, today, the markets are really emphasizing soil carbon, carbon credits, carbon footprints. I think in the next few years, there’s going to be an expanded emphasis on sustainable water use, sustainable fertilizer application, improving the biodiversity of our landscape, and so on and so forth. So, that’s what we’re looking for and looking forward to in the next few years and we’re working now to build the tools and solutions for that space when they become needed.

[0:14:56.6] HC: Well, I look forward to following you and seeing how CIBO does. This has been great. Marie, I appreciate your insights today. I think this will be valuable to many listeners. Where can people find out more about you online?

[0:15:07.7] MC: Yes, CIBOTechnologies.com is where we are and we welcome visitors to our site, we have a lot of materials related to sustainability and related to machine learning, and computer vision. So, stop by and check us out.

[0:15:21.4] HC: Perfect. Thanks for joining me today.

[0:15:23.8] MC: Thank you. It was my pleasure.

[0:15:25.1] HC: All right, everyone. Thanks for listening. I’m Heather Couture and I hope you join me again next time for Impact AI.

[END OF INTERVIEW]

[0:15:34.9] HC: Thank you for listening to Impact AI. If you enjoyed this episode, please subscribe and share with a friend, and if you’d like to learn more about computer vision applications for people and planetary health, you can sign up for my newsletter at pixelscientia.com/newsletter.

[END]