With farmers sometimes waiting weeks for lab results to make critical decisions, Benjamin De Leener, Co-Founder and Chief Science Officer of ChrysaLabs, sought to transform the future of soil health. ChrysaLabs has developed a groundbreaking handheld, AI-powered probe that delivers fast field-ready insights into soil properties like pH, nutrients, and organic matter.

In this episode of Impact AI, Benjamin dives into the journey of creating this innovative tool, the challenges of working with complex agricultural data, and the role of machine learning in empowering farmers to make sustainable, data-driven decisions. Tune in to discover how this technology is not only boosting farming efficiency but also contributing to a healthier ecosystem and the fight against climate change!


Key Points:
  • Benjamin’s biomedical engineering background and how it led him to start ChrysaLabs.
  • How ChrysaLabs’ portable probe provides real-time soil analysis.
  • The role of machine learning in converting spectroscopy data into actionable soil insights.
  • Challenges in acquiring diverse, high-quality soil data for model training.
  • Addressing variability in soil and lab measurements to ensure model accuracy.
  • What goes into ChrysaLabs’ validation techniques to maintain robust, reliable AI models.
  • Considerations for overcoming seasonal constraints in agricultural data collection.
  • Technological advancements that have enabled portable, cost-effective sensors.
  • Advice for AI-powered startups: balance data volume with variability management.
  • Collaborative efforts between agronomists and machine learning engineers at ChrysaLabs.
  • ChrysaLabs’ vision for improving soil health and combating climate change.

Quotes:

“There’s a translation between the light information that we receive from the spectrometer and the information that is actionable for the farmers and agronomists. The machine learning models are between the hardware, the application, and what the farmers can do.” — Benjamin De Leener

“The main challenge that the agronomists and the farmers have is the data about what’s in the soil. So, that’s what we provide.” — Benjamin De Leener

“The more data you accumulate, the bigger the variability that you need to take into account. It’s not always better to think, ‘The more data I have, the better’ because sometimes, the less data, the more focused the models are.” — Benjamin De Leener

“We want to combat climate change – [We believe] that the soil can sequester a lot of carbon through agriculture, and we want to provide a way to measure that so that, when we choose one agronomical practice over another, we understand what we’re doing.” — Benjamin De Leener


Links:

ChrysaLabs
ChrysaLabs InsightLabs
Benjamin De Leener on LinkedIn
Benjamin De Leener on Google Scholar
Benjamin De Leener on X


Resources for Computer Vision Teams:

LinkedIn – Connect with Heather.
Computer Vision Insights Newsletter – A biweekly newsletter to help bring the latest machine learning and computer vision research to applications in people and planetary health.
Computer Vision Strategy Session – Not sure how to advance your computer vision project? Get unstuck with a clear set of next steps. Schedule a 1-hour strategy session now to advance your project.


Transcript:

[INTRODUCTION]

[00:00:03] HC: Welcome to Impact AI, brought to you by Pixel Scientia Labs. I’m your host, Heather Couture. On this podcast, I interview innovators and entrepreneurs about building a mission-driven, machine-learning-powered company. If you like what you hear, please subscribe to my newsletter to be notified about new episodes. Plus, follow the latest research in computer vision for people and planetary health. You can sign up at pixelscientia.com/newsletter.

[INTERVIEW]

[0:00:33.7] HC: Today, I’m joined by guest Benjamin De Leener, co-founder and chief science officer of ChrysaLabs, to talk about soil. Benjamin, welcome to the show.

[0:00:42.9] BDL: Well, thank you for welcoming me.

[0:00:44.9] HC: Benjamin, could you share a bit about your background and how that led you to create ChrysaLabs?

[0:00:49.0] BDL: Absolutely. I’m an engineer by training. I specialized in computer science and biomedical engineering in Belgium, and I moved to Montreal a few years ago. I’ve always been a researcher, mostly in biomedical engineering, doing neuroimaging, but I was working with one guy, Gabriel, sharing an office with him, and we were basically both doing PhDs in biomedical engineering, doing computer stuff and wanting to do more.

And so, after a few months and years, we started working on other things than just doing our PhDs. So, for example, we helped create the Fab Lab at Polytechnic Montreal, the university, I mean. We also created a student lab during our studies. Everything was basically trying to do something else than just the PhD, and that led to urban agriculture. We started a project creating a robot for urban agriculture in Montreal.

We even convinced the library at Polytechnic Montreal to install a garden in there. We couldn’t agree if it was a good idea, but the robot worked pretty well. But most interesting was the sensor that we developed for the robot, which led to the creation of ChrysaLabs after.

[0:02:08.9] HC: And so, what does ChrysaLabs do and why is this important for farming?

[0:02:12.9] BDL: Well, at ChrysaLabs, we develop technology for soil analysis. So, that includes a portable probe that provides multiple soil characteristics directly in the field. So, that’s basically something that you hold in your hands: you make a hole in the soil, then you put the probe in, and then you press a button, and it gives you information about the soil properties like pH and nutrient levels, organic matter, et cetera.

So, the same probe can be used to measure nutrient levels in the field or carbon, for example, and it really helps farmers to understand what’s happening with the soil when they want to do fertilization, when they need to do a diagnostic, when they need to put fertilizers. Usually, the farmers, they do soil analysis with the laboratories but it takes a long time. It takes a long time because we need to take the soil sample, pack it in a bag, send it to the lab, wait for the reports.

It’s really great, it works well but you need to wait a long time and it’s relatively costly. So, we try to avoid that cycle of waiting for the results and we provide the results directly in the field. So, that’s really important for farming because you can take decisions directly, that could be about fertilization, that could be about changing some practices based on health, or just to give you the information about how do you sequester carbon, for example, in the soil.

[0:03:37.9] HC: And what role does machine learning play in this technology?

[0:03:40.7] BDL: Well, we develop the probe, which is a piece of hardware. That’s a sensor that has existed for a long, long time; that’s spectroscopy, so, analyzing the light, the reflection of light. To build the models that estimate the soil properties, we need to have data, and we need to train machine learning models on the signal that comes out of the probe, using spectroscopy again, to provide an estimate of the soil property.

So, there is a translation between the light information that we receive from the spectrometer and the information that is actionable for the farmers, for the agronomists. So, the machine learning models are between the hardware and the application and what the farmers can do. So, it’s basically using a lot of data, from the probe and also from the laboratories, to build these models.
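
As an illustration of the "translation" step described above, here is a minimal sketch of a calibration model. It is not ChrysaLabs' pipeline: the single reflectance feature, the organic matter values, and the function names are all hypothetical, and a real spectrometer produces many wavelengths that would call for a multivariate, regularized model.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y ~ slope * x + intercept (one feature)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Made-up training pairs: one reflectance feature per probe reading and the
# organic matter (%) a laboratory reported for a sample from the same spot.
reflectance = [0.42, 0.38, 0.35, 0.30, 0.26]
organic_matter = [1.9, 2.5, 3.3, 4.1, 5.0]

slope, intercept = fit_line(reflectance, organic_matter)

def predict(r):
    """Estimate organic matter (%) from a new reflectance reading."""
    return slope * r + intercept

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"estimate for reflectance 0.33: {predict(0.33):.2f} %")
```

In this toy data, darker soil (lower reflectance) goes with more organic matter, so the fitted slope is negative. The point is only the shape of the workflow: probe signal in, lab-referenced estimate out.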

[0:04:36.3] HC: So, given spectroscopy data about soil, your model can then predict the current status of the soil, and then make recommendations based on it, something like that?

[0:04:46.1] BDL: So, we don’t go as far as recommendation because there is a lot of implication about soil recommendation. For doing an agronomy recommendation based on the information that you have in the soil, you need to have more than just the soil information. You need to have context, you need to know what do you want to – what seeds do you want to put in the soil, the region, the weather, you need to take a lot of information into account.

But the main challenge that the agronomists and the farmers have is the data about what’s in the soil. So, that’s what we provide. We help the agronomists and the farmers to make a recommendation but we do not make the recommendation for them, and so that’s what’s really important is to have the right information to take a decision, that’s what we provide.

[0:05:30.8] HC: How do you go about gathering the data and perhaps annotating it in order to train these models?

[0:05:36.8] BDL: That’s a good question because data in agriculture is really costly and really difficult to acquire. So, we work with a lot of partners, we work directly with the farmers that want to test the probes. Usually, when someone wants to try our system or technology, we have a phase where we gather data to make sure that we understand what’s in the soil, what’s happening for the specific region, for example, if we haven’t seen it in the past.

So, we gather data with the partners, with the farmers directly, and of course, we do a lot of our own acquisition campaigns. So, we go into the fields, region by region, and then we accumulate a lot of this data, and so this data is basically taking the samples with the probes. So, we put the probe in the field but we also, at the same location, take a soil sample, put that in a bag, and send that to laboratories.

And I say laboratories, plural, because we send that to multiple laboratories to understand what is the variability that we see within the soil but also between laboratories so that we can train the model the best way possible. So, annotating data, there is not a lot of annotation. It is mostly making the link between the data that we acquire in the field and the data that we collect with the laboratories, making sure that there is a clear correspondence between the two because it takes a lot of manpower and labor to make these acquisitions.

[0:06:56.3] HC: So, the laboratory results provide the labels that your models are going to try and predict?

[0:07:00.9] BDL: Exactly, yes.
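
The linking step described above, matching each probe reading with the laboratory results for the same location, can be sketched as a simple join. This is a hypothetical illustration: the record layout, sample IDs, lab names, and pH values are invented, and averaging the labs is just one possible way to form a label.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records; field names and values are illustrative only.
probe_readings = [
    {"sample_id": "F1-001", "spectrum": [0.41, 0.39, 0.35]},
    {"sample_id": "F1-002", "spectrum": [0.33, 0.30, 0.28]},
    {"sample_id": "F2-001", "spectrum": [0.25, 0.24, 0.22]},
]
lab_results = [
    {"sample_id": "F1-001", "lab": "A", "ph": 6.4},
    {"sample_id": "F1-001", "lab": "B", "ph": 6.6},
    {"sample_id": "F1-002", "lab": "A", "ph": 7.1},
    {"sample_id": "F1-002", "lab": "B", "ph": 6.9},
    {"sample_id": "F1-002", "lab": "C", "ph": 7.0},
]

def build_training_set(readings, results, target="ph"):
    """Attach a lab-derived label to each probe reading with the same sample_id."""
    values_by_sample = defaultdict(list)
    for row in results:
        values_by_sample[row["sample_id"]].append(row[target])

    dataset = []
    for reading in readings:
        values = values_by_sample.get(reading["sample_id"])
        if not values:
            continue  # no laboratory result for this location yet
        dataset.append({
            "sample_id": reading["sample_id"],
            "spectrum": reading["spectrum"],
            "label": mean(values),  # average across laboratories
            "lab_spread": stdev(values) if len(values) > 1 else 0.0,
            "n_labs": len(values),
        })
    return dataset

training_set = build_training_set(probe_readings, lab_results)
for row in training_set:
    print(row["sample_id"], round(row["label"], 2), round(row["lab_spread"], 2))
```

Keeping the spread between laboratories next to each label gives a rough sense of how precisely the reference value itself is known, which matters when judging model error later.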

[0:07:02.0] HC: What kinds of challenges do you encounter in working with and training models based off of the soil data?

[0:07:07.8] BDL: Well, as I just mentioned, it’s the variability. So, when you go into the field, maybe you have seen that: sometimes it’s more loamy, sometimes it’s more sandy, and so every field is different. The soil texture is different, the way you gather the data is different. And with the variability in the soil texture, since we look at spectroscopy, which is basically color, of course, the soil texture will influence the spectroscopy models that we develop.

Spectroscopy is an indirect measure of the soil properties and what’s in the soil, like the same way that you look at carbons. So, we have to take into account a lot of the confounding factors that could affect the models. So, the variability within the soil is really important, that’s one of the main challenges that we have, but also variability in the measurements that we take with the laboratories.

When you send a sample to multiple laboratories, you don’t always get the same results, for many reasons. It does not mean that it’s not accurate, but we don’t know exactly what is the true value of a soil sample. So, you send that to multiple laboratories and you have sometimes slightly different results. You have to take that into account when developing the models because that’s inherent variability.

It’s not necessarily noise, it’s just variability within the ground truth data. So, your spectroscopy sensor needs to take that into account because there is variability in the sensor as well. So, all in all, there is a lot of variability in general, in manipulating soil, manipulating spectroscopy measurements, manipulating laboratory measurements. On top of that, you add the external conditions like the weather.

You go to a field, it just rained, it’s more humid, it will affect the measurements, and the way you acquire the data, as I said, and that impacts the models as well. So, you need to model that, you need to take that into account to make sure that, with what you provide as a measurement, you understand what’s happening, and that it’s accurate and reproducible.

[0:09:08.3] HC: How do you validate your models? I’m guessing a lot of it is based on making sure they’re robust to the different variations that you just talked about.

[0:09:15.2] BDL: Yeah, yeah, exactly. So, making sure that we understand what’s happening when we take one measurement, that’s really, really important because of all the variability that I mentioned. So, we compare the results that we got to multiple laboratories quite often. We try to build models that are specific to certain conditions. For example, we had these ideas a few years ago, at the beginning, of building one giant model that would solve all the problems.

And that’s not true, that’s probably not possible, except if we have millions and millions of data points, which, in fact, we can’t get. It’s really costly, very difficult to accumulate this data. So, we need to find ways to reduce the variability in the models and compensate for that variability.

So, validating the models, it’s always a continuous process, making sure that we don’t have drift in the models, that we have seen the conditions where we go, and so it’s project by project basically. Sometimes client by client, sometimes even field by field; it is important to validate within a specific field if we have never seen, for example, the texture of the field, you know, in our database. So, it is a continuous project that we need to always look at, validation of the models to make sure that we provide a measure that is relevant for the farmer.
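
One common way to check how a model behaves on a field it has never seen is leave-one-field-out validation: hold out every sample from one field, train on the rest, and score on the held-out field. The sketch below assumes a one-feature linear model and invented field names and values; it is not a description of ChrysaLabs' validation procedure.

```python
from math import sqrt
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y ~ slope * x + intercept (one feature)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def leave_one_field_out(samples):
    """samples: (field_id, feature, lab_value) tuples. Returns RMSE per held-out field."""
    errors = {}
    for held_out in sorted({field for field, _, _ in samples}):
        train = [(x, y) for field, x, y in samples if field != held_out]
        test = [(x, y) for field, x, y in samples if field == held_out]
        slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
        squared = [(slope * x + intercept - y) ** 2 for x, y in test]
        errors[held_out] = sqrt(mean(squared))
    return errors

# Made-up data: two loamy fields that behave alike and one sandy field
# whose feature-to-property relationship is different.
samples = [
    ("loam_a", 0.40, 2.0), ("loam_a", 0.35, 3.0), ("loam_a", 0.30, 4.0),
    ("loam_b", 0.38, 2.4), ("loam_b", 0.32, 3.6), ("loam_b", 0.28, 4.4),
    ("sandy_c", 0.55, 1.0), ("sandy_c", 0.50, 1.5), ("sandy_c", 0.45, 2.0),
]

errors = leave_one_field_out(samples)
for field, rmse in errors.items():
    print(f"held out {field}: RMSE = {rmse:.2f}")
```

In this toy setup the sandy field, whose texture is absent from the training data when it is held out, gets a much larger error than the loamy fields. For real projects, scikit-learn's `LeaveOneGroupOut` splitter implements the same idea.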

[0:10:29.5] HC: And then making sure that your models are robust to these different variations. Does a lot of that come down to just making sure your dataset is diverse for training or are there other techniques in modeling that can help you accommodate it?

[0:10:43.1] BDL: Well, I would say both. We need to make sure that the diversity of the database is relatively large, but that’s kind of a dream that you’re chasing and you will never get a hold of it, you know? The variability is so enormous, except if you have a very, very, very large dataset, but even then, it expires, you know? So, you have to take into account this variability in the modeling process, making sure that you understand what’s happening in your model.

So, these black box models that we have seen in the past, it’s not really working for agriculture because we need to make sure that we understand what is the effect of, for example, temperature on the model or soil texture on the model or the way you handle the probe basically on the model itself. So, all of these, you need to find ways to compensate, and so there are techniques that you can use for that but it is mostly by looking at the data and making sure that you understand what’s happening.

[0:11:40.8] HC: How does the seasonal nature of agriculture affect machine learning development? Is there an aspect of that that comes into play?

[0:11:47.8] BDL: Multiple aspects. I would say the first aspect that we think about when we look at the seasonal nature of agriculture is the weather, the temperature. We are developing these probes in Montreal and in Quebec, and in Quebec, in Canada, there is snow in the winter. So, the soil gets frozen, and then when you look at how nutrients interact with the soil in a frozen soil, that’s not the same thing as in the summer, so you have to take that into account.

Can you even measure something in the winter or do you need to wait? In other countries, that’s easier for that aspect, but you need to take at least temperature into account when you have an electronic instrument that is measuring something at various temperatures. You need to calibrate your sensor for that and adjust for the temperature effect that you will see.

But I would say that is not the most important effect of the seasons. For example, if you go, again, in Quebec and try to make experiments to develop machine learning models, you have to gather data. You have to go into the field. Sometimes, the fields are frozen, there is snow. Sometimes, there are some things in the field, plants, and we cannot go into the field. So, the window to take the measurements, to take the soil sample, is limited because of the seasonal nature.

And it depends on the crops that you planted, so you need to take that into account. For example, if you develop a new sensor or a new technology, you cannot just sample all the time. That counts for soil information but also for all sorts of agriculture. It’s all seasonal, so you need to take the season into account, and if you miss it, well, maybe it’s going to be next year, and so you need to take that into account in your development because you can lose a lot of time just missing a season for development.

[0:13:34.1] HC: Are there any specific technological advancements that made it possible to do this now when it wouldn’t have been feasible to develop a solution like this even a few years ago?

[0:13:43.4] BDL: Yeah, sure. There are two that I am thinking about. Of course, machine learning being easy to prototype, the democratization of machine learning, is one thing, but the two that are more specific to our business, because we are developing a new sensor that is portable and goes into the field, are these. The first is connectivity. Definitely, it’s super important to have good connectivity everywhere and, if you don’t have it, a way to handle this missing connectivity in the fields. And the second one is the advances that have been made in miniaturizing the sensors.

We have a lot of sensors that are really advanced, that are cheaper and cheaper, but really miniaturized, because when you go into the field, you don’t want to have a huge amount of things that you bring into the field when it’s really complicated to walk. We really need something that is small, that is portable. The advances in miniaturization of the sensors, specifically the spectroscopy sensors, really helped us develop the probe around it.

So, in the past 15 years, 20 years, spectroscopy has been there for a long time, and other sensors have been there for a long time, but making sure that they are cheap, small, portable, that they don’t suck up a lot of energy, that’s really important to make sensors that can work directly in the field.

[0:15:07.2] HC: Is there any advice you can offer to other leaders of AI-powered startups?

[0:15:11.1] BDL: There is one that is quite obvious, but everybody knows it and forgets about it after a while: working with data is never simple. We tend to think that the more data that we acquire, the easier it will get. We have this conception of, my model is working on a subset of data, and then if I increase the amount of data, it will work even better, but that’s not necessarily true.

In a lot of cases, we’ve seen in agriculture and in other domains, it’s not true because when you increase the amount of data you usually increase the variability that you have in the data. Look around yourself and then take a picture and then you will see a bunch of objects in the picture, but if you take a lot of pictures around you, there will be a lot more objects to take into account in your models.

So, that’s the same thing with all domains of machine learning. The more data you accumulate, the bigger the variability that you need to take into account. So, it’s not always better to think the more data I have, the better because sometimes, the less data, the more focused the models are, and so they’re working better. So, working case by case and making sure that you understand what’s happening with the data when you increase a little bit trying to add context, that’s the main advice I would give to anyone working with machine learning.

Because it is really tricky to gather a lot of data, but if you don’t understand the amount of variability and noise that you add every time you add a new point, then you’re in trouble later.
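
The point about added data bringing added variability can be made concrete with a toy comparison. The sketch below assumes two invented regions whose relationship between a spectral feature and a soil property differs, and compares a model fitted on region A alone with one fitted on both regions pooled, scoring both on region A. All numbers and names are hypothetical.

```python
from math import sqrt
from statistics import mean

def fit_line(data):
    """Ordinary least squares on (x, y) pairs; returns (slope, intercept)."""
    x_bar = mean(x for x, _ in data)
    y_bar = mean(y for _, y in data)
    sxx = sum((x - x_bar) ** 2 for x, _ in data)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in data)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def rmse(model, data):
    """Root mean squared error of a (slope, intercept) model on (x, y) pairs."""
    slope, intercept = model
    return sqrt(mean((slope * x + intercept - y) ** 2 for x, y in data))

# Made-up (feature, lab value) pairs. Region B follows a different trend,
# standing in for a new soil texture added to the database.
region_a = [(0.40, 2.1), (0.35, 2.9), (0.30, 4.1), (0.25, 4.9)]
region_b = [(0.40, 3.5), (0.35, 3.9), (0.30, 4.6), (0.25, 5.0)]

focused_model = fit_line(region_a)
pooled_model = fit_line(region_a + region_b)

rmse_focused = rmse(focused_model, region_a)
rmse_pooled = rmse(pooled_model, region_a)
print(f"region A error, focused model: {rmse_focused:.2f}")
print(f"region A error, pooled model:  {rmse_pooled:.2f}")
```

Here the pooled model has twice the data but fits region A worse, because nothing tells it which region a sample came from. Giving the model that context as an input (for example, soil texture) is one way to use the larger dataset without paying this penalty.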

[0:16:45.2] HC: And I guess part of that is truly understanding your data and –

[0:16:49.4] BDL: Absolutely.

[0:16:50.3] HC: When that comes from – when it’s soil data, in this case, that means you need to understand the soil, that means you need the agronomists involved, not just machine learning engineers.

[0:16:58.3] BDL: Agronomists are really, really important, they are the scientists of the soil. Soil is a science and that’s really important to understand what’s happening. We don’t understand everything that’s happening in the soil, by the way, it’s like the human body, it’s like the brain. We don’t understand exactly everything, all the mechanisms that happen, the dynamics in the soil, and what happens when we add something in the soil.

So, just trying to model the soil is really complicated, the dynamics of the soil. Modeling the dynamics of the soil for every region in the world, that’s even worse. So, you know, you need to understand the data, look at it, and try to conceptualize what’s happening when you try to measure it.

[0:17:33.4] HC: And finally, where do you see the impact of ChrysaLabs in three to five years?

[0:17:37.1] BDL: We believe that ChrysaLabs will have an impact in the way agronomy is done and farming is done because we provide information about what’s happening in the soil. We try to push more information into the hands of the people that could understand and could model soil health, typically because when you try to seed the soil and then have a good yield, you basically want your soil to be really healthy year after year.

So, you need to manage it quite well, and without data, it’s really difficult. We really believe that we can have an impact in farming but through agronomy, through the way that farmers can manage the soil health through fertilization, but optimized fertilization. So, promoting a healthier ecosystem, and ecosystem is really important here. It’s not just soil, it’s not just microorganisms, it’s the whole ecosystem.

So, we try to unlock this potential of the land, of agriculture, to be more efficient at growing plants and growing food. We want to combat climate change and we think that’s really important because we also believe that the soil can sequester a lot of carbon through agriculture and we want to provide a way to measure that so that when we choose one agronomical practice over another, we understand what we’re doing.

Without removing any yield, because it’s really important to continue to grow food, but by sequestering carbon, it increases soil health as well. So, it’s a whole cycle and a whole ecosystem that we’re trying to provide the data on so that people can take better decisions.

[0:19:06.8] HC: This has been great, Benjamin. I appreciate your insights today, I think this will be valuable to many listeners. Where can people find out more about you online?

[0:19:14.1] BDL: Definitely through our website, we try to provide a lot of information for everybody. We have the ChrysaLabs insights platform, where we provide a lot of information. So, I will encourage you and everyone to go and take a read there. Thank you for inviting me.

[0:19:27.7] HC: Yep, perfect. I will link to that in the show notes. Thanks for joining me today.

[0:19:31.6] BDL: Thank you as well.

[0:19:32.6] HC: All right, everyone, thanks for listening. I’m Heather Couture and I hope you join me again next time for Impact AI.

[END OF INTERVIEW]

[0:19:42.7] HC: Thank you for listening to Impact AI. If you enjoyed this episode, please subscribe and share with a friend, and if you’d like to learn more about computer vision applications for people and planetary health, you can sign up for my newsletter at pixelscientia.com/newsletter.

[END]