Biodiversity is not just an ecological concern. As you’ll learn in this episode, it has tangible economic implications too. Today on Impact AI, I'm joined by Dr. Noelia Jimenez Martinez, Head of Insights and Machine Learning at NatureMetrics, to talk about biodiversity monitoring. NatureMetrics is a global nature intelligence technology company providing end-to-end nature monitoring and impact reporting. Powered by eDNA, their Nature Intelligence Platform allows any company to manage its impacts and dependencies on biodiversity at scale, translating the complexities of nature into simple insights that help to inform the best decisions for both the planet and business. Tuning in, you’ll learn about the importance of NatureMetrics’ work, the role that machine learning plays in their technology, and some of the challenges that come with working with sometimes unpredictable data from nature. In my conversation with Noelia, we also touched on why biodiversity is an increasingly urgent imperative for businesses of all kinds, how NatureMetrics is democratizing biodiversity monitoring, and much more!
Key Points:
- Insight into Noelia's background in astrophysics and how it led her to NatureMetrics.
- What NatureMetrics does, what eDNA is, and why it’s so important for sustainability.
- The major role that machine learning plays in NatureMetrics' technology.
- Specific examples of the types of models that NatureMetrics trains.
- How Jurassic Park helps us understand what eDNA data looks like.
- Different ways that this data is gathered depending on the relevant project.
- Unique challenges of sampling for eDNA and training models based on those datasets.
- How NatureMetrics measures the impact of its technology and makes biodiversity monitoring more accessible and achievable.
- Noelia’s urgent and common sense advice for other leaders of AI-powered startups.
- What the future holds for NatureMetrics and how their impact will continue to grow.
Quotes:
“I couldn't focus too much on solving galaxy formation with the amount of bad news I was seeing in the climate space and biodiversity collapse. I made a transition – [to] looking for jobs to apply [my astrophysics skills to] related problems in climate and biodiversity.” — Noelia Jiménez Martínez
“Nature does not seem to behave [as well] as we would want. It might be that you have exactly the same covariates and your model is predicting species, and then you go, and it's not there.” — Noelia Jiménez Martínez
“[Most companies] will have to report on their sustainability strategies in the world to keep on functioning. In that context, what we can do here is make biodiversity monitoring achievable and democratically easy to access.” — Noelia Jiménez Martínez
“The success of [an AI startup is] – tied up to the diverse, strong teams you build.” — Noelia Jiménez Martínez
Links:
NatureMetrics
Dr. Noelia Jiménez Martínez on LinkedIn
LinkedIn – Connect with Heather.
Computer Vision Insights Newsletter – A biweekly newsletter to help bring the latest machine learning and computer vision research to applications in people and planetary health.
Computer Vision Strategy Session – Not sure how to advance your computer vision project? Get unstuck with a clear set of next steps. Schedule a 1 hour strategy session now to advance your project.
[INTRODUCTION]
[0:00:02] HC: Welcome to Impact AI, brought to you by Pixel Scientia Labs. I’m your host, Heather Couture. On this podcast, I interview innovators and entrepreneurs about building a mission-driven machine learning-powered company. If you like what you hear, please subscribe to my newsletter to be notified about new episodes. Plus, follow the latest research in computer vision for people and planetary health. You can sign up at pixelscientia.com/newsletter.
[INTERVIEW]
[0:00:33] HC: Today, I’m joined by guest, Noelia Jimenez Martinez, Head of Insights and Machine Learning at NatureMetrics, to talk about biodiversity monitoring. Noelia, welcome to the show.
[0:00:43] NJM: Thank you very much. It’s a pleasure to be with you and talking about these very interesting subjects.
[0:00:48] HC: Noelia, could you share a bit about your background and how that led you to NatureMetrics?
[0:00:52] NJM: Yes. Thank you. I was born in Argentina in a piece of Las Yungas Forest that’s disappearing super quickly. I studied astrophysics because I was super curious about the sky and the stars. The landscape permits you to see a huge amount of stars during the night, the whole Milky Way, actually. My map of nature started with astrophysics. I did a PhD in computational astrophysics, studying chemical evolution of galaxies, and I stayed in academia for a long time.
I did three postdocs in different universities in Europe. At some point, for me, it was crucial to have a better project in mind. I couldn’t focus too much on solving galaxy formation with the amount of bad news I was seeing on the climate space and biodiversity collapse. Basically, I made a transition on purpose, looking for jobs to apply the skills I built all these years into, yeah, related problems to climate and biodiversity. Yeah, I was very successful to transition into, because obviously, it was the time to do it as well.
I worked in London as a data science consultant, and then I [inaudible 0:02:04] for a company. I even pitched a company doing biodiversity and AI. I was told it was too early. I joined Microsoft. I worked as a sustainability global lead. I specialized even more in sustainability and AI. Yeah, all that built a really nice case for me to be in NatureMetrics now.
[0:02:24] HC: What does NatureMetrics do and why is it so important for sustainability in the world?
[0:02:28] NJM: Yeah. NatureMetrics is a company that started specializing in eDNA analysis, so this is the genetic material that organisms shed into the environment through, for example, skin, feces, or urine by analyzing that from soil, water, and air. What you create is a very unique data set of species on the ground. This is a ground truth method to understand ecosystems. That’s the starting point of NatureMetrics.
What we are doing now is integrating all these data with other data layers, like remote sensing, satellite data, bioacoustics, camera traps, etc. We are standing on a huge amount of layers that can help us understand how an ecosystem actually lives. The company is focused on moving from just eDNA to providing a technology that is going to be able to, and already does, protect biodiversity and help to enhance it by giving very specific quantitative ways to monitor the ecosystem and biodiversity.
[0:03:33] HC: What role does machine learning play in this technology?
[0:03:36] NJM: A huge one. To start with, the biodiversity mapping, the biodiversity monitoring will obviously collect integrated data from different sources and for that the intelligence that you need to bring in order to actually have an insight will obviously rely on recognition of patterns, understanding the hierarchical model in the ecology. All those are machine learning algorithms, but on top of that, what we are doing now is creating a transferable AI model that can pick up the complexities of ecosystems. For this, we use deep neural networks. What we want is a training model that will pick up the correlations between species and other environmental variables, like temperature, precipitation, land cover, you name it.
This continuous map of species and distributions are what we provide and we have as a baseline to give companies an understanding on how their operations affect the biodiversity of the landscapes they operate. It’s related to machine learning in terms of the modeling we do, obviously it’s a conglomerate of different algorithms, but also, we are innovating with the use of, for example, a large language model for internal tools. We might eventually launch an external tool on biodiversity and large language models. Yeah, I will say it plays a huge role in what we do.
[0:05:05] HC: Can you give some specific examples of the types of models you train, like what the input looks like, what the output looks like, perhaps?
[0:05:12] NJM: Yeah. So, for example, I had a conversation this morning with a client that needs to predict invasive species in an area where they operate. We will have inputs on eDNA on the ground, so different species that were mapped, either by soil, or water, or air. We will integrate that with open-source data like GBIF, which is the global biodiversity data set.
If we have, let’s say, the best-case scenario we don’t, but if we had camera traps or bioacoustics, we would have that too. So, with all these different layers of data, after a lot of cleaning and normalizing, standardization, etc. we have a picture of which species are present. For doing that, we run algorithms that are called species distribution models, and also those that are called joint species distribution models. These algorithms are known by the ecological community, obviously. But what is interesting is that we can now add complexity, because of these different layers, but also because of the different frameworks to work in machine learning and accelerate the algorithms’ precision, train them at different layers.
The new field is actually super, super interesting because we get more and more computational power and larger data sets. The progress is immense. The output of these algorithms is, for example, a map that we provide where we have recommendations on where these species are present. This probabilistic model obviously is not the absolute truth, but it’s a probabilistic model of where the species are present. Why is this useful? Because ecosystems are very complex, with non-linear relations of predators and trophic chains.
The fact that you have a distribution of where species might be, it helps the conservationist, the managers on the land, to specify where the operations are not going to be taking place, for example. This is important because you can actually stop the declining of species, but also imagine if you have, in this case, the very invasive species that will alter a lot of the ecosystem, in this case, is a species that will probably destroy crops. The fact that you identify where they are, it helps you to have interventions on time. In that way, you can stop the declining of your crops, and also you can predict, for example, what variables made the species thrive. Yeah. So, that’s an example of, for example, input output of one of the models we are working on right now.
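As a rough illustration of the input and output described here, the following minimal sketch fits a single-species logistic regression that maps environmental covariates to a probability of presence, then scores a small grid of cells to form a probability map. It is not NatureMetrics' method: the interview mentions joint species distribution models and deep neural networks, which are far richer. The site values, the detection labels, and the function names `fit_logistic` and `predict` are all assumptions made up for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict(w, x):
    """Probability that the species is present given covariates x."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Hypothetical standardized covariates per sampled site: (temperature, precipitation).
sites = [(-1.0, -0.8), (-0.5, -1.0), (0.2, 0.1), (0.8, 0.9), (1.0, 0.7), (-0.9, 0.2)]
# Hypothetical labels: 1 = species detected in the eDNA sample, 0 = not detected.
detected = [0, 0, 1, 1, 1, 0]

weights = fit_logistic(sites, detected)

# Score a 5 x 5 grid of covariate values to get a probability-of-presence map.
grid = [(t / 2.0, p / 2.0) for t in range(-2, 3) for p in range(-2, 3)]
prob_map = {cell: predict(weights, cell) for cell in grid}
```

Each value in `prob_map` is a probability, not a confirmed observation, which matches the point made above that the map is a probabilistic model rather than the absolute truth.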
[0:07:54] HC: You mentioned that the input for all of these is environmental DNA data. What does eDNA data look like?
[0:08:03] NJM: Have you seen Jurassic Park?
[0:08:06] HC: Yes.
[0:08:07] NJM: It’s an old movie, but I mean, not for me, but the DNA is a sequence of letters. In the end, what we get is a piece of all this massive chain of genes that people at the lab, using a method called metabarcoding, can match to an already existing library of species, and tell us which species are present. In the end, they produce something called OTUs, which stands for Operational Taxonomic Units. For the curious ones, when you open a file with the raw data, you have an ID, an OTU, which is the A-B-B-C-C-T-D. No, actually, it’s the four letters of the DNA, so not A-B-C, but the A-G-T, [inaudible 0:08:50]? Do you remember these ones?
[0:08:52] HC: Yeah. The four base pairs. Yeah.
[0:08:54] NJM: Yes. The base pairs. So, you have a combination of that, and then, most of the time, you have the time when it was picked up, so yeah, the time of collection, and also you have location-based data. That’s the minimum we need to integrate it with all the other sources.
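The minimum record described here (an identifier, a sequence made of the four DNA letters, a collection time, and a location) could be modeled roughly as below. This is only a sketch: the class name `OtuRecord`, its field names, and the sample values are hypothetical and do not reflect NatureMetrics' actual file format.

```python
from dataclasses import dataclass
from datetime import datetime

DNA_LETTERS = set("ACGT")

@dataclass(frozen=True)
class OtuRecord:
    """One row of a hypothetical raw eDNA table."""
    otu_id: str            # identifier of the taxonomic unit
    sequence: str          # partial DNA sequence, letters A, C, G, T only
    collected_at: datetime # when the sample was collected
    latitude: float        # where the sample was collected
    longitude: float

    def __post_init__(self):
        # Reject empty sequences and any letter outside the four DNA bases.
        if not self.sequence or set(self.sequence) - DNA_LETTERS:
            raise ValueError("sequence must contain only A, C, G, T")
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude out of range")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude out of range")

# Made-up example values for illustration only.
record = OtuRecord(
    otu_id="OTU_0001",
    sequence="ACGTTGCA",
    collected_at=datetime(2024, 3, 15, 9, 30),
    latitude=-24.2,
    longitude=-65.1,
)
```

Validating the letters and coordinates at construction time means a malformed row fails early, before it is joined with other data layers.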
[0:09:10] HC: These are partial strands of DNA. They’re not full sequencing.
[0:09:14] NJM: Yes, exactly. No, no, no, we can –
[0:09:15] HC: Bits and pieces.
[0:09:16] NJM: Bits and pieces that are called base markers. They use that to understand which species is present. This is a very specific expertise from the bioinformaticians. Yes.
[0:09:28] HC: How do you gather this data?
[0:09:30] NJM: Each of these projects we are working on will have a strategy for sampling. We understand that different projects need different things. It also depends on which eDNA you’re collecting. If you have, for example, soil data, that’s easier to reach, and so any person can just go with the kit we send them, collect the data, and come back and send it to the lab. That’s good enough for us to do the rest of the modeling. Some other samplings are a little bit more complex. If they’re in places, imagine a mountain or places that are hard to reach, some people have to use drones. They send kits to collect eDNA with drones.
In general, it’s an easy process where we send them kits, tell them how to do it. They collect either the water or soil or even the insects sometimes. We put the insects in the traps and they send them back to the lab and we can process that.
[0:10:30] HC: What kinds of challenges do you encounter in working with this type of environmental DNA?
[0:10:34] NJM: You should talk to the bioinformaticians. They will tell you a huge amount of things. I don’t want to oversimplify eDNA. It’s obviously complex. We are setting the standards on how to do that properly. In principle, one of the most tricky things is that, if you don’t have access to sites, as I was saying, it’s difficult to get to the places where you strategically should be sampling to have a well-known bias data set.
Yeah, that’s one of the main things. The ground could be a tricky place to sample. Site-level sampling is going to depend on what you have. So, when you go there, you see if you can actually reach that piece of the forest, or maybe in the middle there is a river you didn’t realize was there, or something like that. It could happen, and that’s where we have to be flexible as well in understanding that some samples are biased by this, so we have to correct. There are many more things very specific to eDNA. For example, abundance. It’s something we are investigating, because it depends on how you treat this. There’s something called presence-absence and many other statistical problems in eDNA.
[0:11:55] HC: What about when you go to train models based on eDNA? You mentioned bias as one of the challenges there. Are there others? How do you handle them for these unique challenges?
[0:12:05] NJM: That’s actually super important for us. Obviously, like any machine learning practitioner, we test on unseen data and see how well the model performed, but actually the best way to do it would be to ground truth again. So, having a second sample, resampling the place at least once, to see how the model is performing, because in other machine learning problems, we just deploy the model and gather new data and see the data drift or whatever you have to correct. In this case, if we sample a place and produce a map of, say, priority areas to conserve, to protect, because of very specific species, we would go again and sample a year later, because that’s usually how biodiversity conservation works.
Also, you have to consider seasons. Yes, so in this case, what we want to do, and we have done so before, is gather data again from the same place to actually test the performance of the model and be super sure, also because these are living beings, so we want to be responsible as well and make sure that we are not underestimating any of the variables.
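One simple way to score a model against a second round of sampling is to compare its predicted probabilities of presence with what the resample actually detected. The sketch below uses the Brier score (mean squared difference between predicted probability and the 0/1 outcome) as one possible measure. The interview does not say which metric NatureMetrics uses, and the site names and numbers are invented for illustration.

```python
def brier_score(predicted, observed):
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    0.0 is a perfect score; always predicting 0.5 scores 0.25.
    """
    if not predicted or len(predicted) != len(observed):
        raise ValueError("inputs must be non-empty and the same length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)

# Hypothetical model output from the first survey: probability of presence per site.
predicted_presence = {"site_1": 0.9, "site_2": 0.2, "site_3": 0.7}
# Hypothetical detections from resampling the same sites a year later.
resampled_presence = {"site_1": 1, "site_2": 0, "site_3": 0}

site_names = sorted(predicted_presence)
score = brier_score(
    [predicted_presence[s] for s in site_names],
    [resampled_presence[s] for s in site_names],
)
```

Because seasons matter, a comparison like this is most meaningful when the resample is taken at the same time of year as the data the model was trained on.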
[0:13:13] HC: In this case, how would you identify that there is a bias and what are some things you might do to mitigate it?
[0:13:19] NJM: Oh, there are many things in an ecosystem that would create a bias. For example, you can have a species distribution model that predicts certain species that, by definition, you can see there, because there are some correlations in the environment, like let’s say, temperature and vegetation, soil moisture. If you move two meters away from that site which you predicted, for some geological or historical reason that has nothing to do with your model, those species might not be present, so you cannot really extrapolate.
This is very well known by the whole community and as a person that comes from a different field. It’s crazy to see that the machine learning models, well, I mean, this is the question, the machine learning models get so good that they will pick up these inferences, even if we don’t give them direct data on every ecological and geological event that happened.
Yeah. To the question of how we know and how we test: we have many ways to correct for biases. Good things are to have all the data sources, like GBIF, for example, or whatever the people that work with us have collected on site. There’s also literature on different things that were seen on the site historically. It’s not a complete disaster, but it’s really hard still because this is new. We are going step by step in creating a general map of biodiversity that is going to be at some point the base of how we operate everywhere.
[0:14:53] HC: How broadly can you apply these models? Is it something that you could train in one location and works in a different geographic area or do you need to have a broad set of training data to cover all the regions that you might want to apply it to?
[0:15:07] NJM: Ideally, yes, we would want to say that, but we are conscious that at this stage of development, you can’t predict on unseen spaces, so this is why it’s a very different machine learning problem, because of what I said before. So, nature does not seem to behave, like so well, as we would want. It might be that you have exactly the same covariates and your model is predicting species and then you go and it’s not there. This is because maybe species level is too much for this level of training and learning, but we can go to species richness or to family, to group. If you think on the hierarchy that you can have from eDNA, it can be generalized. I think the challenge is to keep on giving the system more and more data to see if these interconnections are there at some point.
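The idea of moving up the hierarchy when species-level prediction is too fine-grained can be sketched as a simple roll-up of detections into per-site species richness and family-level presence. The species and family labels, the lookup table, and the function names below are placeholders invented for this example, not real taxonomy and not NatureMetrics' pipeline.

```python
from collections import defaultdict

# Hypothetical lookup from species label to family label.
SPECIES_TO_FAMILY = {
    "species_a": "family_x",
    "species_b": "family_x",
    "species_c": "family_y",
}

def species_richness(detections):
    """Count the distinct species detected at each site."""
    per_site = defaultdict(set)
    for site, species in detections:
        per_site[site].add(species)
    return {site: len(found) for site, found in per_site.items()}

def families_present(detections, species_to_family):
    """Collapse species-level detections to the set of families per site."""
    per_site = defaultdict(set)
    for site, species in detections:
        per_site[site].add(species_to_family[species])
    return dict(per_site)

# Made-up (site, species) detections; the repeated pair is counted once.
detections = [
    ("site_1", "species_a"),
    ("site_1", "species_b"),
    ("site_2", "species_c"),
    ("site_1", "species_a"),
]

richness = species_richness(detections)
families = families_present(detections, SPECIES_TO_FAMILY)
```

Coarser targets such as richness or family presence throw away detail, but they may be easier for a model to predict in places it has not seen.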
[0:16:01] HC: Are there any specific technological advancements that made it possible to build this technology now that wouldn’t have been feasible even a few years ago?
[0:16:08] NJM: Yeah, absolutely. We use a huge amount of deep neural networks. Even if they have been around since the 40s and 50s, the real spread and the advancement of how fast we can train and how fast we can actually process this amount of data sets, I think is quite new. It’s from 2010 onward. Yeah, we definitely benefit from computational power and optimization of algorithms.
[0:16:39] HC: Talking more broadly about what you’re tackling and its importance in the world, how do you measure the impact of the technology?
[0:16:46] NJM: Yeah. So, for us, actually, the whole ecosystem of what’s going on right now. I am using ecosystem in two senses. At the moment, all companies, most of the companies will have to report on their sustainability strategies in the world to keep on functioning, to keep on operations, to be in business, right? In that context, what we can do here is make biodiversity monitoring achievable and democratically easy to access.
In terms of right now, if you think about how people operate and how they report on supply chains and ecosystem conditions in any company that has anything to do with nature, you will see it’s a manual process mostly. It demands human beings that are specifically trained to detect species from plants to animals to insects that have to go to the ground. You have to go four times a year to be able to collect information of species that are present along the four seasons if that happens.
All of this is manually intensive, expensive, and takes a huge amount of time, but the worst is that it’s not real-time data. When you put all that together, it’s probably already too late and things have changed and moved very quickly, because of many other variables like the anthropogenic impact that we have. The company is operating now to facilitate and to establish methodologies of biodiversity monitoring that use AI to make things faster and easier.
The impact for me will be huge to be honest. If we install and if we help create biodiversity monitoring methods that are used for the regulatory processes that are coming now, like CSRD or TNFD, which are two frameworks for reporting on nature, I think we’ll see a change in how biodiversity is affected by companies. I also think that, given the system pressures, if you think about how fast the planet is warming and no one really understands why, and what’s the ecosystem reaction to that? This is crucial. The fact that we can have a quick model and we can have almost real-time data, because imagine we have satellite data reading this daily, will be super, super important for everything from supply chains to food supply chains. I think, yeah, that’s the impact of this technology.
[0:19:18] HC: Is there any advice you could offer to other leaders of AI powered startups?
[0:19:22] NJM: I think given that’s a very fast-moving field, the fact that all of us are trying to keep updated, that’s the main thing, like how do we keep updated? What’s important and what’s not? That’s been the question I’ve been asking myself. What’s actually important, what’s common sense, and what is really, really a deep-down skill. I found that I deeply value the foundational aspects of my education, because by studying general relativity and quantum mechanics and also applying that to chemical evolution, I find that I can just learn anything and it’s not daunting. I find it interesting.
Obviously, biodiversity is super complex. It’s a non-linear relationship and it’s a huge amount of stats, so it’s super interesting to see how these very foundational skills, as I said, helped me out today. But I think common sense, maybe everyone will agree that we need to be solving real urgent problems. I would assure you I’ve seen a huge amount of AI startups solving problems that I cannot even believe are actually something to be spending time on.
I will ask everyone to move their focus toward what we actually have in front of us, which is the climate and biodiversity breakdown. I think that’s where we should be investing a huge amount of time, and also in resilience and creating, yeah, livable cities for what’s coming. I think also, it’s super important that we address that we need to be transparent and we need to focus on ethics, as there is a huge amount of risk in not having accountability for what we are doing with the algorithms. I think that’s a very well-known thing in the industry.
The third thing, in my experience the most important thing I’ve seen, is that the success of an AI startup is very much tied up to the diverse, strong teams you build. I am passionate about that, bringing people from different backgrounds. Yeah, fostering people that are curious and willing to learn. Yeah, and I think that’s super important as well.
[0:21:35] HC: We’ve already talked a fair amount about impact, but just to close things out, where do you see the impact of NatureMetrics in three to five years?
[0:21:43] NJM: Yeah. I think I pointed probably a little bit on the other questions you asked, because I think if given that we have this regulation for all these companies and companies are asking, how do we do this? How do we measure species richness, ecosystem conditions? How do we give real-time information to companies to be able to change their operations, understand what makes them more resilient to climate and biological changes, basically? It’s super important. I think that’s the impact. I think these biodiversity monitoring methods and platforms we are building are going to be the standardized thing we’ll see across many, many industries in the three to five years coming.
[0:22:26] HC: This has been great. Noelia. I appreciate your insights today. I think this will be valuable to many listeners. Where can people find out more about you online?
[0:22:34] NJM: You can find me on LinkedIn, obviously. You can find all the information about the company. We have a web page, and yeah, that’s it. I don’t have Twitter or X, should I say it? I’ve been avoiding a lot, making any statements on platforms, but yeah, I’m very reachable through either LinkedIn or the web page of the company. If you need to talk to me personally, I think you can very easily ask for my details at the company’s contact form.
[0:23:03] HC: I’ll link to both LinkedIn and NatureMetrics’ website in the show notes, so that’ll be accessible to everybody. Thanks for joining me today.
[0:23:10] NJM: Thank you so much, Heather. This has been great.
[0:23:12] HC: All right, everyone. Thanks for listening. I’m Heather Couture, and I hope you join me again next time for Impact AI.
[END OF INTERVIEW]
[0:23:22] HC: Thank you for listening to Impact AI. If you enjoyed this episode, please subscribe and share with a friend. If you’d like to learn more about computer vision applications for people and planetary health, you can sign up for my newsletter at pixelscientia.com/newsletter.
[END]