Machine learning can be used as an innovative method to contribute to climate change resiliency. Today on Impact AI, I am joined by the co-founder and CTO of Vibrant Planet, Guy Bayes, to discuss how they are using AI to revitalize forests. Listening in, you’ll hear all about our guest’s background, why he started Vibrant Planet, what the company does, how they apply machine learning to their work, and a breakdown of how they collect the four sets of data they need. We delve into the problem areas they face with individual and integrated data types before Guy tells us how they cross-validate their models. We even talk about how the teams collaborate, how machine learning and forest knowledge come together, and where he sees the company in the next three to five years. Finally, our guest shares some pearls of wisdom for any leaders of AI-powered startups.
Key Points:
- A warm introduction to today’s guest, Guy Bayes.
- Guy tells us about his background and what led him to create Vibrant Planet.
- What Vibrant Planet does and how it contributes to climate change resiliency.
- How Vibrant Planet applies machine learning to the work.
- A breakdown of the four sets of data they need and how they collect it.
- The challenges they face when it comes to collecting and integrating all their data.
- How Guy makes sure that their models work in different geographic regions.
- Incorporating forest knowledge into data modeling and machine learning development.
- How the Vibrant Planet teams work together and collaborate to achieve their goal.
- What Vibrant Planet does to measure the impact of this technology.
- New AI advancements Guy is particularly excited about for Vibrant Planet.
- Guy shares some advice for leaders of AI-powered startups.
- Where he sees Vibrant Planet’s impact in the next three to five years.
Quotes:
“Getting the forest back into a state that's more able to tolerate fire and more able to produce low-intensity fire rather than high-intensity fire is [Vibrant Planet’s] goal.” — Guy Bayes
“We have – not only super good engineers but also very talented ecological scientists and people that have done physical hands-on forestry for their careers. – This mix of those three personas – work together pretty harmoniously actually because we all share a common goal.” — Guy Bayes
“I don't think you can ever find one person who has all that in their head, but you can find a team that does.” — Guy Bayes
“You will not have an impact without having a combined team that all respects each other and brings different things to the table.” — Guy Bayes
Links:
Guy Bayes on LinkedIn
Vibrant Planet
Vibrant Planet on LinkedIn
LinkedIn – Connect with Heather.
Computer Vision Insights Newsletter – A biweekly newsletter to help bring the latest machine learning and computer vision research to applications in people and planetary health.
Computer Vision Strategy Session – Not sure how to advance your computer vision project? Get unstuck with a clear set of next steps. Schedule a 1 hour strategy session now to advance your project.
[INTRO]
[00:00:03] HC: Welcome to Impact AI, brought to you by Pixel Scientia Labs. I’m your host, Heather Couture. On this podcast, I interview innovators and entrepreneurs about building a mission-driven machine learning-powered company. If you like what you hear, please subscribe to my newsletter to be notified about new episodes. Plus, follow the latest research in computer vision for people and planetary health. You can sign up at pixelscientia.com/newsletter.
[INTERVIEW]
[00:00:33] HC: Today, I’m joined by guest Guy Bayes, Co-Founder and CTO of Vibrant Planet to talk about land management. Guy, welcome to the show.
[00:00:41] GB: Yes. Thanks so much for having me, Heather. Really looking forward to the conversation.
[00:00:45] HC: Guy, could you share a bit about your background and how that led you to create Vibrant Planet?
[00:00:49] GB: Sure. I guess the story starts with the fact that I’ve been a Silicon Valley technologist for many, many years. I actually started working in the data space in 1997, when I was in my graduate program. I was working on a system then which we would probably call a data warehouse now, though we didn’t really use that terminology. I’ve been kind of a data geek ever since. I worked across a bunch of different startups. I worked at Facebook during the early years of Facebook. I helped Lyft get through their IPO. I worked at Lawrence Livermore National Laboratory on big fusion lasers. There’s always been a data theme throughout my career.
After the Lyft IPO, I was a little burned out, and I was taking some time off. I had this cabin up in the mountains in Southern Oregon, and a big fire came through. It missed me by about a mile, and it burned down two towns that were next door. Some of the people I was friends with got hit by that fire. Pretty soon, I had climate change refugees camping on my lawn and living in my spare bedrooms. I’m like, “Wow, this is really out of control.”
That whole experience was kind of formative for me, not only the severity of what happened but the speed. It was a lot like the Lahaina fire in a lot of ways. It ripped through really fast. It was a high wind day, an unusually high wind day, very dry. We went from somebody flicking a cigarette butt onto the ground to the town being on fire in a couple of hours. When it was all over, I sat back and I said, “Well, I guess I know what my next project’s going to be.” There’s got to be a way to bring data and technology to bear on this crisis that had been happening across the Western US for years at that point.
My first thought was something around detection because if it hadn’t happened in the middle of the day, I think everybody would have died in their beds. It moved so quick. I had some ideas. I went, and I talked to one of my former product managers, Maria Tran, who’s worked with me on and off for years and years at different companies. She talked me out of that. She said, “You got to go talk to Allison Wolff and Scott Conway. The real game here is prevention.”
Being an engineer, I’ve always liked preventative solutions. There’s that saying that an ounce of prevention is worth a pound of cure. That’s kind of where Vibrant Planet started: what can we do to prevent these fires from ever happening, or at least prevent them from happening with the severity that they do?
[00:03:09] HC: What does Vibrant Planet do today, and how is all of this so important for climate change resiliency?
[00:03:15] GB: If you think about why the fires occur, I don’t know if you’re a West Coaster or not, but anybody who lives on the West Coast of the US knows these things are happening like crazy. They occur really for two reasons. One of them is climate change: we’re getting hotter and drier and windier pretty much all over the place, which is obviously bad for fires.
The other reason they are happening, though, is really more a matter of forest mismanagement. In the olden days, when the Indigenous peoples were here, they would place controlled burns into the landscape constantly. You ended up with a forest that was very used to low-intensity fire, very comfortable with it. That forest looked quite a bit different from what it looks like now.
If you wander around in the woods of California or Oregon or Washington State, you’re not seeing a natural forest. You’re seeing a forest that has had natural fire suppressed in it for 100 years. As a result of that suppression, the forest has become very dense and very flammable. Instead of having a few big trees kind of dominating the area and lots of relatively open space, the trees are packed, and they’re young. You’re also seeing a whole lot of non-native species that were introduced mostly to make lumbering easier. So it’s a different forest. In the old days, you could drive a Jeep through the forest, and there was enough room. Now, you can hardly even walk through it.
Getting the forest back into a state that’s more able to tolerate fire and more able to produce low-intensity fire rather than high-intensity fire is the goal. There are mechanisms we have to do that with. We have, firstly, controlled burns. That’s a really great way to get fire back into the forest the way it’s supposed to be, but only done in a way where everybody stays safe. There’s also ecological thinning. There are lots of ways to get into the forest and do treatments that will reduce the chances of high-intensity fire. That’s happening. There’s a lot of work going on up and down the West Coast, in Canada, and in lots of other places.
The problem we get into, though, is it’s expensive. It’s men and machines out doing stuff. Regardless of the fact that there’s a fair amount of money that’s been allocated to this, both from the Biden infrastructure bill and lots of other places, there’s still never enough money to do everything you want. What we really need to do is optimize the spend of that money to figure out where the treatments are most effective, where we can accomplish the most with every single dollar. We call that restorative return on investment at Vibrant Planet. That turns out to be a big, huge, honking machine learning and data problem, which is why I’m trying to fix it.
[00:05:46] HC: Getting into the machine learning parts, that’s the part I’m most interested in here. How do you apply it? How do you study the forest? How do you figure out where it’s most applicable to do some of these solutions?

[00:05:58] GB: Yes. It’s really an intersection of top-notch ecological science and computer science. If you think about the goal, my goal is that I can hand my product to a land manager, and they can draw an area on the map that says this is the area that I’m considering treating. They can input how many dollars they have to spend on treatments. They can fiddle with some priorities because prioritization is always a thing, right? You want to figure out what you’re trying to accomplish exactly.
Then I spit out, okay, this is your optimum set of forest treatments that you can do. These are the 100 acres you should go do a controlled burn on. These are the 200 acres you should do ecological thinning on. They know not only that that’s optimal, but we can track it over time, come back and revisit it every now and then, and see whether we’re actually getting the results that we want. The goal is pretty straightforward, but doing it requires a lot of work. You were talking about AI and machine learning. There’s a fair amount of that going on. There are really four sets of data and processes that have to come together in order to accomplish that goal.
The first set is we have to really understand the forest. We have to understand every canopy-dominant tree and its characteristics: its height, its density, a guess at its species. If we’re going to do treatments, we have to know what we’re going to treat, right? To do that requires a lot of machine learning that’s mostly based around taking what’s called fixed-wing LiDAR and satellite imagery of various kinds. I don’t know if you’re too familiar with fixed-wing LiDAR. It’s probably worth digging into what that is exactly, like how the whole LiDAR world works here.
[00:07:37] HC: Yes. It sounds good. A brief overview would be great.
[00:07:40] GB: Okay. LiDAR is the same stuff you see on the top of self-driving cars. You basically bounce lasers off things, and you can figure out a three-dimensional map based on how that laser bounces back. We do that to map the forest by attaching LiDAR to the bottom of planes. We fly the planes over an area, and you get a three-dimensional representation of what’s under you. It’s awesome data. Being able to do this stuff is pretty new. It’s leading to a lot of advances not just in forest treatment but in archaeology. We’re finding all these Mayan ruins and stuff. It’s really cool.
The problem with fixed-wing LiDAR is it’s expensive to do that. You got to charter very special planes and arm them with very special instruments. Because it’s expensive, it doesn’t happen everywhere all the time, which means you end up with this sort of high-quality data set, but it’s kind of patchy. You have it in certain places and not in other places. Some of it’s old. Some of it’s new. If I had fixed-wing LiDAR data everywhere that’s up to date, I wouldn’t have to do anything else, and the problem would be easy. But because I don’t and I never will, I have to figure out a way to compensate for that.
Satellite imagery has almost the exact opposite properties of LiDAR. It’s everywhere, and it gets updated all the time. But it’s only two-dimensional, so you can’t really tell how tall the trees are. Because you can’t tell how tall the trees are, you can’t tell how responsive to fire they’re going to be. What we have done is train a machine learning model that uses that LiDAR as a training data set and then does inference against the satellite imagery. We can then take a flat satellite image and figure out all the properties of trees that we need to know, down to the meter level of resolution.
It’s really powerful, and it sits on top of a lot of the machine learning advances that have been coming out of other industries, like DALL-E and ChatGPT and things like that. We’re using a lot of the same underlying technology that those people use.
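For readers less familiar with what an airborne LiDAR survey actually yields, here is a minimal sketch of turning raw LiDAR returns into a canopy height model, i.e. the tallest return above ground in each grid cell. The point array, the flat-terrain assumption, and the function names are hypothetical illustrations, not Vibrant Planet’s pipeline; a real workflow would use classified ground returns and proper georeferencing.

```python
import numpy as np

def canopy_height_model(points, ground_elevation, cell_size=1.0):
    """Grid LiDAR returns into a canopy height model (CHM).

    points           : (N, 3) array of x, y, z returns (synthetic here)
    ground_elevation : callable giving ground elevation at (x, y)
    cell_size        : raster resolution in metres
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    height = np.clip(z - ground_elevation(x, y), 0, None)  # height above ground

    cols = ((x - x.min()) / cell_size).astype(int)
    rows = ((y - y.min()) / cell_size).astype(int)
    chm = np.zeros((rows.max() + 1, cols.max() + 1))

    # Keep the tallest return that falls into each grid cell.
    np.maximum.at(chm, (rows, cols), height)
    return chm

# Toy usage: flat terrain at 100 m elevation, synthetic returns up to 30 m tall.
rng = np.random.default_rng(0)
pts = np.column_stack([
    rng.uniform(0, 50, 1000),        # x
    rng.uniform(0, 50, 1000),        # y
    100 + rng.uniform(0, 30, 1000),  # z (absolute elevation of the return)
])
chm = canopy_height_model(pts, ground_elevation=lambda x, y: 100.0)
print(chm.shape, round(chm.max(), 1))
```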
[00:09:30] HC: You use the LiDAR to generate a high-quality localized data set. From that, am I correct in assuming that it gives you labels to then apply to satellite imagery and train a model on satellite imagery that you can then apply more broadly?
[00:09:45] GB: Yes, exactly right. There’s a third set of data that we’ve just started using lately, which is called GEDI. They actually hung a LiDAR instrument off the International Space Station. So we can get LiDAR data from NASA via the International Space Station, which has a whole other set of properties: it’s relatively current, but you only get it in strips, right? You only get it where the International Space Station flies over a particular area. That’s a second set of labels that we can incorporate into the training, which we’ve been using to great success.
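A minimal sketch of the label-transfer idea described above, under simplifying assumptions: LiDAR-derived canopy height serves as the label, per-pixel satellite band values serve as features, and a random forest stands in for whatever model is actually used. The data are synthetic and the variable names are hypothetical; the point is only that the model is fit on the patchy LiDAR footprint and then run on every satellite pixel.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(42)

# Synthetic stand-in data: satellite pixels with a few spectral bands, and
# LiDAR-derived canopy heights that exist only where LiDAR was actually flown.
n_pixels, n_bands = 20_000, 6
satellite = rng.normal(size=(n_pixels, n_bands))
canopy_height = 5 + 10 * satellite[:, 0] + rng.normal(scale=2, size=n_pixels)
has_lidar = rng.random(n_pixels) < 0.2  # LiDAR coverage is patchy

# Train only on pixels where LiDAR-derived labels exist.
model = RandomForestRegressor(n_estimators=50, n_jobs=-1, random_state=0)
model.fit(satellite[has_lidar], canopy_height[has_lidar])

# Infer everywhere the satellite sees, including areas with no LiDAR coverage.
predicted = model.predict(satellite)
print("MAE on pixels without LiDAR:",
      round(mean_absolute_error(canopy_height[~has_lidar],
                                predicted[~has_lidar]), 2))
```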
[00:10:15] HC: These combined give you different sources of labels for local regions, wherever the data happens to be available. Then with satellite, you can apply it more broadly and be able to predict things like tree species and heights, that type of information.
[00:10:31] GB: Yes. Right now, we’ve got something like a half a billion trees that we’ve mapped that way pretty accurately. It’s going to get much bigger really quickly because we’re sort of on a roll. That’s the first set of data that you need in order to do this, but that’s still only one of four sets, right?
The second set you need to have is you need to know what the things in the forest are that you want to protect. What are the towns? What are the endangered species? Where are the power lines? Where are the roads? Where are the reservoirs? There are pieces of that which are almost more traditional data warehousing, where you just need to go out and harvest various sets of tabular data. There are also places where there are little mini models trying to make sense of building footprints, and then classify those buildings into different kinds of structures and assign value to them, essentially. We have a whole bunch of that going on, which really captures the things that we value in and around the forest, right?
The third set of data is you’ve got to understand what risk all these things are susceptible to. What’s the fire risk for a particular area? What are the other risks associated with fire, like landslides and things like that? In order to do that, we have a whole bunch of risk models that we run that are pretty sensitive to things like weather and past experiences. We get a really nice risk map that we can overlay across all these trees and all these structures and all these resources in order to understand what’s really most in danger.
Then the fourth piece of it is called response functions. This is an ecological construct where you have to know how a particular area is going to respond to a treatment, how much value you’re going to get out of a particular kind of treatment. We assign those response functions to all the different potential treatments across all the different potential areas that might be treated. Then we get the whole picture. Now, we know what’s out there. We know what risk it’s under. We know what’s valuable that we want to protect. We know how effective we’re going to be if we spend dollars to try to protect it. We take all that data together, and then we have to run real-time optimization as the user is clicking around on their map. They’re changing the numbers. They put in the dollars and whatnot. We have to be able to very quickly, in real time, run an optimization algorithm that tells them, okay, based on your new input, what are the highest-priority, most valuable areas for you to treat?
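To make that optimization step concrete, here is a minimal sketch of a budget-constrained treatment plan, with each candidate scored as value at risk times a risk score times a response function and selected greedily by benefit per dollar. The field names, numbers, and the greedy heuristic are illustrative assumptions, not Vibrant Planet’s actual optimizer.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    area_id: str
    treatment: str        # e.g. "controlled burn", "ecological thinning"
    cost: float           # dollars to treat this area
    value_at_risk: float  # value of what the treatment protects
    risk: float           # probability-like risk score for the area
    response: float       # expected effectiveness of this treatment here

    @property
    def benefit(self) -> float:
        # Expected loss avoided if the treatment is applied.
        return self.value_at_risk * self.risk * self.response

def plan_treatments(candidates, budget):
    """Greedy benefit-per-dollar selection under a budget (illustrative only)."""
    chosen, spent = [], 0.0
    for c in sorted(candidates, key=lambda c: c.benefit / c.cost, reverse=True):
        if spent + c.cost <= budget:
            chosen.append(c)
            spent += c.cost
    return chosen, spent

# Toy usage with made-up numbers.
cands = [
    Candidate("unit-A", "controlled burn", 40_000, 2_000_000, 0.30, 0.6),
    Candidate("unit-B", "ecological thinning", 120_000, 5_000_000, 0.20, 0.5),
    Candidate("unit-C", "controlled burn", 25_000, 800_000, 0.50, 0.7),
]
plan, spent = plan_treatments(cands, budget=150_000)
print([f"{c.treatment} on {c.area_id}" for c in plan], f"spent ${spent:,.0f}")
```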
[00:12:53] HC: That’s a bunch of different models that need to come together in order to produce the results you need there.
[00:13:00] GB: I think that it’s generally a pattern that we expect to be able to follow in other industries as well. This idea of: you understand the natural landscape. You understand what kind of risk it’s being subjected to. You understand what the valuable things are that you want to protect. Then you have this equivalent of, okay, how effective are my interventions going to be? That can work for flooding. That can work for hurricanes. That can work for all sorts of stuff, right? Anywhere there’s a climate change crisis happening and people have an intervention, a thing that they want to do to try to offset it, this pattern will apply.
[00:13:39] HC: With all these different types of data that you work with, what are some of the challenges that come up? Are they more related to challenges with individual data types? Or is it the integration that’s more challenging?
[00:13:49] GB: I’d say yes to all of the above. The whole thing is pretty hard. Some of the individual models are really tricky to get to work and then really tricky to get to work in a cost-effective way. I think about it in terms of sort of [inaudible 00:14:02]. I think about quality. Can you get the thing to be high enough quality that you want it to be? I think about it in terms of a recency [inaudible 00:14:13]. Can I get the thing to be up to date enough that I’m seeing a good enough image of what reality currently is?
I think about it as cost and performance. A high-quality model doesn’t help me if I can’t afford to run it, right? If it would cost me a billion dollars to run the thing, it’s not useful. Then I also think about, like you said, the integration because a lot of times, you’re reducing errors in various places. Then when you pull it all together, you have sort of a compounding error effect that you have to be aware of.

[00:14:42] HC: The models that you’ve trained with LiDAR, we talked about how the LiDAR is only available in certain regions or at certain points in time. But then you apply these models to satellite imagery more broadly. How do you make sure that these models work in different geographic regions? Maybe it’s somewhere that you haven’t collected LiDAR data from.
[00:15:00] GB: Yes, absolutely. There are two approaches to that. The first one is we have a test and control across all the LiDAR data that’s out there. So we have holdouts, essentially, that are designed to be representative of all the different ecologies in the LiDAR data. Then we can say that LiDAR data is not used to train the model, but it will be used to validate the performance of the model, right? We spent a lot of time, the scientists have, picking these hold-out areas to try to get a big swath of all the different things, all the different ecological regions across the Western US.
The second thing we do is we do physical validation. We have this concept of plots where somebody will physically go out to a part of the forest with a LiDAR backpack, essentially, and a measuring tape and a bunch of other instruments. They’ll run around an acre or two and just actually measure everything. Figure out exactly what trees are there, exactly what species they are, exactly how tall they are.
We started doing some of that plot validation with drones as well. We found that drones can speed it up, and since these are still relatively small areas, the downsides of drones don’t get in the way of collecting the data. Then when we’re all done, we have this thing that validates. We call it the [inaudible 00:16:15] framework. That takes all of the inputs and figures out whether we’re producing enough accuracy to call the product [inaudible 00:16:21].
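A minimal sketch of the ecoregion-holdout idea described here, assuming every labeled pixel carries an ecoregion tag: whole ecoregions are withheld from training so that validation measures generalization to landscapes the model has never seen, rather than to neighboring pixels. The data and names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(7)

# Hypothetical pixels: satellite features, LiDAR-derived height labels,
# and an ecoregion tag per pixel (e.g. Sierra mixed conifer, Klamath, ...).
X = rng.normal(size=(6_000, 6))
y = 4 + 8 * X[:, 1] + rng.normal(scale=1.5, size=len(X))
ecoregion = rng.integers(0, 5, size=len(X))

# Hold out one whole ecoregion at a time so the test pixels come from
# landscapes the model never trained on.
splitter = LeaveOneGroupOut()
for train_idx, test_idx in splitter.split(X, y, groups=ecoregion):
    model = RandomForestRegressor(n_estimators=50, n_jobs=-1, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    mae = mean_absolute_error(y[test_idx], model.predict(X[test_idx]))
    print(f"held-out ecoregion {ecoregion[test_idx][0]}: MAE = {mae:.2f}")
```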
[00:16:22] HC: So careful cross-validation, making sure that the way you split your data enables you to test whether your model generalizes to other areas, and then external validation, gathering data by other means in order to validate it.
[00:16:36] GB: I guess the last part of validation, which is probably in some ways the most useful, is just the customers. A lot of the people that we work with are very familiar with the land under their management. They’ve been working it for years, and they’ve walked every inch of it. So they’re also able to tell us when things don’t look right, or when something doesn’t match their sniff test. We are always very careful to listen to that.
[00:17:01] HC: That expertise that you mentioned, I imagine that’s very important not just in validating your models but during development. The standard machine learning engineer probably doesn’t know a lot about forestry coming into this. So I imagine they would learn quite a bit in working with your team. How do you incorporate this knowledge into the data modeling and the machine learning development in order to get the most effective models?
[00:17:24] GB: Yes. I think that’s actually one of our secret sauces: we have on staff, inside the company, not only super good engineers but also very talented ecological scientists and people that have done physical, hands-on forestry for their careers, right? We have this mix of those three personas that work together to produce this stuff, and they work together pretty harmoniously, actually, because we all share a common goal. I don’t think you can ever find one person who has all that in their head, but you can find a team that does.
Then the second thing we do is, like I said, we’re very careful when we partner with folks to be very open to feedback, very open to like, “This is where I don’t think your data is accurate,” and be able to follow up on that. I do think that all of these ecological problems, one of the failure states is that oftentimes Silicon Valley rolls in and thinks we’re all smart and we can just fix everything. The reality is that it’s complicated, and there are folks that have been working on this their entire lives. You will not have an impact without having a combined team that all respects each other and brings different things to the table.
[00:18:32] HC: What level of interaction do those different types of expertise have? Are they interacting on a daily basis or a weekly one? Are they touching base in maybe a daily standup? Or exactly how closely do they work together in order to accomplish this?
[00:18:48] GB: Very closely. They are cross-functional teams that generally have representatives from each one of those three disciplines on them, pretty much whenever we do anything. Then we also have a pretty formalized quality assurance process where we pull in representatives of our customers as well. When we think we have something that’s at the point where it’s accurate, we don’t assume it is. We pull in folks from the customer side who can tell us how well it looks like it’s working for them and where they think the problems might be.
Yes, I think about it as cross-functional teams. I think the more you put up these barriers between those different disciplines, the more you’re introducing risk. The more you get people just working side by side on stuff, the more successful you’ll be.
[00:19:27] HC: Yes. I see that in a lot of different machine learning application areas: medical, forestry, pretty much anything where you really need that domain expertise. The domain experts need to be integrated with the technical team in order for this to really be effective.
[00:19:43] GB: I would say that the boundaries of expertise are even starting to blur, right? We have some scientists that are really good engineers. We have some engineers that know a fair amount of science. Even if you try to put up a wall, it doesn’t really match the way people actually function.
[00:19:58] HC: Thinking more broadly about what you’re building and what your goals are, how do you measure the impact of this technology?
[00:20:04] GB: Yes. If you think about why do forest treatments matter, there’s direct and indirect benefits. The direct benefits are if you burn up a bunch of trees, you’re putting a bunch of CO2 in the atmosphere just flat out, right? You’re also preventing those forests from being able to sequester additional CO2. That is something we can measure and we can say this is the effect of a fire on CO2 directly.
There’s also a whole bunch of co-benefits. Forests are really important, for instance, for water. A really bad forest fire can be devastating to water supply, both directly, through erosion into reservoirs and things like that, and also indirectly, in that the forest keeps snowpack from melting so fast. So if you don’t have a forest, your snow all melts, and then you run out of water. There are all these co-benefits that we take a crack at quantifying. Depending on what the co-benefits are, it can be harder or easier to really nail them down. But we have an opinion about most of them, with some opinions about how accurate we are as well. So there’s the co-benefits. Then the third one is the direct impact on property. Things burn up. They cost money. So we can value all the human stuff with dollars and say, “This is how much, in dollars, we will prevent.” There’s also a lot of research going on right now about the exact efficiency of forest treatments at preventing high-intensity fire. I don’t think we’re quite at the point where we can quantify that yet, but we’re getting close. There are a bunch of papers that have come out pretty recently. They’re pretty interesting to us.
[00:21:38] HC: Machine learning is advancing quite rapidly right now. There are new advancements hitting the headlines more frequently than ever, as we’ve seen with ChatGPT and in the computer vision world as well. Are there any new developments in computer vision, or in AI more broadly, that you’re particularly excited about, especially for potential use cases at Vibrant Planet?
[00:21:59] GB: Yes. I think given that a lot of our stuff is built on the same backbones and the same infrastructure that a lot of the vision stuff across the technology industry is using, every time they make [inaudible 00:22:09] better, we benefit, right? Every time [inaudible 00:22:12] gets better, we benefit. Every time ONNX gets more efficient, we benefit. So we’re sort of riding that wave pretty directly and constantly keeping an eye on the new papers that are coming out and the new code releases that are landing on GitHub and directly incorporating it into what we do. That’s one piece of it.
Then the second one is entirely new techniques for doing stuff, right? They’re coming out of the scientific community as well. There are all these papers that the ecologists are always writing, and they’re getting more and more sophisticated at using AI and ML techniques in their research. There are sort of two streams of it that we can sit in the middle of and harvest all the really good stuff that other people are doing.
[00:22:52] HC: There’s a lot of research for you to follow then.
[00:22:54] GB: Yes. I mean, typically, yes, it’s like lots of papers are always landing. Then there will be spirited discussions about each one about whether there’s something valuable for us in there that we can make use of. Hopefully, we’ll be writing some of our own pretty soon, too.

[00:23:08] HC: I look forward to seeing that. Is there any advice you could offer to other leaders of AI-powered startups?
[00:23:14] GB: Yes. I think that there’s a – none of these things are unique to me because other people have said them probably better than I’m going to say them. I think the first one is sort of do things that don’t scale. A lot of the science around this is not done in a scalable way, and it wasn’t built by engineers to run across the entire Western US. It was built by scientists to do a paper around maybe a few square miles or something like that. Don’t be afraid of that stuff because you can still get value out of that as you’re figuring out how to transform it into something that you can run across [inaudible 00:23:50] areas. I think that’s the first thing. Don’t be afraid of the things that don’t scale.
Then the second one is keep a very close eye, as we were talking about, on the whole revolution that’s happening, even if you think it has nothing to do with you. Some of the techniques that we’ve harvested are so far removed from even imagery that you wouldn’t normally think they would be useful. But it’s just such an explosion today that you never know what might come along. So definitely invest in that, in keeping current on the research. Don’t be afraid to try new stuff, even if it doesn’t seem to be applicable to what you’re trying to do. That’s another piece.
Then the third one is a lot of this more cutting-edge stuff is pretty difficult to – it’s researchy and it kind of is resistant to very strong timelines. You’ve got to give some of your people the ability to explore these paths, even if you don’t know exactly how long it’s going to take. You have a piece of your business which is very software engineering, and we want to get a release out in June. Then you’ve got this piece of your business which is very research, and it’s like, “Huh, I wonder if this will even help us.”
You’ve got to be comfortable with both of those worlds in order to make advances in using ML and AI. If you try to shut down one of them or make one of them look like the other, you’re never going to get anywhere.
[00:25:05] HC: Finally, where do you see the impact of Vibrant Planet in three to five years?

[00:25:09] GB: Yes. I’m hoping that in three to five years, we’re really turning a corner on forests in the Western US and probably in other Mediterranean climates. We’re starting to think about what’s the next thing we want to run after. Is it agriculture? Is it flooding and storms? Like I said, these techniques are more than just forestry. I think in three to five years, we’re going to be going after the next one or two of those.
[00:25:33] HC: Great. I look forward to seeing what you’re up to in a few years and following you along the way. This has been great, Guy. I appreciate your insights today. I think this will be valuable to many listeners. Where can people find out more about you online?
[00:25:46] GB: Vibrantplanet.net. That’s definitely where we are. Also, feel free to reach out on LinkedIn. We’re pretty active there as well.
[00:25:53] HC: Perfect. Thanks for joining me today.
[00:25:55] GB: Thanks so much for making the time, Heather.
[00:25:57] HC: All right, everyone. Thanks for listening. I’m Heather Couture, and I hope you join me again next time for Impact AI.
[OUTRO]
[00:26:07] HC: Thank you for listening to Impact AI. If you enjoyed this episode, please subscribe and share with a friend. If you’d like to learn more about computer vision applications for people and planetary health, you can sign up for my newsletter at pixelscientia.com/newsletter.
[END]