The impact of AI knows no bounds. Today, I am joined by Subit Chakrabarti, Vice President of Technology at Floodbase, a mission-driven, machine-learning-powered company specializing in flood monitoring and insurance. Having grown up in Eastern India, he knows the importance of adapting to global flood risk first-hand.
In this episode, Subit shares insights on how Floodbase utilizes advanced AI and diverse satellite imagery to support the design of parametric flood insurance solutions. We discover how machine learning plays a crucial role in analyzing vast datasets and bridging the insurance gap for regions vulnerable to flooding. Join us as we explore the transformative potential of Floodbase's technology and its vision for a more secure and equitable future, in the context of global warming and the associated global flood risk.
- Introducing Subit Chakrabarti, Vice President of Technology at Floodbase.
- Subit's background: what led him to Floodbase.
- Insight into Floodbase and its focus on parametric flood insurance and disaster response.
- Subit explains parametric insurance.
- How Floodbase uses machine learning to design the index for parametric insurance.
- Their use of satellite imagery and other geospatial datasets in setting up their ML models.
- The challenges of working with such diverse data sets.
- What made it possible to build Floodbase's technology (spoiler alert: advanced AI).
- How Floodbase measures its impact.
- Subit’s advice for AI-powered startup leaders: address bias and build a skilled AI team.
- Floodbase’s three to five-year plan: make insurance more accessible.
“Adapting to global flood risk is something that is near and dear to my heart, having grown up in India and having seen a lot of damage from floods in Eastern India where I used to live.” — Subit Chakrabarti
“Parametric insurance pays out when a pre-agreed weather condition is met, separate from the physical damage.” — Subit Chakrabarti
“What we use machine learning for is to design [the] index that the parametric insurance can be based on, and that is our proprietary AI technology.” — Subit Chakrabarti
“One of the most important challenges with satellite imagery is that satellite imagery represents the condition of a place at a certain point in time and it’s not the continuous movement of what that flood looks like at that place.” — Subit Chakrabarti
“Our policy at Floodbase is that we add more data to remove bias from the process.” — Subit Chakrabarti
“The biggest thing that we can measure is the flood protection gap. So like I said, 83% of losses are uninsured and we can measure that.” — Subit Chakrabarti
Subit Chakrabarti on LinkedIn
Subit Chakrabarti on Twitter
[00:00:03] HC: Welcome to Impact AI, brought to you by Pixel Scientia Labs. I’m your host, Heather Couture. On this podcast, I interview innovators and entrepreneurs about building a mission-driven, machine-learning-powered company. If you like what you hear, please subscribe to my newsletter to be notified about new episodes. Plus, follow the latest research in computer vision for people and planetary health. You can sign up at pixelscientia.com/newsletter.
[0:00:34.4] HC: Today I’m joined by guest, Subit Chakrabarti, Vice President of Technology at Floodbase, to talk about flood monitoring. Subit, welcome to the show.
[0:00:42.5] SC: Thanks for having me, great to be here.
[0:00:44.5] HC: Subit, could you share a bit about your background and how that led you to Floodbase?
[0:00:48.5] SC: Yes, of course. So my background is in electrical engineering and signal processing. I started my career in the Ph.D. program, working on signal processing and machine learning algorithms for various kinds of signals. There was an opportunity to look at satellite imagery and, as a huge fan of NASA and space agencies around the world, I jumped at that opportunity. I worked on machine learning for satellite imagery in general and, specifically, on methods to increase the spatial resolution of satellite imagery using advanced machine learning.
After my Ph.D., I wanted to continue in the field. Originally, I wanted to stay in academia, but I felt a little burned out at the end of my Ph.D. and wanted to test the waters in industry. I had the opportunity to work at a small startup doing crop type mapping and land cover mapping, essentially mapping the food supply that we all depend on. It was an interesting opportunity, and I fell in love with the fast pace of startups. I continued working in that field until about two years ago, when I decided to switch gears and work on disaster risk and mitigating disaster risk, and that brought me to Floodbase.
[0:02:19.7] HC: So what does Floodbase do and why is this important in adapting to climate change?
[0:02:24.5] SC: So flooding is the most common and costly natural disaster on earth. 83% of economic losses from flooding were uninsured over the last decade, and adapting to global flood risk is something that is near and dear to my heart, having grown up in India and having seen a lot of damage from floods in Eastern India where I used to live.
And we need newer, better insurance in particular, and risk transfer methods in general, to adapt to the increasing magnitude and frequency of disasters. Floodbase is an end-to-end flood data solution that enables this new kind of insurance, parametric flood insurance, as well as more efficient disaster response.
[0:03:11.5] HC: And what role does machine learning play in this technology?
[0:03:13.5] SC: So let me tell you a little bit about parametric insurance. Parametric insurance pays out when a pre-agreed weather condition is met, separate from the physical damage. So you can imagine a parametric wind insurance policy: if wind speeds exceed a pre-agreed miles-per-hour threshold, the policy automatically pays out.
Why this is important is because indemnity insurance policies, which is how traditional insurance policies are designed, take a lot of time to pay out and require a lot of money to administer, because you need adjustors and verifiers. After a catastrophe like a hurricane or any kind of flooding, you need capital fast to recover from that catastrophic event, and traditional insurance policies are usually not fast enough to cover that.
Parametric insurance, on the other hand, because it tracks a parameter over time and pays out immediately after the parameter is met or exceeded, is very efficient in terms of paying out. So that’s what Floodbase focuses on, and we use satellite imagery and other kinds of geospatial datasets to power that. What we use machine learning for is to design that index that the parametric insurance can be based on, and that is our proprietary AI technology.
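To make the trigger mechanics concrete, here is a minimal sketch of a binary parametric payout rule in Python. All names, thresholds, and amounts are hypothetical illustrations, not Floodbase's actual contract logic:

```python
# Toy parametric trigger: the policy pays the full limit as soon as the
# index meets or exceeds the pre-agreed trigger, regardless of damage.
# (Hypothetical example; real policies can have tiered payouts.)

def parametric_payout(index_value: float, trigger: float, limit: float) -> float:
    """Return the payout for a given index reading."""
    return limit if index_value >= trigger else 0.0

# Example: a flood index of 0.42 against a pre-agreed trigger of 0.30
print(parametric_payout(0.42, trigger=0.30, limit=100_000))  # 100000.0
print(parametric_payout(0.12, trigger=0.30, limit=100_000))  # 0.0
```

Because the payout depends only on the index, no adjustors or damage verification are needed, which is what makes the payout fast.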
[0:04:46.0] HC: So how do you set up these models? Is it, for a particular location and point in time, from the satellite imagery you are predicting whether there was flooding there, or is there another way to set this up?
[0:04:57.1] SC: Yes, exactly. That is the kind of generalized model that we use. We don’t do it for a particular location, we do it across the globe, but we have a lot of diversity in our training datasets to accommodate all kinds of different locations.
We do have different models for inside the US and outside the US, because the US is a particularly data-rich region where we can leverage a lot of different kinds of data. But in general, you can think of the model as having satellite imagery and other geospatial datasets as input and a flood map as its output.
So that’s the first stage, and then the second stage converts that flood map into an index for a specific region or domain, which you can look at throughout history to construct the index on.
[0:05:51.9] HC: And do both of those pieces use machine learning?
[0:05:54.4] SC: Each of those pieces uses machine learning to some extent. The first step, which is the conversion of various sources of satellite imagery and other geospatial data into a flood map, uses state-of-the-art deep learning technology. The step after that, the conversion of the flood map into an index on which you can underwrite an insurance policy, uses machine learning too, but it’s closer to advanced statistics.
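The two-stage pipeline described above can be sketched roughly as follows. The `segment_flood` stand-in is a toy threshold, not Floodbase's proprietary deep learning model; all names are hypothetical:

```python
import numpy as np

# Stage 1: imagery -> flood map. Stage 2: flood map -> scalar index
# for a region. Both functions are illustrative stand-ins.

def segment_flood(image: np.ndarray) -> np.ndarray:
    """Stage 1 stand-in: per-pixel flood mask from a single-band scene."""
    return (image > 0.5).astype(float)  # placeholder for a deep model

def flood_index(flood_map: np.ndarray, region_mask: np.ndarray) -> float:
    """Stage 2: reduce the flood map to a scalar index for one region,
    here the fraction of the region's pixels that are flooded."""
    region = flood_map[region_mask]
    return float(region.mean()) if region.size else 0.0

scene = np.full((4, 4), 0.9)        # fake scene, fully "wet"
roi = np.ones((4, 4), dtype=bool)   # region of interest: everything
print(flood_index(segment_flood(scene), roi))  # 1.0
```

A historical series of such index values, one per event or per season, is what a parametric policy can then be underwritten against.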
[0:06:25.2] HC: You mentioned satellite imagery. Could you tell me a bit more about the types of satellite imagery you work with and any other data sources that go into building these models?
[0:06:34.4] SC: Yes, of course. So we use all kinds of satellite imagery, and I can name a few that are particularly important to our technique. You can break up satellite imagery, or earth-observing satellite imagery to be specific, into different categories. In my mental map, the most important distinction is public satellite imagery, meaning publicly operated satellites launched by NASA, the European Space Agency, or the Japanese space agency, versus privately operated satellites launched by private companies.
The value of public satellite imagery is that it goes back pretty far in time. NASA launched the first earth-observing satellite in the early 70s. Because we utilize those datasets, we can go back far in time to produce flood maps, which is very important if you want to quantify the risk of a place over time and not just whether a place is flooded at a given moment.
If you don’t have both of those things, then you can maybe create a flood map, but you can’t write an insurance policy; you need to understand the risk as well. So using public imagery is super important for us because it goes back pretty far in time. That is not to say that we don’t use private imagery. We use imagery from our partners at Planet Labs, who operate a large constellation of satellites, and from companies like Capella Space, Umbra, and Satellogic as well. The role of those satellites is to complement our public satellite data. The private satellites are usually higher resolution, so they give you a finer perspective on what is happening on the ground, which is important if you want to do flood mapping at, for example, the property level rather than the regional scale. We also use different wavelengths of data.
So the first way to break up earth-observing satellite imagery is whether it’s publicly operated or privately owned. The second way is by what wavelengths of light the sensors are sensitive to. The three big categories there are: first, optical and near-infrared, which is what we see as humans.
The second is thermal infrared, which is basically how hot something is, the temperature, and the third is microwave. Microwave satellites emit and sense their own radiation. All of them play a different role. The very important thing about microwaves, for example, is that they can penetrate through clouds, whereas with optical imagery, obviously, if there is a cloud on top of something, you can’t really see what’s underneath.
So that’s why we use the full suite of imagery available from all satellites; they each have their own quirks and their own advantages. Apart from satellite imagery, we also use models of how water flows. In the US, for example, these models are operated by NOAA, and they give you the state of every river and every tide gauge operating on the coast at all times. We use that to complement the satellite imagery we already have and make it more continuous.
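One way to picture how a continuous gauge series can carry sparse satellite observations between overpasses is to fit the satellite-observed flood fractions against the gauge signal and apply that fit to every day. This linear rescaling is an illustrative toy with made-up numbers, not Floodbase's actual method:

```python
import numpy as np

# Satellites observe a place only on overpass days; gauges report daily.
# Toy approach: regress flood fraction on gauge level at overpass times,
# then apply the fit daily to get a continuous estimate.

overpass_days = np.array([0, 8, 16])
sat_flood_frac = np.array([0.05, 0.60, 0.10])  # fraction flooded on those days

days = np.arange(17)
gauge_level = 1.0 + np.sin(days / 5.0)         # fake continuous gauge series

slope, intercept = np.polyfit(gauge_level[overpass_days], sat_flood_frac, 1)
daily_estimate = np.clip(slope * gauge_level + intercept, 0.0, 1.0)
print(daily_estimate[overpass_days].round(2))  # roughly tracks the satellite values
```

In practice a hydrological model plays the role of the gauge series here, but the idea is the same: the continuous source interpolates between the trusted but sparse snapshots.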
[0:10:15.0] HC: With the different types of satellite imagery, the different wavelengths, and different spatial resolutions, do you need to train different models for different satellites and parts of the spectrum, or is there a multimodal way to bring them together?
[0:10:28.9] SC: So that’s a very interesting question, first of all. Like I said, there are three different classes of satellites: optical and near-infrared is basically one class, thermal infrared is the second class, and microwave, active and passive, is the third class. We train [inaudible 0:10:48.8] for each sensor independently at this point, but the goal of the R&D team, which that brings us to, is to create a multimodal solution.
[0:11:00.2] HC: What kinds of challenges do you encounter in working with these diverse types of satellite imagery and your other data sources?
[0:11:07.7] SC: So one of the most important challenges with satellite imagery is that it represents the condition of a place at a certain point in time, not the continuous evolution of what a flood looks like at that place. For example, the NASA Landsat satellites usually overpass in the afternoon at some point every seven to ten days.
So between those overpasses, if you don’t have additional sources of data, it’s very difficult to tell whether a place is flooded or not, and specifically for insurance, or even for disaster response, you need a continuous view into what inundation looks like at a given place. If you just rely on satellite imagery, that is the missing piece of the puzzle, which is why we use these other continuous geospatial data sources.
That’s one thing, and then there are other aspects, like cloud cover, as I said earlier. It’s relatively straightforward to design models for optical imagery because we as humans are, you could say, naturally trained to recognize water in optical imagery, so we can create big datasets to train these models on.
But clouds represent a problem, especially in tropical regions, where during monsoons you can have cloud cover in place for days. On the other side is microwave imagery, which has the advantage of being able to see through clouds; the problem is that it’s very hard to interpret, and it has largely been the domain of national agencies over the past few decades.
There, what you’re looking at is not actually an image per se but a wave, and you have to convert that into an image before you can visualize it, which results in a bunch of problems in interpreting that imagery. So on the interpretation scale, you can say optical imagery is easier to interpret but has gaps, whereas microwave imagery is harder to interpret but doesn’t have gaps.
So how we solve that challenge is, as we were talking about, by creating a multimodal solution, and you can create this multimodal solution at different stages of the process, right? You can integrate the different kinds of satellites at the very end, when you’re creating the index, or you can integrate them at the beginning of the process, when you’re creating the flood map. We started by integrating towards the end of the process and are now moving up the chain, doing the integration at an earlier and earlier stage, which is good because the earlier you do the integration, the less information you lose.
You can imagine that as you go further from a raw image to an index, at each step you lose some information.
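The two integration points can be sketched as "late" versus "early" fusion. The toy classifier below stands in for a real per-sensor model; everything here is illustrative:

```python
import numpy as np

# Late fusion: run a model per sensor, then combine the flood maps.
# Early fusion: stack raw sensor bands so one model sees everything.

optical = np.random.rand(32, 32)   # fake optical band
sar = np.random.rand(32, 32)       # fake microwave (SAR) band

def toy_model(x: np.ndarray) -> np.ndarray:
    """Stand-in for a per-pixel flood classifier."""
    if x.ndim == 3:                         # multichannel input
        x = x.mean(axis=-1)
    return (x > 0.5).astype(float)

# Late fusion: average per-sensor flood maps at the end.
late = 0.5 * (toy_model(optical) + toy_model(sar))

# Early fusion: stack sensors as channels, feed one model the full stack.
stacked = np.stack([optical, sar], axis=-1)  # shape (32, 32, 2)
early = toy_model(stacked)

print(late.shape, early.shape)  # (32, 32) (32, 32)
```

Early fusion lets the model exploit cross-sensor correlations per pixel, which is the "less information lost" advantage mentioned above, at the cost of needing all sensors aligned and available at once.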
[0:14:22.6] HC: Is bias a concern for flood models and if so, how might it manifest?
[0:14:26.9] SC: Yeah, again a great question. In insurance and disaster response, it’s important to take into account that there is already existing data used to create policies and prioritize different areas, and that data has a lot of bias built in, due to the politics that disaster response happens in. So our policy at Floodbase is that we add more data to remove bias from the process.
In general, if you derive your policy and your statistics from a larger dataset, that tends to remove bias rather than add it, and we work with folks like FEMA to make better maps and remove the biases in those maps.
[0:15:20.6] HC: So a larger dataset is really the solution you’re focused on. Does that mean focusing on more data in specific areas, or is it just that a generally larger dataset can reduce the bias?
[0:15:32.0] SC: Yes. Again, that’s a very interesting question. This is something that I also focused on a lot when I was doing my Ph.D. If you train a machine learning model on a specific dataset, it can interpolate within that dataset and extrapolate a little bit beyond it, but it cannot fundamentally construct new things that you don’t show it in that dataset.
So for example, consider a hypothetical scenario where you train a flood mapping algorithm to predict flooding and you only show it wetlands or temperate forest, and then you ask it to generalize to a city; you’ll get answers that are completely wrong. So what we focus on is creating a dataset that is as diverse as possible, both in terms of the kinds of areas we show it, which biome the data belongs to, and even which latitude you’re looking at.
The other important thing is altitude, or slope. Once we can stratify enough on all of these axes, we also have held-out validation datasets that are similarly diverse, so that we can surface any bias in a given region as soon as possible, before we get to the index creation step, and recognize it and address it by adding data in those locations. So that is what I was talking about, if that makes sense.
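The point of a stratified held-out set is that per-stratum metrics expose weaknesses that an overall score hides. Here is a toy illustration with fabricated labels, stratified by biome:

```python
import numpy as np

# Fabricated evaluation data: a model that is accurate on wetlands but
# near-chance on urban areas. Per-biome accuracy surfaces the gap that an
# overall accuracy of ~0.78 would hide.

rng = np.random.default_rng(0)
biomes = np.array(["wetland"] * 100 + ["urban"] * 100)
truth = rng.integers(0, 2, size=200)

pred = truth.copy()
urban = biomes == "urban"
flip = rng.random(200) < 0.45          # corrupt ~45% of urban predictions
pred[urban & flip] = 1 - pred[urban & flip]

for biome in ["wetland", "urban"]:
    m = biomes == biome
    acc = (pred[m] == truth[m]).mean()
    print(f"{biome}: accuracy {acc:.2f}")  # urban lags, flagging the bias
```

Once a weak stratum is identified this way, the remedy described above is to add training data from locations in that stratum and re-evaluate.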
[0:17:06.8] HC: All right, so not just larger but more diverse, and diverse with respect to the axes that you mentioned there.
[0:17:12.6] SC: Yeah. I mean, in general, bias surfaces in any kind of statistical extrapolation exercise when the statistics of your training set are very different from the statistics of your validation set or your testing set, and what we try to do is make sure that that doesn’t happen. Another way in which bias surfaces, when you try to build a machine learning model to extrapolate a natural process, is that you train it on the past and you’re trying to predict the present.
Because of climate change, the statistics of the past are not very accurate when you’re trying to predict the statistics of the present. That’s the other thing that we use various statistical techniques to solve.
[0:18:04.7] HC: That’s the latter one with the changes over time. That sounds like a much more difficult one to solve.
[0:18:11.1] SC: That is very difficult to solve, and Floodbase only deals with the present, right? We evaluate what the flood looks like at this point. We don’t do any predictions into the future; the future is really the purview of forecasting and is very much an unsolved problem. But for what we do right now, we simply weight times that are nearer to us higher than the past, and that solves a lot of the issues within the product.
You know, this is technically a non-stationary problem; the distribution we’re sampling from is not stationary. The way that we attempt to solve that is to weight, say, the last month higher than we would weight 1979.
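The recency weighting can be sketched with a simple exponential decay over the record. The half-life and the data below are made up for illustration; the decay form is one plausible choice, not necessarily Floodbase's actual scheme:

```python
import numpy as np

# Weight each year's flood statistic so recent years count more than 1979.
# An exponential decay with a chosen half-life is one simple scheme.

years = np.arange(1979, 2024)
annual_peak = np.random.default_rng(1).gamma(2.0, 1.0, size=years.size)  # fake record

half_life = 10.0                                  # years until a weight halves
weights = 0.5 ** ((years.max() - years) / half_life)
weights /= weights.sum()                          # normalize to sum to 1

weighted_mean = float(np.sum(weights * annual_peak))
plain_mean = float(annual_peak.mean())
print(round(weighted_mean, 2), round(plain_mean, 2))
```

Under a warming climate, a recency-weighted estimate like this tracks the current flood regime more closely than an unweighted average over the full record.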
[0:18:59.5] HC: That makes sense. So why is now the right time to build this technology? Are there any specific technological advances that made it possible to do this now, when it wouldn’t have been feasible a few years ago?
[0:19:10.8] SC: Yeah, I mean, the big one is really the plethora of satellite observations available, and the advanced AI and machine learning techniques that we can use to leverage those observations. The progress in both, just over my short career, has been tremendous, and it continues to grow exponentially; we get better and better satellites every year from private companies as well as governments.
The second is really the ability to utilize that imagery and those geospatial datasets at a scale that was not possible before. The third one is obviously computing. Our models are very heavy on the computing resources they require to train and run in production, and the availability and affordability of that computing is the third advance, though I think that one is more general across other technology sectors. The first two are really why the flood products that exist today are possible.
[0:20:15.1] HC: How do you measure the impact of your technology to be sure that you’re accomplishing what you set out to do?
[0:20:20.4] SC: Yeah, so I think we are a very impact-driven company in general; if you go to www.floodbase.com, that will be super clear. The biggest thing that we can measure is the flood protection gap. Like I said, 83% of losses are uninsured, and we can measure that, along with the effectiveness of the policies that we power, and that is a big one for us.
The other thing we measure is the work we do to make aid delivery faster. Governments across the world rely on our data to prioritize the areas that are most affected by a flood, and we can measure how effective that is. Those are really the two big pillars of what we measure to quantify the impact of our technology, and then on the technical side, things like flood mapping accuracy and the [inaudible 0:21:17.4] are important.
[0:21:18.7] HC: Is there any advice you could offer to other leaders of AI-powered startups?
[0:21:22.7] SC: Yes. So the first one really goes back to the question you asked, which is: think about the bias. Especially if you are using models from Hugging Face or something else created by someone other than you, it’s very easy to forget that the way in which those models were created will result in some bias, and quantifying that bias before deploying and reusing those models is paramount, I would say.
Trying to determine the bias of those models in a variety of ways, simulating the conditions in which they would be used in the real world, is super important. And the closer the application is to real-life human situations, the more important it is that you are absolutely sure you’re addressing whatever bias exists in those models. So that is number one.
Number two is that building a team of AI scientists is pretty challenging. I think there are good and bad things about AI right now, in that there are a lot more people entering this profession and interested in this field; I would say there is a before and after the GPT-3.5 and GPT-4 releases. Experience with ChatGPT has driven awareness of AI through the roof.
That also means that selecting the right people for your team is hard, because of the sheer volume of applications. So I would say, be very mindful of hiring the right people, which means you have to pay a lot of attention to recruiting, and then make sure that the people you do recruit upskill every year, because there are new developments in the space that everyone needs to be aware of.
So make sure they attend conferences and listen to podcasts like the Impact AI podcast to keep up with developments in the industry.
[0:23:30.8] HC: Finally, where do you see the impact of Floodbase in three to five years?
[0:23:34.3] SC: What I would really like is if, in three to five years, I could come back and say that the gap between the people who are insured and those who are not has closed significantly. If that is true, then I would say we have been pretty successful. That’s number one. Number two is, like I said, in data-rich countries like the US, we have products that utilize a lot of different datasets, which we can’t yet offer in countries without so much data.
I would like to see that gap close as well. I would like to be in a place where we can offer the same level of product all over the world, whether you’re in the US or Western Europe or elsewhere, and that means encouraging as much data as possible to become available for those regions.
[0:24:22.1] HC: This has been great. Subit, your team at Floodbase is doing some really important work for flood insurance. I expect that the insights you’ve shared will be valuable to other AI companies. Where can people find out more about you online?
[0:24:32.9] SC: I have a LinkedIn account and a Twitter account and I would say go to floodbase.com and check out all the brilliant things we do as a company and we also have a lot of publications of that kind of going into lots more details about how our things are developed and implemented and if there’s questions that cannot be addressed by any of those resources, feel free to email me at [email protected].
[0:24:59.0] HC: Perfect, I’ll link to those in the show notes. Thanks for joining me today.
[0:25:02.5] SC: Thanks for having me, this has been a lot of fun.
[0:25:04.4] HC: All right everyone, thanks for listening. I’m Heather Couture and I hope you join me again next time for Impact AI.
[END OF INTERVIEW]
[0:25:14.3] HC: Thank you for listening to Impact AI. If you enjoyed this episode, please subscribe and share with a friend. And if you’d like to learn more about computer vision applications for people and planetary health, you can sign up for my newsletter at pixelscientia.com/newsletter.