Can AI be applied to enhance geospatial data for climate, nature and people? This episode kicks off a miniseries about domain-specific foundation models. Following the trends in language processing, domain-specific foundation models are enabling new possibilities for a variety of applications, including Earth observation. During this conversation, I am joined by Bruno Sánchez-Andrade Nuño, Executive Director of Clay, a nonprofit organization harnessing the power of AI for satellite images, spatial data, and more. Bruno shares the functionality and concept behind Clay, and his journey to building it. He goes on to unpack the tool’s foundation model in broad strokes, before explaining why it's important, and sharing the challenges he has faced along the way. We discuss the legal aspects of building Clay, and its primary goal: to make it as easy as possible for any user to achieve their goals. We also touch on what the future might hold for Clay and for Earth observation. Thanks for listening!


Key Points:
  • Introducing guest, Bruno Sánchez-Andrade Nuño, Executive Director at Clay.
  • His journey from NASA astrophysicist to climate change, social development, and AI researcher.
  • What Clay focuses on: using remote sensing maps to interpret the Earth’s data.
  • The mechanics of how Clay is used and how different feature sets compare to one another.
  • A broad explanation of the tool’s foundation model and why it is quicker, cheaper, and more environmentally friendly.
  • Two main benefits of the tool that Bruno finds most exciting.
  • Data and infrastructure required to build Clay including 70 million satellite and aerial images.
  • Measuring what the model understands and the process of compressing an image into 700 numbers.
  • Privacy and intellectual property in the realm of satellite imaging and mapping.
  • What commercial imagery could add to the model and how it might be integrated in the future.
  • Clay’s partnerships with university and company groups.
  • Why the focus of Clay is to make it as easy as possible for anyone to use the tool for anything they want to do.
  • Challenges encountered on the road to building Clay: explaining what it is.
  • The complexity of benchmarking foundation models and how this relates to Clay.
  • Working with partners to build Clay and the rest of the ecosystem.
  • Lessons from building Clay that may apply to other foundation models.
  • Bruno’s predictions for the future of foundation models and Clay.
  • What is certain about the future of Clay and our understanding of Earth.

Quotes:

“Clay is trying to figure out how to finally increase the adoption of remote sensing by leveraging a tool that itself is very complex, but the result of that tool is very easy to use.” — Bruno Sánchez-Andrade Nuño

“If you start with a foundational model that gets you most of the way there, [then] you can create those trials much quicker, much cheaper, and much more environmentally friendly.” — Bruno Sánchez-Andrade Nuño

“This is so new, we get the chance, those of us working on it, that we can save the whole industry, if you will, the whole space of AI for it.” — Bruno Sánchez-Andrade Nuño

“Clay, I believe, is not only the largest and most efficient model AI for Earth, for any kind of like foundational model. It is also completely open source.” — Bruno Sánchez-Andrade Nuño

“What we try to focus on is how can we make it as simple as possible for anyone anywhere to use this model for anything they want to do.” — Bruno Sánchez-Andrade Nuño


Links:

Bruno Sánchez-Andrade
Bruno Sánchez-Andrade Nuño on X
Bruno Sánchez-Andrade Nuño on LinkedIn
Clay
Clay on LinkedIn


Resources for Computer Vision Teams:

LinkedIn – Connect with Heather.
Computer Vision Insights Newsletter – A biweekly newsletter to help bring the latest machine learning and computer vision research to applications in people and planetary health.
Computer Vision Strategy Session – Not sure how to advance your computer vision project? Get unstuck with a clear set of next steps. Schedule a 1 hour strategy session now to advance your project.


Transcript:

[INTRODUCTION]

[0:00:03] HC: Welcome to Impact AI, brought to you by Pixel Scientia Labs. I’m your host, Heather Couture. On this podcast, I interview innovators and entrepreneurs about building a mission-driven, machine-learning-powered company. This episode is part of a miniseries about foundation models. Really, I should say domain-specific foundation models. Following the trends in language processing, domain-specific foundation models are enabling new possibilities for a variety of applications with different types of data, not just text or images. In this series, I hope to shed light on this paradigm shift, including why it’s important, what the challenges are, how it impacts your business and where this trend is heading. Enjoy.

[EPISODE]

[0:00:49] HC: Today, I’m joined by guest, Bruno Sánchez-Andrade Nuño, Executive Director of Clay, to talk about a foundation model for Earth. Bruno, welcome to the show.

[0:00:57] BS: Thank you, Heather, excited to be here.

[0:00:59] HC: Bruno, could you share a bit about your background and how that led you to start Clay?

[0:01:03] BS: My background is science, and it’s not a typical scientist’s. It has taken me a long time to realize what that background’s value is. That’s why I say it with more care. But the traditional answer to your question is, my background is in astrophysics. I studied physics, then astrophysics, a PhD in solar physics, and a postdoc with NASA, and then I wanted to have more impact in society. So, I turned to climate change, social development, and AI. But the reason I was able to traverse that arc is because I, at the end, understand my background as a scientist, which is different from a researcher. I would like to reclaim what being a scientist means, which is beyond research: looking at phenomena that are happening in society, understanding them, and then applying that to make it a better world. So, that’s my background, and what drives everything I’ve done.

[0:02:05] HC: So, what part of that led you to start Clay, and what are you doing at Clay today?

[0:02:10] BS: So, a lot of the things I’ve done involve remote sensing maps. My PhD was maps of the sun. We can’t really go to the sun, but we have really good images at very high resolution. So, that was my background, studying that kind of data. Then, when I started looking at Earth from satellites, it was the same: looking from satellites at the sun or at the ground is looking from space at the ground. There’s so much we can understand from remote sensing.

Over the years, I realized that the promise of remote sensing, what you can do with it, is so much more than what is actually done. We literally have images of the whole of Earth, of forests, of corals, of coasts, of events, and disasters. But yet, it takes so much effort and time to understand what it says, to count trees or to see deforestation. Something that, technically, is solved in a way. When this AI revolution came, this new wave of deep learning, and ChatGPT is another, then the basic question was, hey, maybe this architecture of AI has proven to be super effective in transcribing, translating, describing images, making images. All of those domains use pretty much the same architecture.

What would happen if we apply it to remote sensing, to understanding Earth’s data? That’s exactly what Clay is. Clay is trying to figure out how to finally increase the adoption of remote sensing by leveraging a tool that itself is very complex, but the result of that tool is very easy to use. ChatGPT is a complex tool, but using ChatGPT is not. So, my hope is that when we do AI for Earth with Clay, the model itself is complex, but using it is not.

[0:04:13] HC: So, how do you use Clay?

[0:04:15] BS: Really good question. The core idea is that, basically, you stop thinking of Earth data and satellite images as images or as pixels, and you lean on AI to capture the content and learn those semantics. If you have to count trees or find, I don’t know, find lakes, you would literally, visually or with programs, try to find blue blobs for lakes, or whatever it is. Now, you lean on AI to say, “Hey, I’m looking for a lake,” and you tell the AI what a lake is, and then the AI looks for it. Where before, we had archives of pixels, now we have a librarian, someone who has seen all of those pixels and now understands not only the images but the contents of them. It sounds abstract, but that’s part of the problem. We need to completely rethink how we use geospatial data.

[0:05:18] HC: So, as a way to search for things on earth. If you can find an example of a lake, then you can find other examples, because they have a similar feature set?

[0:05:28] BS: Yes, and on steroids, ten times more. In the sense that I can give you examples of solar panels and say, find me more solar panels. I could also tell you, hey, this patch of forest is 10 tons of carbon. That patch of forest, it’s 20 tons. So, giving those examples of how to convert that section or that part of the world into a number, a regression, the AI can use those examples to figure out how to do that thing. The extra benefit is that the process of doing that, for the AI, is almost universal to the other things it needs to do anyway for other applications. What I mean by that is that, traditionally, if you want to find this carbon stock, and then you have another problem, which is to find lakes, as I was saying before, both processes tend to start from scratch, from the image.

Just take an image, and imagine the image had both the forest and the lake. You would need to start from scratch for that image and do separate processes for each of these outputs. Whereas with AI, since we don’t start with pixels, we start with the digested model, or digested output of the AI, which is much closer to the answers of both outputs. We can talk about the resources for it, but the reality is that we are seeing orders of magnitude, 10, 100, even up to 10,000 times faster calculations than with classical methods.
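The workflow Bruno describes, searching by example and regressing labels like carbon stock directly from embeddings rather than pixels, can be sketched roughly as follows. This is an illustrative Python sketch using randomly generated vectors, not Clay’s actual API; the embedding dimension and every name here are hypothetical.

```python
import numpy as np

# Hypothetical embeddings: in a real workflow these would come from a
# model like Clay, one vector per image chip. Here we fake them.
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(1000, 768))          # 1000 chips, 768-dim vectors
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

def find_similar(query_idx, k=5):
    """Return the k chips whose embeddings are closest to the query chip."""
    sims = embeddings @ embeddings[query_idx]      # cosine similarity (unit vectors)
    order = np.argsort(-sims)                      # most similar first
    return order[order != query_idx][:k]           # drop the query itself

matches = find_similar(0)                          # "find me more like chip 0"

# Regression from embeddings: given a few labeled chips (tons of carbon),
# fit a simple linear head instead of reprocessing pixels.
labels = rng.uniform(5, 25, size=200)              # fake carbon labels for 200 chips
w, *_ = np.linalg.lstsq(embeddings[:200], labels, rcond=None)
predicted = embeddings[200:] @ w                   # estimates for the other 800 chips
```

The point of the sketch is the shape of the workflow: once every chip is a vector, both search and regression reduce to cheap linear algebra, which is where the orders-of-magnitude speedups come from.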

[0:07:02] HC: And the core technology behind this is a foundation model. For those who may not be too familiar with foundation models yet, could you explain this concept more broadly?

[0:07:11] BS: Yes. The foundational model is the linchpin, or the center, of what we’re doing, but it might not be for long. This field of AI is moving very quickly. A couple of years ago, a year ago, the foundational model was thought to be the answer to all the problems. Now, we have seen that there are other ways to do it, like continuous training. We can talk about all these concepts. But the core idea of a foundational model, which remains the thesis of Clay, is that if you give it a set of basic data that represents a set of questions, and instruments, and data, then you can work with that foundational or common model. Once you finish, you can then take that and fine-tune it, or continue training, for a specific instrument or a specific area of the world.

What I mean by that is that Clay is trained for the whole world with data from satellites, like Sentinel, which covers the world, or NAIP, which covers the US, or others. We train that, we work with the model. When we finish with that model, we know it’s good globally. We know it’s good with the instruments we trained it with. Then, someone might say, “Hey, I really only want to use it for Spain, or for Brazil, or for deforestation, or for disasters, or for this commercial satellite.” They can take that common model, that foundational model, and then fine-tune it only for the region or the instruments they want. Then, you create a child, if you will, of that model that is particularly good at a specific task or a specific region.

The beauty of that approach is that if you had to create a model for Brazil from scratch, or for Spain, or for deforestation, you’re multiplying by many times the amount of effort, the amount of funding, the amount of emissions to train all of those models. Whereas if you start with a foundational model that gets you most of the way there, you pay once. Then, you can create those trials much quicker, much cheaper, and much more environmentally friendly.
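The economics Bruno lays out, paying the big training cost once and then fine-tuning cheaply, can be illustrated with a toy numerical experiment. Nothing here is Clay-specific: it is plain gradient descent on a synthetic linear task, showing that "pretrained" weights that start close to the solution converge in far fewer steps than weights started from scratch.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "task": recover w_true from data. The "foundation" weights were
# pretrained on a related task, so they start close to w_true.
w_true = rng.normal(size=50)
X = rng.normal(size=(500, 50))
y = X @ w_true

def train(w, lr=0.01, tol=1e-3, max_steps=10_000):
    """Plain gradient descent on mean squared error; returns steps used."""
    for step in range(max_steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
        if np.mean((X @ w - y) ** 2) < tol:
            return step + 1
    return max_steps

steps_scratch = train(np.zeros(50))                          # train from scratch
steps_finetune = train(w_true + 0.05 * rng.normal(size=50))  # warm start near w_true
print(steps_scratch, steps_finetune)
```

The warm start reaches the same tolerance in a fraction of the steps, which is the whole argument: a model for Brazil or for deforestation does not have to repeat the global training, only the last, cheap stretch.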

[0:09:28] HC: So, it sounds like there are a few different benefits that you mentioned there. One is that this large foundation model has a more complete descriptor to help you find anything on Earth. Two, it enables you to get started much faster on a new model for something more specific. Are there other benefits of foundation models that we should talk about here?

[0:09:49] BS: I feel there are two kinds of benefits. One is the ones we know, which is faster, cheaper, better, or fewer emissions. Those are the ones I just mentioned. But it’s also worth mentioning that there are a lot of things we don’t know, because this is a new space. We have never worked with something like the transformer or these types of AI with embeddings, which we can define in a second. We don’t really know what these things can do that is completely out of our minds, because we are too used to doing geospatial and remote sensing with pixels. We’re too used to the kind of things we expect the data to do.

So, it’s hard to call these benefits, but to me, it is extremely exciting, because it’s new. It’s a new tool. It’s a new domain. There are not that many people working on it. So, it is a window of opportunity. That’s what I think is another benefit: it’s a window of opportunity to shape what we want to do as a field. What kind of licenses we want to promote, what kind of methods, what kind of products, what kind of standards. This is so new, we get the chance, those of us working on it, that we can save the whole industry, if you will, the whole space of AI for it.

[0:11:10] HC: What does it take to build a model like Clay? What kind of data, how much infrastructure, what kind of models, what are the pieces that go into it? Are there any numbers that are worth talking about here?

[0:11:21] BS: It’s funny because, at the end of the day, yes, we can do those models, and yes, they are fancy. I’ll give you the numbers in a second, which are very impressive. A lot of big numbers everywhere. But then, the question is, so what? Okay, you spend all of that, or ChatGPT, yes, they spend billions of dollars. What’s the benefit? What excites me is what it could be used for, and we mentioned some examples of finding lakes, or pollution of lakes, or roads, or disasters, all of those things, faster and cheaper. But it’s true that creating these is not easy; it takes a lot of data and a lot of compute. That’s also why foundational models work, as we said before, because we train once.

In the specific case of Clay, we took 70 million satellite and aerial images from around the world, from 2018 to 2023, with the usual ones from NASA, like Landsat, or the European Space Agency, like Sentinel-2 and Sentinel-1, so it’s visual and radar. Aerial imagery from the Department of Agriculture in the US, NAIP, but also from LINZ in New Zealand. Basically, a combination of resolutions, from very low resolution to very high resolution of 30 centimeters. Then, once we have these 13 million square kilometers, big numbers, like 2% of the Earth’s surface, we give it 4,000 GPU hours.

So, a lot of GPUs for a lot of time, seeing the data, trying to understand it. The way it works is that we mask out, we remove, part of the image, and then we ask the model, “Hey, can you reconstruct it?” That’s the basic task it has to do, and it has to do it basically 35 million times. So, it’s a lot of computing, a mind-numbing amount of data, but the idea is always the same. Which is, if I give you an image from any satellite, and you understand what’s there, you understand that there is a concept, even though you might not have the label forest, but there’s a forest, and a house, and a lake, and a road, and where there is a road, there is probably a house, all those kinds of dynamics that we innately understand with our minds, then we can teach a computer to do that. Because if we do, the computer can work 24 hours a day and can clone itself a thousand times. So, if we are able to give it some intuition, like human intuition of the concepts, then we can scale that to any scale.

[0:14:06] HC: How do you know whether the model understands those types of concepts?

[0:14:09] BS: That’s a good question. So, during training, there is a measure. As I said before, basically, we mask the image: we take out 70% of it, we black it out, 70% of that image, in random places. Then, we also restrict, in a way, the memory of the description of the image within the AI. You can describe the image as you want, or the 30% you see, but only in 700 numbers, not more. So, it has to compress all the information of the entire image into 700 numbers.

Then, we use just those 700 numbers from 30% of the image to figure out what the input image was. Then, you compare the reconstruction it did to the input it had. The difference is something you want to minimize. So, every time it sees an image, it sees what it got wrong. That’s the process of how it learns in our specific model. There are other ways to do that, other losses, as it’s called. A loss is the difference between what you had coming in and what you reconstructed. By doing that, then by definition, those few numbers, which start embedding the summary of the image, become an extremely smart compression, a semantic compression, of what’s in the image. Because you have so few numbers that you cannot afford to say, “Hey, there’s green here, and green here, and green here.” You might better say, “It’s grass.”

The concept of grass, you see it everywhere in the world, and if I dedicate this number, or part of this number to the concept of grass, it’s much easier for me to make a summary of the image by attending to the semantics, not the pixels. Does that make sense?
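The masked-reconstruction training loop Bruno describes can be sketched in a few lines. This is a deliberately trivial stand-in (the "model" here just predicts per-band means of the visible pixels; Clay’s real encoder, decoder, and its ~700-number embedding are far more involved), but it shows where the mask and the loss sit:

```python
import numpy as np

rng = np.random.default_rng(1)
image = rng.uniform(size=(32, 32, 6))          # a fake 6-band satellite chip

# Mask out 70% of the pixels at random, as in the training scheme described.
mask = rng.uniform(size=(32, 32)) < 0.70
masked = image.copy()
masked[mask] = 0.0

# A real model would encode the visible 30% into a compact embedding and
# decode a reconstruction; our stand-in just fills masked pixels with the
# per-band mean of the visible ones.
visible_mean = masked[~mask].mean(axis=0)      # shape (6,), one mean per band
reconstruction = np.where(mask[..., None], visible_mean, masked)

# The training signal: mean squared error on the masked-out pixels only.
loss = np.mean((reconstruction[mask] - image[mask]) ** 2)
print(round(loss, 4))
```

Training drives this loss down over millions of chips; the compact description the encoder is forced to produce along the way is the embedding that everything downstream reuses.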

[0:15:59] HC: It does. So, you mentioned a few different sources of satellite imagery that go into training Clay. I believe these are all public sources of imagery. Is that right?

[0:16:07] BS: Yes, that’s also on purpose, by design and by values. A lot of the models in other domains, like text, or images, or videos, have a really big problem of IP, of intellectual property, or privacy. It’s not clear, if you train with data that is commercial or belongs to a user, whether you need their permission or not. Regardless of what the law says or what one company decides to do, we don’t have those problems with Earth data, or we have much less of them. Because we use only open data, which means data from institutions like NASA and the European Space Agency, which is taxpayer-funded, and they release the data openly for any use. That means we don’t need to worry about the legal implications of training on protected data.

That’s why we say that Clay, I believe, is not only the largest and most efficient AI model for Earth, of any kind of foundational model. It is also completely open source, because the code is available; completely open data, because we only use fully open data sources. Also, the result, the weights, the trained model, is free of all of those challenges; we release it completely open for any kind of use. It’s also something we can do because we are a nonprofit, so we don’t need to get revenue, and we don’t need to get profit. We’re actually prevented from getting profit from this model.

[0:17:48] HC: Those open and public benefits definitely make this a whole lot easier for the community worldwide to use, so I really appreciate that. But I am curious, do you think there is anything missing by not having commercial imagery? Are there sources of commercial imagery that could add something to this model, if someone were able to develop that in the future?

[0:18:09] BS: Well, definitely. Yes, of course. There’s a lot to be gained; that’s why there’s a whole commercial sector. There is even a limitation for Landsat and NASA on going to higher resolution, because they don’t want to compete with commercial high resolution. And commercial high resolution is a field where there are many dimensions, not only resolution but also currency, like how many images you get of LA, for example.

Let’s focus just on resolution. Of course, the model would be better at detecting super small semantics, like how many lanes a road has, or what kind of field or crop this is, that you will only get with commercial providers. It doesn’t mean that it does not work. It will work with commercial imagery, but it’s unclear to what degree, the model being trained only up to 30 centimeters, which, by the way, is very high resolution. We do have resolution that high, but not commercial, only in New Zealand. It’s unclear how good it could be, but it’s also clear that it would be better if we trained it with commercial data.

That’s where we also benefit from being a foundational model that is open, because then a private company like Maxar, or Planet Labs, or Satellogic can take the base model of Clay, add their commercial data, and create a version of Clay that is better at high resolution. Or, if you don’t have satellite images, you have commercial data of biomass, as I was explaining before, how much carbon there is in a forest. You have examples that are not in the open, like we had, to test the model. You have commercial data, very high resolution and very high quality, to test the model, or even to add that task as one of the learning tasks.

Before, I was mentioning that we train Clay to reconstruct the image; that’s the training. But it could also be trained to be, “Hey, you’ve got to be really good at detecting biomass.” So, instead of reconstructing the image visually, I don’t care about that. I know the amount of biomass in these places, and you get an image as input; your job is to be better at that task. So, you can use commercial data for tasks or for images. So yes, of course, the model would be better if we did that, but I think it would make it a different beast, a different tool, a different value than what we can do now that is all open.

[0:20:42] HC: Does Clay make use of the different spectral bands available on some of the public satellites?

[0:20:46] BS: All of them. Yes. In all cases, we use all the bands that they have available.

[0:20:50] HC: There are, maybe not a lot, but there are a few other foundation models being released by partnerships between university and company groups. Are there things that are different about Clay compared to some of these other models that we should talk about?

[0:21:05] BS: It’s a tradeoff. If I find it, I’ll send you a link, and you can put it somewhere; there’s a list of around 60, if I remember correctly, 60 foundational models of Earth. So yes, there are tons of them, and there are examples that are, like us, global, completely open, and used operationally, which are kind of the metrics we think about. We don’t want to be a demonstration of the progress of technology. We don’t want to be research or academia. We want it to be used. We’re focused on that.

One such model is from NASA and IBM. I think Clark University was also involved [inaudible 0:21:49], which is also a foundational model. I believe they’re now expanding the scope, but the last time I looked at it, it was trained on Sentinel and Landsat, HLS, harmonized data sets, only for the US. It would only work for those places. As I said, it’s a tradeoff. What we try to focus on is how we can make it as simple as possible for anyone anywhere to use this model for anything they want to do. In the case of Prithvi, since it is trained on the US, if they haven’t expanded that yet, even though I think they’re going to train globally, that means they’re better in the US. So yeah, depending on how you count, there is one Clay or 60 foundational models, and each is better at a specific thing.

[0:22:43] HC: What are some of the challenges you’ve encountered in building Clay?

[0:22:46] BS: I think the biggest challenge, by far, is that it’s complex. It’s really hard to explain. It’s very hard to sell that this will make everything simpler when, one, most people don’t understand the complexity of classical geospatial, and two, you need something much more complex to make it simple. Like I was saying before, yes, ChatGPT is really hard on the inside, but using it is simple.

Explaining that concept of AI for Earth is really, really hard. Explaining some of the things we talked about, what an embedding is, what learning is, what a foundational model is, it’s all so new. And it’s not only the general population, but even experts in the field of remote sensing or experts in the field of AI who need to work at this intersection. It’s hard because, at the end of the day, it’s the question of, so what? Okay, you’re adding more complexity, so what? What does this mean?

To answer that question, you need to solve the hard problems of the new complexity that you’re adding, but then you need to deploy it. We need those partnerships, we need those examples to ground it, and we have more and more examples of those. Of course, there are other challenging things about making Clay, but by far the hardest is explaining what it is.

[0:24:16] HC: How do you go about benchmarking it? A lot of the published foundation models, at the end of the paper, will compare with the alternatives on a set of tasks. How do you think about this for Clay, and how do you use it to provide evidence that Clay is going to be useful for other projects?

[0:24:32] BS: That is a good question. There are quite a few benchmarks. GEO-Bench is one of them. There is a bunch of benchmarks, but we found that there is no standard, unlike in text or images, where benchmarking is more standardized. There is no standard benchmark for foundational models. Yes, of course, we could compare the ability to do land cover classification, or the ability to estimate biomass, or the ability to detect this or that.

Those would benchmark a specific application, not the intrinsic value of being a foundational model. Because a foundational model, by definition, should be able to perform a set of tasks. This is why we wanted to create a working group, which is now established, to define that: define a set of tasks that would qualify it, to say, “Hey, this is not only good for these three or four tasks; this qualifies as a foundational model, and this is the result, so we can compare that.” That working group is ongoing. Not only that, we also created a challenge, where we started with a set of tasks that we thought were essentially easy, things like land classification, all the stuff we’re talking about.

It is now hosted by the European Space Agency and others. The idea is to give $10,000 of compute to the winner to promote that. The short answer to your question is that it’s an open question. We don’t yet know how to benchmark the property of being a foundational model. We know how to benchmark specific tasks, but if the goal of this model is to be useful for a lot of things, how to measure that is much harder to define.

[0:26:20] HC: So, part of the quest is to figure out how to benchmark it as you’re building it.

[0:26:25] BS: Yes, exactly. You need to make choices. The choice we made, which is a standard one, is this masked reconstruction. But we also thought about, and we will do it once the benchmark exists, that as you reconstruct the image, you also continually test the tasks that the foundational model is supposed to do. Then, you don’t need to wait for the paper to see the comparison. Once we define the set of tasks, we basically add them to the training. That is good, but there is also a problem, in the sense that once a proxy becomes a metric, it ceases to be a good proxy. Because once you know that seven things define your fitness, you focus on those seven. But it doesn’t mean that those seven were perfect metrics. It means that they were characteristic of the kinds of things to think about. So yes, it’s an open question. It’s a very interesting question, but again, it’s a technical question. What I want is: am I being useful to someone doing their work?

[0:27:27] HC: So, you’ve mentioned one example of how you’re collaborating with others through the challenge and with benchmarking here. Are there other ways that you have partners that you’re working with to build Clay and the rest of the ecosystem?

[0:27:39] BS: We have a growing set of partners that are looking at Clay. I was mentioning biomass a lot because we have a partner, CTrees, another NGO, that does those estimations of biomass. There are other partners doing similar things in the commercial sector. There are also partners on the journalism side who are interested in finding things in maps, like roads, or cultivated crops, or destruction after an event, or swimming pools, stuff like that. For them, a tool like Clay can significantly speed up the work they can do. So yes, we have partners, even though it’s a hard balance, because we focus on making the common, horizontal tool as good as possible. But of course, we need to ask ourselves the “so what”, and then go into the verticals, go with the partners on the vertical, and help them figure out how to use the model to actually do something.

[0:28:44] HC: Are there any lessons you’ve learned in developing Clay that could be applied more broadly to other types of data, or for other organizations thinking of building a foundation model?

[0:28:53] BS: Quite a few of them. The hardest struggle, as I was explaining, is that this is a completely different set of skills. Most organizations that we speak with do not have that combination of geospatial and AI skills. So, it’s really hard to adopt the model itself. That’s why we started working on a digested version, like OpenAI did with ChatGPT: GPT-4 is the model, then they made a digested version of the model, which people can use, which is ChatGPT.

That is a lesson learned: you don’t validate your theory of change with something only, I don’t know, a thousand or 10,000 people in the world can work with. You have to reduce the barrier to using it, and this is especially hard with our partners, because they don’t typically have the skills to embrace both sides. So, you need to build trust that the AI model is doing what it says it’s doing, that it is performing as you said it performs, and they can skill up to understand [inaudible 0:30:00] how it works at every step, but they can also get used to it by applying it. We are changing the expectation of what it means to work with geospatial data. It is a big change we are talking about here. But I genuinely believe that the results in other fields and the early results on AI for Earth demonstrate that it’s an effort worth taking.

[0:30:24] HC: What does the future of foundation models for geospatial data look like? What do you see is coming ahead, whether it’s with Clay or more broadly?

[0:30:32] BS: I hope so. I genuinely hope so. If I tell the CFO of any of the institutions listening to this, “Hey, I can give you a tool that makes the thing you’re doing a hundred times cheaper or a thousand times cheaper.” Or, if you read the stories of models like Pangu-Weather, which is a foundational model for weather prediction that is able to make predictions that are more accurate and 10,000 times faster than traditional methods, and you read that the team behind Pangu-Weather was four people, then anyone who is in the business of these predictions thinks, we have got to do these things. This is something we have to do.

But as I said before, the reality of implementing these changes is that it takes a while. It takes a while to get used to it. It takes a while to build the tooling around it. So, I think it’s coming. No alternative demonstrates the benefits that we are seeing right now; everything else pales in comparison. The reasons for not doing it are disappearing. The reason that making a model is very expensive is disappearing, because Clay exists, and its alternatives, or [inaudible 0:31:50] or other models. You don’t need to start from scratch.

The reason that no one else is doing it is also disappearing, because there are more and more examples of people using it. The reason not to trust it is disappearing, because other people are trusting it in ways that are open, so you can see to what degree you can trust the use of the model. So, we are seeing that shift. Is it going fast enough? I don’t think so. One of the things that I really wonder, and I would love to know, is why institutions like OpenAI, or Google, or Meta are making these foundational models that understand text, and images, and video, and sound, and all of these modalities, but they’re not touching geospatial data.

Yes, there are some little things here and there that they can do with geospatial data, but they don’t. The day they do is not an if, it’s a when. The day they do is going to move the adoption of geospatial AI products so much faster.

[0:32:52] HC: Finally, where do you see the impact of Clay in three to five years?

[0:32:57] BS: In three to five years, I hope Clay doesn’t exist, because there’s no reason for it to exist. I hope that the idea of a foundational model that is open is something so obvious that there’s no need for a nonprofit to do it, because everyone is going to do that thing. Maybe it’s Clay, maybe it’s another. But it’s a really good question, because I can think of other cases. Like, I don’t know, Wikipedia. In a way, I long for a future where Wikipedia is not so needed, because public, easy access to information is ubiquitous. That did not happen, and Wikipedia still has a role to play so many years after its founding.

So, maybe that’s the world we live in three to five years from now. What is for sure is that in three to five years, there will still be a Clay, because open source and open data sustain themselves. We’ll continue to make it better. But whatever happens, we will be able to understand Earth at least as well as we can understand it today with AI. That is something that was pretty much impossible three years ago. The idea that you could create these summaries, these embeddings, to understand concepts at the global scale was impossible three years ago. What is possible three years from now? I don’t know, but I know it’s going to be extremely surprising, and I hope at the same time normal by then, because it becomes the standard.

[0:34:24] HC: This has been great, Bruno. I appreciate your insights today. I think this will be valuable to many listeners. Where can people find out more about you online?

[0:34:32] BS: I’m easy to find. My name is very rare. So, if you just search for Bruno Sanchez or Bruno Sanchez-Andrade, you’ll find me on the usual channels. But of course, Clay has a website, which is madewithclay.org, and under that identity, Clay, we are also on the usual social media sites.

[0:34:55] HC: Perfect. I’ll link to that in the show notes. All right, everyone, thanks for listening. I’m Heather Couture, and I hope you join me again next time for Impact AI.

[OUTRO]

[0:35:07] HC: Thank you for listening to Impact AI. If you enjoyed this episode, please subscribe and share with a friend. If you’d like to learn more about computer vision applications for people and planetary health, you can sign up for my newsletter at pixelscientia.com/newsletter.

[END]