In this episode, I'm joined by Ron Alfa, Co-Founder and CEO of Noetik, to discuss the groundbreaking role of foundation models in advancing cancer immunotherapy. Together, we explore why these models are essential to his work, what it takes to build a model that understands biology, and how Noetik is creating and sourcing their datasets. Ron also shares insights on scaling and training these models, the challenges his team has faced, and how effective analysis helps determine a model’s quality. To learn more about Noetik’s innovative achievements, Ron’s advice for leaders in AI-powered startups, and much more, be sure to tune in!


Key Points:
  • Ron shares his background and how his journey led to Noetik.
  • Why a foundation model is important in their work.
  • What goes into building a foundation model that understands biology.
  • Building the dataset: where does the data come from?
  • The types of data they generate from the samples they use in their models.
  • He further explains the components necessary to build a foundation model.
  • The scale and what it takes to train these models.
  • Ron sheds light on the challenges they’ve encountered in building their foundation model.
  • How to determine if your foundation model is good.
  • Utilizing analysis to help identify ways to improve your model.
  • The current purpose for their foundation model and how they plan to use it in the future.
  • Key insights gained from developing foundation models and how these can be adapted to other types of data.
  • His advice to other leaders of AI-powered startups.
  • Ron digs deeper into their goal to impact patient care by developing new therapeutics.

Quotes:

“Our thesis for Noetik is that one of the biggest problems we can impact if we want to make and bring new drugs to patients is predicting clinical success; so-called translation — that's where we focus Noetik, how can we train foundation models of biology so that we can better translate therapeutics from early discovery and preclinical models to patients.” — Ron Alfa

“We think the most important thing for any application of machine learning is the data.” — Ron Alfa

“The goal here is to train models that can do what humans cannot do, that can understand biology that we haven't discovered yet.” — Ron Alfa

“The big aim of Noetik is to develop these [foundational] models for therapeutics discovery.” — Ron Alfa


Links:

Ron Alfa on LinkedIn
Ron Alfa on X
Noetik
Noetik Octo Virtual Cell (OCTO)


Resources for Computer Vision Teams:

LinkedIn – Connect with Heather.
Computer Vision Insights Newsletter – A biweekly newsletter to help bring the latest machine learning and computer vision research to applications in people and planetary health.
Computer Vision Strategy Session – Not sure how to advance your computer vision project? Get unstuck with a clear set of next steps. Schedule a 1 hour strategy session now to advance your project.


Transcript:

[INTRODUCTION]

[0:00:03] HC: Welcome to Impact AI, brought to you by Pixel Scientia Labs. I’m your host, Heather Couture. On this podcast, I interview innovators and entrepreneurs about building a mission-driven, machine-learning-powered company. This episode is part of a miniseries about foundation models. Really, I should say, domain-specific foundation models. Following the trends of language processing, domain-specific foundation models are enabling new possibilities for a variety of applications with different types of data, not just text or images.

In this series, I hope to shed light on this paradigm shift, including why it’s important, what the challenges are, how it impacts your business, and where this trend is heading. Enjoy.

[EPISODE]

[0:00:48] HC: Today, I’m joined by guest, Ron Alfa, co-founder and CEO of Noetik. Today, we’re going to talk about foundation models for cancer immunotherapy. Ron, welcome to the show.

[0:00:59] RA: Thank you, Heather. Very nice to finally join your podcast. I’ve been following it for a long time.

[0:01:03] HC: Ron, could you share a bit about your background and how that led you to create Noetik?

[0:01:06] RA: Yes. So, I’m a physician-scientist by training. I have been largely inspired to pursue cancer therapeutics really since medical school, and that’s been a very strong passion of mine. Following training, I ended up joining a company called Recursion Pharma, which hopefully most folks are aware of. Recursion is a tremendous company, arguably one of the first players in the technology-enabled drug discovery space, working on TechBio. I spent six years at Recursion, really from seed stage all the way to post-IPO.

During those years, we spent a lot of time thinking about how to build a machine learning-enabled drug discovery company from first principles. So, on the ground at Recursion, I learned a lot about the challenges that we were going to face as we started to build Noetik. But also, I got to spend a lot of time thinking about how we integrate people across the organization, and how we think about leveraging technology to do drug discovery. We spent a good amount of time focused on these hard problems that we were really innovating on in those years. Then, from there, a couple years ago, I decided to found Noetik, focused on what I felt was really an untapped problem in drug discovery.

Our thesis for the company is that one of the biggest problems we can impact if we want to make and bring new drugs to patients is predicting clinical success, so-called translation. So really, that’s where we focus Noetik: how can we train foundation models of biology so that we can better translate therapeutics from early discovery and preclinical models to patients.

[0:02:45] HC: Why is a foundation model an important piece of this?

[0:02:48] RA: Yes, great question. So, maybe to put the term foundation model aside, the question we are trying to answer is: can we train models that understand biology? We’ve all been overwhelmed by the progress with language models. Certainly, I’m sure we’re all using them in everyday tasks, and you can use them for a broad array of tasks that involve language. So, the question you might ask yourself is: what is a model that understands biology? What does that model need to learn from?

Really, that’s the starting point for Noetik. We are trying to train models that understand human biology, beginning with human cancer biology, and deploy those models on drug discovery tasks. Part of our view is that we’re not going to get there necessarily by simply training language models on the corpus of research in the literature, because there just aren’t enough connections between prior published data. There aren’t the right sets of data there to really begin to tackle some of the really hard, novel problems.

What we founded the company with is this idea that, if we can build the right dataset to train models that understand biology, then we can begin to see some of the gains that we are seeing with language models in the regime of drug discovery, biology, and all tangential areas.

[0:04:16] HC: What goes into building this foundation model?

[0:04:19] RA: Well, we think the most important thing for any application of machine learning is the data, sort of referencing that in the last question. As we founded the company, we spent a lot of time thinking about what dataset we would need to build these models. For largely six months before we had done anything, Jacob and I were iterating on what types of data we could use and where we could get the data.

Hypothetically, if you wanted to train a foundation model that understood biology, futuristically, what types of data would that model need to see? So, we tend to think you need to begin to include things like data at the tissue level, you need to have some sort of protein and cellular architecture, you need to have genomic information, perhaps transcriptomic information, really sort of the central dogma up to tissue biology. So, we endeavored, from first principles, to design a dataset that would allow us to train models on everything from tissue biology all the way down to the genome, and everything in between. That was the starting point for the company. That’s an easy idea to come up with, but then you have to figure out: how do I actually build that dataset?

From there, we spent a lot of time thinking about, well, how do we design such a dataset in a way that’s fit for purpose for machine learning? What that means is thinking critically about all the challenges you’re going to face as you start trying to train models on this data. So, Heather, I know you’re heavily involved in the space of digital pathology, and over the years, we know that there are very, very obvious failure modes there when you’re using large public datasets, and that models get really used to memorizing, for example, where a sample came from or what type of slide scanner was used to process a sample.

Anticipating some of those challenges, we tried to install certain strategic tricks into generating the dataset that allow us to make sure that models are seeing patient samples that have multiple representations, that are processed in different experimental batches, so on and so forth.
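
To make that kind of anti-confounding design concrete, here is a minimal sketch in Python. The round-robin policy, function name, and identifiers are illustrative assumptions, not Noetik’s actual layout; the point is only that each patient’s replicate sections land in different staining batches, so batch identity can’t stand in for patient identity.

```python
import itertools
from collections import defaultdict

def assign_replicates_to_batches(patient_ids, n_replicates, n_batches):
    """Spread each patient's replicate sections across staining batches
    so that no patient is confounded with a single batch."""
    batches = defaultdict(list)
    batch_cycle = itertools.cycle(range(n_batches))
    for pid in patient_ids:
        for rep in range(n_replicates):
            # Consecutive replicates of the same patient go to
            # consecutive batches, decorrelating patient and batch.
            batches[next(batch_cycle)].append((pid, rep))
    return dict(batches)

# Example: 6 patients, 3 serial sections each, spread over 4 batches.
layout = assign_replicates_to_batches([f"P{i:03d}" for i in range(6)], 3, 4)
```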

[0:06:36] HC: How do you go about building the dataset for this? Where do you get the data from?

[0:06:40] RA: Yes, great question. We did consider whether we could use some of the public data out there, but increasingly, as we looked at certain datasets, some of the larger ones, it became very obvious very quickly that the quality is just not there in a lot of the data out there. Largely, this was our experience at Recursion as well, where we spent many years building our own datasets from scratch, and Recursion is still doing an extraordinary job there. We spent many years interacting with our datasets and looking at datasets that are in the public regime. When you actually have the ability to look across those two different kinds of datasets, you start to realize that what people think looks like high-quality data in the public domain is oftentimes not that high quality. We started to see the same issues as we were looking at some of these datasets that we were interested in.

As we founded the company, we decided pretty quickly, within months, that the best solution for us was going to be, at least as a starting point, to build the dataset from scratch. Within months, we had brought in a set of folks with decades of experience in histopathology. We set up a histopathology lab in South San Francisco and started sourcing equipment and patient samples, designing a dataset that we were going to build from scratch.

We’re really, really happy that we made that decision because, at the end of the day, now, one, we have complete control over the data. The quality of the data that we’re generating really is impeccable versus some other data that we’ve been able to access through collaborations, and it gives us the ability to really think towards the future. For the next 10 years, we can generate additional data around our samples. We are banking – today, we have over a thousand non-small cell lung cancer samples, primary patient samples, FFPE. We’ve processed a tiny fraction of each of those samples to generate an enormous dataset. As technology evolves and new biological assays come out, we can continue to evolve the dataset over many years.

[0:08:47] HC: What types of data do you generate from these samples to use in your models?

[0:08:50] RA: Yes. What we decided to begin with was essentially four data modalities. One, we felt that spatial data was going to be an accelerator for training these types of models. Again, over the many years, we’ve learned that there is a tremendous amount of information in images and spatial context. Models can learn much more from seeing the biological data where you have, essentially, cells in their cellular context. So, that was sort of the starting point.

Then, the other approach that we wanted to emphasize was that we really need data at scale. So, we tried to identify a series of platforms that would allow us, one, to generate new spatial data, and then to generate data at scale at a cost that’s not prohibitive. Ultimately, our current stack today begins with H&E. For each sample, we have an H&E, and then, on the same slide, we have multiplex immunofluorescence. So, we’re looking at roughly 16 proteins. The H&E gives us large-scale tissue architecture. The immunofluorescence lets us identify different cell types: tumor cells, immune cells, subtypes of immune cells, and cells in the tumor microenvironment.

Then, on a serial section, we get spatial transcriptomics, and that serial section is four microns away. Then, finally, for each sample, we have whole exome-seq, and that allows us to understand the genotypes of the tumors. Between all four of those modalities, we can train these multimodal models, essentially from the level of the tissue to the level of the genome on what we hope is enough biology for them to truly begin to understand biological relationships and help us to discover novel biology.
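
For a concrete picture of what one training example might look like once the four modalities are registered together, here is a minimal sketch; the field names and shapes are assumptions for illustration, not Noetik’s actual schema.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MultimodalSample:
    """One registered region from a patient sample; fields are illustrative."""
    patient_id: str
    he_image: np.ndarray        # H&E patch, e.g. (H, W, 3) RGB
    mif_stack: np.ndarray       # multiplex IF, e.g. (16, H, W), one channel per protein
    spatial_counts: np.ndarray  # spot-by-gene matrix from the serial section
    spot_coords: np.ndarray     # (n_spots, 2) positions registered to the H&E frame
    exome_variants: dict        # gene -> variant calls from whole exome-seq
    staining_batch: str         # batch metadata, useful as a training-time covariate
```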

[0:10:40] HC: So, data is the first piece you need to build a foundation model. What else does it take in order to build something like this?

[0:10:47] RA: Yes. So, data is the starting point. Then, the next question is, how do we actually train – how do we design and train these models? How do we develop models that can incorporate all of these different data streams? What we’re developing at Noetik in terms of these foundation models hasn’t really been done before. These aren’t models that are out there in the public domain, where we can lean heavily on prior work. The dataset, I would argue, is quite unique. I’m not aware of any other company that has this scale of data: thousands of patients, thousands of patient representations, H&E, protein, spatial transcriptomics, whole exome-seq, where folks are training models on all of those different modalities at once.

We need to think critically about what training tasks we can use and how we can get the models to really look at all of those different data modalities. So, the basic premise for how we’re training models is masked image modeling. There are a lot of nuances to how you present the data, how you handle the masking policies, and really, how you think about layering all the different modalities. But then, even beyond the modalities, it’s important to consider that these are patient samples. So, it’s not just about the data layers that we’re training on; these are patient samples that have certain annotations. For example, these are lung cancers. They have annotations as adenocarcinomas, large cell carcinomas, squamous cell, et cetera. Those annotations are important.

We are also trying to learn patient biology. So, how do we give the models the information that not only are these images, not only are these biological samples, but contained within these data are patients, and they represent patients? There are also other metadata that are interesting, such as staining batches. We are not processing a single patient sample on a single slide. We’re often processing multiple slides per patient, and those patient samples are in different staining batches. It’s useful to provide that information to the model.

Where I’m going with this is, when you start to get into all these different data layers, you end up with a complexity problem of, “Okay. What layers am I going to provide the models? What’s the primary layer that’s going to drive the training task? How am I going to force the model to begin to use the other data layers? How can I present certain types of information in a useful way so that the outcome is a model that has not only learned biology, but understands that these are patients, and there is some sort of organization of the learning into that regime?”
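
As a rough illustration of that masking idea, the sketch below masks a larger fraction of one “primary” modality’s tokens than of the context modalities, forcing the model to reconstruct the primary layer from the others. The token layout, ratios, and zero mask token are hypothetical simplifications (in practice the mask token is a learned embedding), not Noetik’s actual policy.

```python
import torch

def mask_multimodal_tokens(tokens, modality_ids, mask_ratio=0.5, primary=0):
    """Mask tokens for multimodal masked modeling.

    tokens:       (batch, seq, dim) patch/spot embeddings from all modalities
    modality_ids: (batch, seq) integer tag per token (0=H&E, 1=mIF, 2=RNA, ...)
    """
    batch, seq, dim = tokens.shape
    # Mask the primary modality at the full ratio, context layers at half,
    # so reconstruction must draw on the unmasked context modalities.
    rate = torch.full((batch, seq), mask_ratio / 2)
    rate[modality_ids == primary] = mask_ratio
    mask = torch.rand(batch, seq) < rate
    mask_token = torch.zeros(dim)  # a learned embedding in a real model
    masked = torch.where(mask.unsqueeze(-1), mask_token, tokens)
    return masked, mask  # mask marks the positions the loss is computed on
```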

[0:13:26] HC: So, we’ve got data, and we’ve got an algorithm component where you need to figure out how the model interacts with the different types of data that are going in and what type of tasks it learns. I guess the other component is resources: compute resources and the quantity of data you need. What kind of scale are we talking about here? What does it take to train this?

[0:13:45] RA: Yes, the compute is important and is, I would say, a challenging problem in the biological sciences today, because very few companies in biology are training transformer-based models at this scale. Certainly, we’re not at the scale of some of the large language models that we’re seeing in the tech industry, but we are increasingly moving towards a scale of hundreds of GPUs. We’re just getting started here. Our last model was 1.5 billion parameters. We trained it on 256 H100s over the course of weeks.

When you start to get to that scale, it’s quite expensive for the biological scientist to begin to train these types of models. What’s interesting about biology is that, historically, the big cost in building a biology company is usually the experimental cost: setting up the wet lab, generating data, running experiments. Historically, even on the machine learning side, we haven’t really required a huge amount of compute to do some of the work that we were doing maybe six, seven years ago. But now that we are starting out with transformer-based models, we’re moving into a regime where we are seeing benefits. One, we’re able to generate a lot more data today, but we are also seeing benefits from larger models and from beginning to scale those models. So now, that’s adding compute into the consideration for startup budgets.
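
For a rough sense of the cost at the scale Ron mentions, with illustrative numbers only (the run length and prices here are assumptions, not figures from the episode): a two-week run on 256 H100s is 256 × 14 × 24 ≈ 86,000 GPU-hours, so at an assumed $2 to $4 per H100-hour, a single training run would land somewhere around $170,000 to $340,000, before any ablations or restarts.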

On one hand, it’s incredibly exciting because, having spent a lot of time in the space, we have been massively surprised by how fast we’ve been able to make progress with these advances. But on the other hand, we need to think a little bit more deeply about resourcing a lot of the machine learning work.

[0:15:37] HC: What are some of the challenges you’ve encountered in building a foundation model?

[0:15:40] RA: Yes. I think one of the bigger challenges, and maybe this is a little bit unsatisfying, is that when you’re working with a so-called foundation model, a model that understands biology, the goal of these models is not to solve a single task. We’re not in a regime where we have H&E as input and pathologist annotations, and we’re trying to solve a classification task, for example.

We are really in a regime where these models should be able to undertake a broad view of biological capabilities. Importantly, what’s crucial is that they can not only do things that humans can do. We’re not thinking about, well, how do we accelerate some aspect of a task that humans can already do well? Really, the goal here is to train models that can do what humans cannot do, that can understand biology that we haven’t discovered yet. If we want to develop and discover new drugs, if we want to discover new biomarkers, those facts are not out there in the world. We haven’t already identified those targets and those patient populations. We need to start to move into the regime of the unknown.

There are a couple of challenges there. One is, once you’re in the regime of novel biology, how do you know that the models are making inferences that are valid, that are true, and that represent the real world? The other challenge is, how do you begin to use these models? Because you’re not starting with a very well-defined task like, again, the digital pathology example, where we have a bunch of samples and we want the model to label them based on their tissue histology. There are many things we can begin to do with these models.

I would say, one of the underappreciated challenges is, okay, once we’ve trained these models, how do we use them? How do we begin to understand what they’ve learned? And then, as humans, how do we begin to use them for the ultimate goal of the company, which is discovering new therapeutic targets, discovering new patient populations, so on and so forth?

[0:17:47] HC: Well, you’ve set up my next few questions pretty well here. So, the first one is just, how do you know whether your foundation model is good? How do you go about figuring that out?

[0:17:54] RA: Yes, this is a great question. I would say, again, you’re in this regime of training models that you ultimately hope can understand and provide you insights into new biology, and we don’t have fixed benchmarks here. So, in training the model using masked image modeling, we can assess how well the model reconstructs or generates images. But that doesn’t necessarily tell you whether the model has understood biology. In fact, surprisingly, we found in some cases that the model learns more of the biology that we’re interested in if it’s not generating the highest-resolution image per se, because we are not just interested in pixel-level understanding of cells. We are interested in understanding biology at the tissue level, at the level of cell-to-cell interactions, and at the level of cell biology as well.

Often, the quality of image reconstruction at the pixel level doesn’t necessarily tell you that the model has learned the biology that you want. So, I tend to think of this as a biologist. We can try to come up with benchmarks, but for me, and this has been true for many types of applications, even at our prior company, what I want to see is that there are certain biological facts that we know to be true, that have been validated and established over many years. If we can start to assess that certain things we expect to be true are being learned, and in a robust way, then the more of those examples we can accumulate, the more confidence you start to get in the things the model is inferring that are unknowns.

That’s a little bit unsatisfying because it’s not so quantitative. But as a biologist, the first thing I’m very interested in is: are we seeing things that we expect to be true coming out of these models? As you start to go down that path and develop an intuition for how things are working, what things are true and what things are perhaps not true, you can start to think about, “Okay, how do we begin to build benchmarks in this novel regime?” There isn’t necessarily going to be one approach. There might be multiple approaches, and they might be very different. So, for example, one of the things we’ve been looking at is, can the models make inferences that are cross-modal? Can the models infer protein, infer T-cell markers, from the H&E? It turns out that these multimodal models can do that, and then you can begin to quantify how well they do that.

If you can begin to assess how well the models are making these cross-modal inferences, then as you go into a regime where you’re starting to ask biological questions that depend on multimodal inferences, that gives you some confidence that those questions are valid. Another approach we take is, we started in the non-small cell lung cancer indication. One of the reasons was that there’s a lot of known biology there; there have been thousands of trials run in that indication. We know certain groups of patients that respond to immune checkpoint inhibitors. We know about patients that don’t respond to immune checkpoint inhibitors. We know about different immunotypes, for example.

Are the models able to identify known biology that we can lean on? That gives us confidence that they’re learning from the underlying data in a way that is consistent with our goal of using the data to discover therapeutics. Does that make sense?
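
One simple way to picture quantifying a cross-modal inference like this is a held-out probe from H&E-derived embeddings to a measured mIF marker. The sketch below is an assumption-laden illustration (the ridge probe, the marker choice, and the random split are placeholders), not Noetik’s published evaluation.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

def cross_modal_probe_score(he_embeddings, marker_intensity, seed=0):
    """Test whether H&E-only embeddings predict a measured mIF marker
    (e.g., a T-cell marker) on held-out samples."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        he_embeddings, marker_intensity, test_size=0.25, random_state=seed)
    probe = Ridge(alpha=1.0).fit(X_tr, y_tr)
    rho, _ = spearmanr(probe.predict(X_te), y_te)
    return rho  # rank correlation between inferred and measured levels
```

In practice, the split would need to be at the patient level, for exactly the slide-scanner and staining-batch leakage reasons discussed earlier in the episode.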

[0:21:29] HC: Yes. Do those types of analysis then help you identify ways to improve your model, or do you have to dig deeper to look for areas for enhancement?

[0:21:37] RA: Yes, absolutely. I mean, that’s just the beginning. We’re always thinking about, one, what has the model learned? Oftentimes, when you’re in this regime, our hope is for these models to learn, again, biology that we don’t know yet, that we haven’t characterized. If we train a model on, let’s say, cancer patients, and all the model has learned is certain genomic subtypes, then we haven’t really learned anything. We can classify genomic subtypes simply by genotyping tumors.

As a company, we believe that tumor biology is much more complex than that. You certainly have genomic subtypes that are therapeutically relevant, but we also have immune subtypes that perhaps are orthogonal to those genotypes, and we have more complexities even beyond single genes. So really, we are beginning to see the models learn that certain patients with similar biology look similar to each other. Inevitably, we’re also seeing that there are new dimensions being learned by the models that we just don’t understand yet.

So, in this regime where we don’t necessarily have those labels, we need to begin to tackle these problems by better understanding the data itself, and by using parallel approaches where we can look at the tumor immune microenvironment in those samples and ask: okay, we have a set of samples that the model has grouped together for some reason; what is unique to those samples across different aspects of biology?

One of the things we’re also doing internally to support that work is, we are training these self-supervised models, but we are also developing methods, and the data is conducive to this, for providing ourselves the labels to understand these patient samples across many of these regimes. The genotype is just one example; we can look at whether the model has learned groupings of patients by genotypes and sub-genotypes, as it were. But we have also developed labels for tumor immune phenotypes, and we can ask whether the models are grouping patients by immune phenotype. We look at things like HLA expression and MHC. The more handholds we can identify and use to query what the model has learned, the quicker we can begin to understand how well these models are learning patient biology.
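
As a sketch of what querying those “handholds” might look like, one could cluster patient embeddings once and then score how well the clusters align with each known label set. The clustering method and agreement metric below are illustrative assumptions, not a description of Noetik’s internal tooling.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def grouping_scores(embeddings, label_sets, n_clusters=8, seed=0):
    """Cluster patient embeddings, then measure agreement between the
    clusters and each set of known labels (genotype, immune phenotype, ...).

    label_sets: dict mapping a label name to an array of per-patient labels.
    """
    clusters = KMeans(n_clusters=n_clusters, random_state=seed,
                      n_init=10).fit_predict(embeddings)
    return {name: adjusted_rand_score(labels, clusters)
            for name, labels in label_sets.items()}

# Hypothetical usage, e.g.:
# scores = grouping_scores(Z, {"genotype": genotype_labels,
#                              "immune_phenotype": phenotype_labels})
```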

[0:23:57] HC: What purposes are you currently using your foundation model for and how do you plan to use it in the future?

[0:24:01] RA: The big aim of the company is to develop these models for therapeutics discovery. Our thesis is that, to develop successful therapeutics with the best chance of success for patients, you need to understand not only the target for which to develop the pharmacology, but also the very specific patient population. So really, this is a precision oncology approach.

We think that, generally over the last 10 years, a lot of the failures for molecules that were pretty exciting at the time have been driven by not understanding the right connectivity between the target and the patient population. So, that is our core focus as a company: one, understanding and discovering patient subtypes from the data that can lead us to targets, that can allow us to enroll those patients in clinical trials and develop new therapeutics. At the same time, we’re also seeing that these models are incredibly powerful outside of the regime of drug discovery. For example, as I mentioned, we can do these cross-modal inferences between H&E and protein.

There are certainly diagnostics applications, again, because these are foundation models. They’re able to do a wide array of downstream tasks. So, I would say, we’re more in an exploratory mode of understanding how we can begin to leverage the models, perhaps in collaborations with other organizations in other areas of healthcare. But our core focus as a company is using the models to discover new biology, patient biology, and develop therapeutics that we hope can make an impact for cancer patients.

[0:25:41] HC: Are there any lessons you’ve learned in developing foundation models that could be applied more broadly to other data types?

[0:25:47] RA: I think I’ve mentioned some of these. I think people underestimate the importance of the data itself. One of my views when it comes to machine learning and biology generally is that, even with as much data as we can collect, the scale of data that we have in biology is many orders of magnitude lower than the scale of data you can access if, for example, you’re training an LLM. What that means is that it becomes very hard to brute-force train models on as much data as you can collect. What that means in terms of building machine learning companies is that it actually becomes more beneficial to develop fit-for-purpose datasets that, even though they are perhaps not as large in terms of the number of samples, you understand much better and have designed to train models in a way that will be more effective. That’s the view I have as we stand today.

There are certain datasets that are well curated and have led to very successful applications. The PDB and these protein models are a great example, where we actually have a very large, well-curated dataset, and that has led to a lot of really great developments in that space. So, certainly, this isn’t always true. There are exceptions. But I would say, largely in biology, we just don’t have that many versions of those datasets. Maybe folks will begin to build and curate those. I know there’s a lot of exciting work happening in that space. But as a starting point today, we feel that to make the most progress most quickly on something that’s relatively novel, one needs to build the dataset from scratch. Then, once you begin to understand the data, you can begin to ask, “Well, can I scale these data by pulling in other types of datasets that maybe we can curate a bit better?”

[0:27:43] HC: Thinking beyond foundation models and data, is there any advice you could offer to other leaders of AI-powered startups?

[0:27:49] RA: Well, I think this is an incredibly exciting time to build in this space. I know there are folks in biotech and biopharma who have for many years felt like there’s a lot of hype in the AI space. But I would argue we are genuinely seeing advances in biology from AI applications that are happening at a speed and scale that are unprecedented.

We’ve been trying to build in the space for over a decade, and some of the work has required us to advance the technology, where perhaps 10 years ago was a little bit early for machine learning in drug discovery. But this is the moment; we can lean on a lot of accelerators. Part of that is the model architectures. Part of that is the ability to generate data at a scale that is unprecedented. The other part is just that folks have been trying to do this for 10 years, so we have a lot of lessons learned, and we can apply those lessons very quickly. So, one, this is a moment to build in the space, and it’s incredibly exciting. I fully support people being ambitious and trying to take on hard problems.

I think one of the challenges still in the healthcare space is that this is a highly regulated industry, both in healthcare and biopharma. It’s not going to be as easy to begin to deploy technology in various applications. So, we are going to have to continue to break down barriers and figure out the right ways that AI applications are going to make an impact for patients, again, in both healthcare and drug discovery broadly. But I don’t think that should stop us. I would argue that human health is one of the most important problems we can tackle, and the potential impact of accelerating drug development, accelerating therapeutics, accelerating the care of humans in the hospital setting is enormous. It’s incredibly important. We shouldn’t stumble, and we shouldn’t be roadblocked by the challenges of figuring out how to deploy technology. I think that just makes the problem ever more important.

[0:29:54] HC: Finally, where do you see the impact of Noetik in three to five years?

[0:29:58] RA: As I said, our goal really is to impact patient care by developing new therapeutics. So, the North Star for the company is to very quickly begin to deploy these models to discover new therapeutics and discover patient populations, and really, we aim to redefine how we think of cancer biology more broadly. An analogy I like to think of in oncology is, we began by defining tumors histologically, by pathology subtypes. The hope there was that it was going to define treatments.

The first revolution came when we began to define tumors by genotypes. All of a sudden, understanding certain oncogenic drivers of tumors allowed us to develop precision therapeutics for those patients. That’s been transformative in many ways. We think there’s a third wave here that’s yet to come, which is understanding tumor biology based on more complex, combinatorial genetics, but also, importantly, beginning to understand tumor biology based on other contexts, such as the immune context.

We think that’s going to bring a third wave of precision oncology and, we hope, similarly more effective therapeutics. Really, the aim of this technology is to redefine tumor biology in a way that allows for, and really inspires, a whole new shift in how we treat patients. If you project this even further down the road, as these models begin to understand biology, our aim is for them to also more acutely impact patient care.

So, perhaps, as you start providing feedback from the clinic, the models can help a clinician decide: if a patient is no longer responding to a therapeutic, what is the next therapeutic we should try in a particular patient population? Really, that’s the power of these multimodal models. You can begin with a certain dataset and then begin layering additional datasets, some of which can be collected at higher frequency in the clinical setting, beyond just the histopathology.

Really, we are thinking about these foundation models at the beginning as drivers of therapeutic discovery and of innovating clinical trial enrollment in the near term. But long term, if we’re successful in that regime, we think they enable a whole new way to think about treating cancer patients, and perhaps other indications as well.

[0:32:31] HC: This has been great. Ron, I appreciate your insights today. I think this will be valuable to many listeners. Where can people find out more about you online?

[0:32:38] RA: Yes. Thanks, Heather. It was great speaking with you. So, one, I would tell people to check out our website at noetik.ai. We also have some really great technical reports on the research page that describe different aspects of the work, everything from the dataset that we’ve developed over the past two years to some of the exciting models. We recently released a virtual cell model that we call OCTO Virtual Cell, and there’s actually a demo there. We put some patient samples up, and there’s a demo app that folks can use to engage with the model and the data, see some of these applications, and start thinking about how we can use these models to begin to interrogate biology.

[0:33:14] HC: Perfect. All that will be linked up in the show notes. Thanks for joining me today.

[0:33:18] RA: Thanks, Heather.

[0:33:19] HC: All right, everyone. Thanks for listening. I’m Heather Couture, and I hope you join me again next time for Impact AI.

[OUTRO]

[0:33:29] HC: Thank you for listening to Impact AI. If you enjoyed this episode, please subscribe and share with a friend. If you’d like to learn more about computer vision applications for people and planetary health, you can sign up for my newsletter at pixelscientia.com/newsletter.

[END]