Decoding Biology with Aaron Mayer from Enable Medicine

Spatial biology is an important part of the research being done to gain biological insights. Joining me today on Impact AI to discuss how his company, Enable Medicine, uses AI to decode biology is Aaron Mayer. You’ll hear about Aaron’s background, what led him to create his company, what Enable Medicine does and why, and how they use machine learning in their endeavors. Aaron shares the struggles they face, why they publish their research, the timing their company has nailed, and so much more! Finally, he shares some words of wisdom for other leaders of AI-powered startups.

Key Points:

Aaron’s background and what led him to create Enable Medicine.
What Enable Medicine does and why it’s important for healthcare.
The role machine learning plays and how it’s used with spatial biology data.
The challenges Aaron has faced working with this data.
How they prepared for the use of AI by creating a data infrastructure from scratch.
The importance of trust and transparency and the benefits of publishing articles.
Why this is the perfect time to build this kind of company.
Aaron shares some advice for other leaders of AI-powered startups.
Where Aaron sees the impact of Enable Medicine in the near future.

Quotes:

“The goal of Enable Medicine is really to organize biological data and make it searchable to deliver insights to the questions that we really care about.” — Aaron Mayer

“Machine learning and AI is deeply integrated into the platform and technology stack that we've been building [at Enable Medicine].” — Aaron Mayer

“We want to take these various AI models and put them into an environment where they can operate with an expert in a loop.” — Aaron Mayer

Links:

Aaron Mayer on LinkedIn
Enable Medicine

Resources for Computer Vision Teams:

LinkedIn – Connect with Heather.
Computer Vision Insights Newsletter – A biweekly newsletter to help bring the latest machine learning and computer vision research to applications in people and planetary health.
Computer Vision Strategy Session – Not sure how to advance your computer vision project? Get unstuck with a clear set of next steps. Schedule a 1 hour strategy session now to advance your project.

Transcript:

[INTRODUCTION]

[0:00:02] HC: Welcome to Impact AI, brought to you by Pixel Scientia Labs. I’m your host, Heather Couture. On this podcast, I interview innovators and entrepreneurs about building a mission-driven machine learning-powered company. If you like what you hear, please subscribe to my newsletter to be notified about new episodes. Plus, follow the latest research in computer vision for people and planetary health. You can sign up at pixelscientia.com/newsletter.

[INTERVIEW]

[0:00:32] HC: Today, I’m joined by guest Aaron Mayer, Co-Founder and Chief Scientific Officer at Enable Medicine to talk about spatial biology. Aaron, welcome to the show.

[0:00:42] AM: Hey, Heather. It’s a pleasure to be here today. Thanks for that opportunity to discuss.

[0:00:46] HC: Aaron, could you share a bit about your background and how that led you to create Enable Medicine?

[0:00:50] AM: Yeah. I consider myself a systems biologist, which really means I spend most of my time thinking deeply around how we build a blueprint of biology, how we take measurements of disease and put them together to understand biology at a systems level, how to build a picture of disease or a map if you will. I’m formally a bioengineer by training. I was quite fortunate to train at a lab at Stanford under the mentorship of Dr. Sam Gambhir. This is really a special environment that combined a lot of different disciplines from chemistry, to physics, to biologists, to clinicians. Everything from thinking about precision health to AI, to immunotherapies. It was in this environment that I really started to think deeply around how do we measure biological systems and build models that allow us to understand them.

Now, this was also an exciting time, because the types of measurements that we could make of biology were rapidly improving. I had the opportunity to collaborate with many faculty, like Gary Nolan, who was thinking about how we could make multiplex measurements of biology in situ and look at many proteins and genes in the cells that they function in, and how they connect together. Largely, this really was the age of the biological atlas, where many papers were being published, talking about how we were taking these biological measurements for the first time and building references of different diseases.

Now, in parallel, there were some pretty exciting breakthroughs in AI that were happening at the same time. Everything from convolutional neural networks and breakthroughs in computer vision to different models that were allowing us to make sense of things in new ways and to make predictions that we couldn’t before. It was around that time that my co-founder, Sunil Bodapati, and I teamed up to say, “Hey, if we’re able to take these really powerful, complex measurements of biology and combine them with the new types of predictions and insights that we can get from these AI models that were being developed, is there an opportunity to make an impact in medicine?” It was around that question that we really formed Enable Medicine, with the goal of trying to bring some of these recent breakthroughs in AI and these different technologies and measurements of biology that we were making to medicine and thinking about the impacts it could have there.

[0:03:02] HC: What all does Enable Medicine do? Why is this important for health care?

[0:03:07] AM: To put it in a nutshell, the goal of Enable Medicine is really to organize biological data and make it searchable to deliver insights to the questions that we really care about. Those are questions like, what is the mechanism of action of a therapy? What is the ideal treatment to give to a patient? If you’ve got a drug that’s been successful in a particular patient population, what other patients could benefit from that therapy? What new, better drugs could we invent to new targets that make an impact in diseases and areas that are currently untreatable? We set out with asking the question around, if you were to build a biological database or an atlas, these maps of diseases, what would you need to do to actually make a meaningful impact on health care and medicine? When you start to think about that data stack, you start to think, “Okay, I need to collect information on the patient. I need to know the clinical metadata. I need to know the demographics and the treatment history and the response outcomes.” But you also need all of these data layers that measure their biology. Things like pathology, where we can look at an H&E stain of a tissue and the morphology and characteristics of the cells.

Things like sequencing, where we can measure the genes and the expression thereof. Things like spatial biology, which I think of as an important middle layer in this data stack, where we can take measurements of proteins, genes, and cells in context. When you start to think about that data stack, you’d ideally like to collect this on all sorts of patients, across all sorts of diseases, under all sorts of perturbations and therapies, at different time points across disease. All of a sudden that becomes a pretty daunting challenge. That’s a lot of data. There’s a lot of cost to collect that. We started to think we’re really going to have to take a collaborative effort to building this data stack and to providing infrastructures and tools and to even leveraging AI and models that allow us to build this data stack at scale. That was a pretty daunting task that we decided to undertake here.

[0:05:08] HC: Then what role does machine learning play? How do you use it with all these different forms of data?

[0:05:13] AM: Machine learning and AI is deeply integrated into the platform and technology stack that we’ve been building here. I’d like to say that it’s everything from the user interface to the applications and models that we’re leveraging. I’ll give some examples there. First off, if you’re building this data stack, you need to be able to organize this data. You need to have the infrastructure to index this data. But then importantly, you need the ability to label this data. I think a good example of a company that took on data labeling in other spaces is Scale AI, where they pretty much brought together tools and individuals to be able to label large data sets that made a big impact on AI and its applications in other industries, like automotive, and finance, and manufacturing. In the biological space, this is quite hard. Labeling biological data has often been in the domain of experts with decades of training. These are your clinicians, your pathologists who can look at that H&E sample that I mentioned earlier and discern the morphologies of cells and label them as tumor versus healthy.

These are biologists, immunologists who understand the genes and proteins that cells express and have toolkits of analysis tools that they use at their disposal to help label these. This has been a really daunting task in the past. We thought, how could we use some of these models within the infrastructure that we’ve built to speed up data labeling at large in the biological domain so that we could have lots of good biological data? That’s one place where it’s played a role. Another, when we go back to thinking about that data stack, is that it’s impossible, I think, to go out and collect every single data layer on every single patient, but with some of these models, we might not need to. We think a lot about data imputation. Can we predict different layers of the biological data stack for a patient, even if we haven’t measured that layer? I think the answer to that question is, yes, if you’ve got a good model and good training data for your models.

Then the last key area where AI and ML really makes an impact in what we’re building, the platform we’re working on, is actually finding insights in the data. We’ve got work where we’ve built graph neural networks to try to look at these pictures of disease, these maps of disease that we’re generating and try to identify the networks and the nodes and the drivers of some prediction we care about. Let’s say, patient response to a treatment. First off, can we do that? Then second off, can we do that in a way that it’s not a black box model? Can we actually understand what are the underlying biological features that are driving that prediction? I’d say those are three key areas where ML and AI makes an impact on the platform that we’re building.

[0:08:03] HC: The data you’re working with, you’ve already mentioned some forms of clinical data and the spatial biology data. What challenges do you encounter in working with these different forms of data and with the spatial information in particular?

[0:08:14] AM: Definitely. Working with this data has been a challenging endeavor, and that’s why we’ve had to invest so much time into building the infrastructure for it. There are several areas where this type of data poses challenges. First and foremost, it’s data scale. These data sets, I mean, it brings a whole new meaning to the term scale. A single data set can be a terabyte in size, so just operating on that data can be difficult. Then that’s just the spatial biology data layer alone. When you start to add in these other measurements, that scale only grows. Another piece is coming up with good data abstractions. What I mean is, what are generalizations that we can draw across these many different data classes, these many different measurements, so that we can generalize it to a set of operations that we can take on that data that can be standardized. That’s actually quite difficult, because there are no standards in this field yet for a lot of these different spatial biology data classes that we’re working on. But I know that this is an active endeavor in the field to come up with better data abstractions.

Then when you get to the data itself, you think a lot about data integration, so variance between data sets, your measurements of a protein in one batch could vary from your measurements of a protein in another batch. How do you correct for those so that you can actually search at large and make comparisons at large across this whole huge atlas that you put together? Then also thinking about data quality is really important to any group that’s trying to build a large database. So, having models that can scan data or flag data for quality, keep what’s good and valuable, and flag that data, which might raise questions or need to be excluded from a downstream analysis. All of those have posed challenges, especially in that spatial biology data layer where the data is so large and so complex.
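
To make the batch-variance problem described above concrete, here is a minimal sketch of one simple correction: standardizing a protein's measurements within each batch so that batches land on a common scale. This is an illustration only, not Enable Medicine's method. The function name, batch names, and values are all hypothetical, and per-batch z-scoring assumes the batches have broadly similar biological composition, which real data may not satisfy.

```python
from statistics import mean, pstdev

def standardize_by_batch(measurements):
    """Z-score each batch's values separately so batches share a common scale.

    `measurements` maps a hypothetical batch name to a list of raw
    intensities for one protein marker.
    """
    corrected = {}
    for batch, values in measurements.items():
        mu = mean(values)
        sigma = pstdev(values)
        if sigma == 0:
            # A constant batch has no spread to rescale; center it only.
            corrected[batch] = [0.0 for _ in values]
        else:
            corrected[batch] = [(v - mu) / sigma for v in values]
    return corrected

# Hypothetical example: batch_b reads about 10x brighter than batch_a.
raw = {
    "batch_a": [1.0, 2.0, 3.0],
    "batch_b": [10.0, 20.0, 30.0],
}
corrected = standardize_by_batch(raw)
```

After this step the two hypothetical batches carry the same relative values, so a search or comparison across them is no longer dominated by the brightness offset. Dedicated batch-correction methods go further than this, for example by preserving genuine biological differences between batches.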

[0:10:12] HC: Then on applying machine learning to it, even before you could apply machine learning, what processes or infrastructure did you have to have in place to deal with these different forms of data and get it ready for machine learning?

[0:10:24] AM: Yeah. We really had to set up entire processing pipelines and toolboxes to operate on this data. For the spatial biology data class in particular, when we formed Enable Medicine, the pipelines even just to process this data from the raw images that came off of the different machines that were making these measurements were very immature. We had to spend a lot of time building these pipelines, which could stitch together these images into their final form and output. The second piece was really having a toolbox to be able to operate on these images. That’s everything from being able to perform that quality control and exclude images or patches from them that failed, to the tools that allowed you to manually annotate cells or manually add annotations. Then once you started to have that data, the ability to process it, to index it, to organize it, to manually label it, to manually analyze it, then we were really able to bring in some of these AI models and start to unlock their power.

I think that’s been core to our development philosophy, is that we want to take these various AI models and put them into an environment where they can operate with an expert in a loop. That means that you can have that expert immunologist. You can have that expert pathologist who’s got a toolkit to label data. That data then serves as training for these models, which makes new predictions. Let’s use the example of cell types. You could have a pathologist label cell types in an image. You can have a model that then predicts those cell types on other images. Then you could still have that pathologist review those predictions. We think about setting up these continuous learning loops, where we’ve got experts interacting with the AI in an ecosystem that really leverages the expertise and power of both.
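
The expert-in-the-loop idea above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the platform's implementation: it stands in a nearest-centroid classifier for whatever model is actually used, and the function names, the `min_margin` threshold, and the two-marker feature vectors are all hypothetical. Expert labels train the model, confident predictions are accepted, and ambiguous cells are queued for the expert to review.

```python
from math import dist

def fit_centroids(labeled_cells):
    """Average the feature vectors for each expert-assigned cell type."""
    sums, counts = {}, {}
    for features, label in labeled_cells:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict_with_review(centroids, cells, min_margin=1.0):
    """Predict a cell type per cell; queue ambiguous cells for expert review.

    A cell is queued when its two closest centroids are nearly equidistant.
    """
    accepted, review_queue = [], []
    for features in cells:
        ranked = sorted(centroids, key=lambda label: dist(features, centroids[label]))
        best = ranked[0]
        if len(ranked) > 1:
            margin = dist(features, centroids[ranked[1]]) - dist(features, centroids[best])
        else:
            margin = float("inf")
        if margin >= min_margin:
            accepted.append((features, best))
        else:
            review_queue.append((features, best))
    return accepted, review_queue

# Hypothetical two-marker feature vectors labeled by a pathologist.
expert_labels = [
    ([0.0, 0.0], "healthy"),
    ([0.0, 2.0], "healthy"),
    ([10.0, 10.0], "tumor"),
    ([10.0, 12.0], "tumor"),
]
centroids = fit_centroids(expert_labels)
accepted, review_queue = predict_with_review(centroids, [[1.0, 1.0], [5.0, 6.0]])
```

In a continuous learning loop, the labels the expert assigns to the queued cells would be appended to `expert_labels` and the centroids refit, so each round of review improves the next round of predictions.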

[0:12:20] HC: For some of the pieces that you’ve created to do the processing, whether it’s quality control or annotation or some of the other things you mentioned, did you have to create those pieces of infrastructure or were there existing tools that could handle some of it for you?

[0:12:36] AM: Yeah. A lot of this infrastructure, we’ve had to create from scratch. We also work, I think, very closely with the scientific community at large to try to leverage breakthroughs and advancements in that field and adopt them into our platform. A large amount of the code and the pipelines that our team builds, we think about making this open source. We think about open science. We think about leveraging the advancements that the community is making. To that end, we’ve really been building out an application toolkit like I alluded to earlier. That toolkit includes everything from brand new models that the Enable Medicine team has developed internally, to models that folks in academia are building and we’re deploying on our infrastructure, where our team is working with them to bring them up to production-grade and allow many other people to use them, to partnerships with other commercial entities that are developing applications that can run on this infrastructure. I’d say, we’ve had to build it, but we’ve also had a lot of help from the community at large and from partners to bring together these tools into a single ecosystem where we can really start to find insights in this data.

[0:13:45] HC: You’ve open-sourced some of the tools you’ve created. Your team has also published a number of scientific articles. What benefits have you seen from the open-source process and from publishing that have benefited your team?

[0:13:58] AM: Yeah. I think as a scientist and for the scientists at large, like trust and transparency are core to scientific beliefs. There’s a natural cynicism, a desire to understand the process, the how, to be able to validate and check work and to essentially have trust in the results. I think publishing is pretty core to that process of scientific transparency and building that trust. I think more importantly, we see publishing as a means to build community and to contribute to community and science at large, which is really one of our primary goals. For us, publishing our research is really core to the fabric of who we are. We want to participate in the scientific discussion. We want to invite others to work in these spaces because we’re not sitting here saying that we’ve got this all solved, that we know the solutions to all of these daunting challenges that we face, but we’re thinking much more, how do we approach this collaboratively? How do we bring scientists together? How do we solve some of these really hard problems when it comes to data and the insights that can be found from it?

The other piece too, that we’re really trying to make an impact on and start to lead by example as an industry-academia partner, is when it comes to data accessibility and data sharing. I think one of the areas that really motivates a lot of what we do is the fact that today, data is still not as accessible as we would like as scientists. Now, the NIH has issued new initiatives that require data sharing to be done with any research publication that goes out. There’s been great consortia and efforts by scientists and different groups to try to make data more accessible, but this is still a somewhat broken system and despite some of the databases that have been set up, this data is hard to access. It goes back to those challenges we were discussing earlier around the infrastructure and how do you organize this data. A lot of papers still end with “contact the author for this data if you’d like to further build on it.” I think of this as a major opportunity. I think biological data is something that we spend millions or billions of dollars on as a society. The government invests in it, institutions invest in it, and it’s highly under-leveraged. Thinking about how we can share data, how we can help facilitate that process, how we can make it more accessible, that again is really core to what Enable Medicine is building and the platform that we’re developing.

[0:16:33] HC: Why is now the right time to build this AI technology for searching biology? Are there aspects of this problem that weren’t feasible until recently?

[0:16:42] AM: Absolutely. I’ve been saying this for several years now going back to the origin story where Sunil and I were like, “Now is the time to build this company.” Now, I would say, “Now is really the time.” I think you see a lot of activity and excitement in the space. I really do think that the great company that will have successfully organized biological data and made it insightful is being built right now. We hope to be that company or certainly hope to be a part of the group of people that help make that a reality. Why that’s becoming possible is twofold. The first is the measurements that we can now make of biology as a systems biologist. They’re now rich enough, contextualized enough, high-plex enough, cheap enough that we can actually start to put together that biological map, that biological picture in a resolution that we were never able to before. There’s just many low-hanging insights waiting to be seen if you can tap into that data and that data just simply didn’t exist five or ten years ago.

Then there is that natural convergence with the tools that are necessary to actually derive those insights from it. These are some of these AI models that we’ve been talking about. I think a really good example, recently, around the power of having organized biological data plus some of these AI tools that are emerging is all of the excitement that we’ve seen around large language models. Things like ChatGPT or GPT-4 or some of the other open-source models. You start to see the power that bringing some of these generative models to this data can have and we’re thinking a lot – I like to call it generative biological search, but how do you bring these models and pair them with this data to start to extract those insights from it? These models can do everything from help you operate on the data. Doing things like going back to our labeling cells example before to use some of those tools, again, you have to train and be an expert to understand, for example, how to run clustering and then label cell clusters with their phenotypes.

Now you’re entering a world where we can use AI tools, akin to ChatGPT, to simply talk to your data and say, “Hey, I’m interested in labeling cells in my data set. What are the different models that are capable of being used in this environment?” Maybe several different applications will come up. Then you say, “Okay, unsupervised clustering was recommended as the best model to use for this particular data type. Go ahead and run unsupervised clustering with the best-recommended default parameters.” Then that model can run. With some of these tools, you start to have the ability to really talk to your data, to ask, how is this particular sample different than all of the other samples in my atlas? How do I contextualize my insights? How do I search across this very large set of knowledge that I’m compiling? That just wasn’t possible before. I really think it’s an exciting time as these types of fields are converging.
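
For readers unfamiliar with the workflow mentioned above, running unsupervised clustering and then labeling cell clusters with phenotypes, here is a small sketch of what such a step might look like. It is a toy example, not the platform's pipeline: it uses plain k-means with hand-picked starting centers, and the marker names, intensities, and the "highest marker wins" naming rule are illustrative assumptions only.

```python
from math import dist

def kmeans(points, initial_centers, iterations=10):
    """Plain k-means: alternate assigning points and updating centers."""
    centers = [list(c) for c in initial_centers]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        assignments = [
            min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centers

def name_cluster(center, marker_names):
    """Label a cluster by its most highly expressed marker (a crude heuristic)."""
    top = max(range(len(center)), key=lambda i: center[i])
    return marker_names[top] + "-high"

# Hypothetical per-cell intensities for two markers.
markers = ["CD3", "CD20"]
cells = [[9.0, 1.0], [8.0, 2.0], [1.0, 9.0], [2.0, 8.0]]
assignments, centers = kmeans(cells, initial_centers=[cells[0], cells[2]])
phenotypes = [name_cluster(centers[a], markers) for a in assignments]
```

In practice the naming step is where expert knowledge comes in: an immunologist would inspect each cluster's full marker profile rather than rely on a single top marker, which is the kind of task a conversational assistant could help set up and an expert could then review.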

[0:19:44] HC: Yeah. The progress in deep learning and AI overall has definitely been moving fast. I’ve seen it over the last 10 years, but it’s hitting the headlines more recently. That’s an indicator that in the technical space, it’s moving even faster and developing these new capabilities for us.

[0:20:02] AM: Absolutely. And we think that the infrastructure and platform that we’ve built here at Enable Medicine is really ideal for bringing some of these breakthroughs in other domains to this biological data, to this data that we’re applying towards medicine. Part of the reason there is because of the guard rails that exist in this environment. We can set up guard rails on these models. The other piece too, is that when you plug something like GPT-4 into the environment that we’ve built, you give it the ability to use these computational tools. We’ve got compute in our environment, so it can actually run the model. We’ve got data in our environment. You can actually say, “Find all patients that have head and neck cancer that looks like this. Then fetch 100 samples of that type and then have a model run on it.” We’ve really set up an environment that’s ideal to be able to take advantage of some of these breakthroughs.

[0:20:55] HC: Is there any advice you could offer to other leaders of AI-powered startups?

[0:21:00] AM: I think the advice that I offer is really towards AI-powered startups trying to work in the biological and drug development spaces. That’s really to build a team that’s at the intersection of biology, and AI, and engineering. A big theme for us has been to build at these intersections. I think it’s necessary that you really have folks who have expertise around the questions and how to answer them. Things like, how do you identify patients that are likely to respond to a particular drug? How do you identify a biomarker of disease? How do you understand mechanism of action? Then similarly, on the other side, having folks who have deep understanding of the technical aspects of the code and the infrastructure and the models that need to be deployed against these applications and these key questions. So, building teams that operate at an intersection, I think is pretty critical. I’ve seen that play out here at Enable Medicine to success in this endeavor.

Now it’s also extremely challenging, because these different groups of individuals often speak different languages in the sense that to an engineer who’s, say, working at Airbnb or Google, you might intimately understand the end use case, because you’re likely booking rental homes or running Google searches yourself, right? Biology might be this foreign subject. So, understanding how to answer those questions that I went through is definitely going to require conversation and communication with folks who deeply understand that piece. Then on the other end, biologists might not understand all of the challenges and limitations of some of the models and infrastructure that’s being built. So, communication is really key and that can be difficult. I’ve seen by investing in that, I think it’s really had an impact on us being able to build applications that can answer these really meaningful questions that we’re trying to answer.

[0:23:00] HC: Yeah. I definitely agree with that. I would echo that it’s not just important for biology and medical domains. It’s important for many other application-specific areas as well. It’s something where you’re working with agronomists to understand how farming works. You need that domain knowledge. You need the technical people working with those who understand how the data was collected and what the data represents. You need that continuous collaboration to really succeed with these interdisciplinary problems.

[0:23:29] AM: Absolutely.

[0:23:30] HC: Finally, Aaron, where do you see the impact of Enable Medicine in three to five years?

[0:23:35] AM: Yeah. I see the impact as twofold. One for the scientific research community. I hope that we’ve really made a dent towards achieving our goals of making data, especially biological data, more organized, more operable, more manageable, more searchable, more insightful. I hope this manifests itself by having really an ecosystem where data is being shared, data is being published, and we’re actually speeding up the rate of insights that are being found across the academic community at large for all of the different disease areas that folks work in. Then I think for our pharma partners, the goal is that this translates into being able to more effectively answer the questions that they really care about, which is what are the targets that we should build new drugs to? What are the mechanisms of action? Can we predict off-target toxicities before they happen? Can we make sure that we’re getting the right drugs to the right patients? If we do this well, I think within three to five years, we’ll be able to make a big impact on what we ultimately care about, which is impacting the patient and improving the quality of life and really improving the way that we treat and manage and understand disease.

[0:24:52] HC: This has been great. Aaron, your team at Enable Medicine is doing some fascinating work for spatial biology. I expect that the insights you shared will be valuable to other AI companies. Where can people find out more about you online?

[0:25:04] AM: Thanks so much. If you’d like to learn more, you can visit our website, enablemedicine.com, and that’s a great place to get started. Our platform is also open to researchers, and we’ve got a free academic tier. If you want to log in and start to explore some of the data that we’ve begun to compile, or if you want to join in on the efforts that we’ve undertaken here, we’d love to collaborate.

[0:25:25] HC: Perfect. Thanks for joining me today.

[0:25:27] AM: Thank you.

[0:25:28] HC: All right, everyone. Thanks for listening. I’m Heather Couture, and I hope you join me again next time for Impact AI.

[END OF INTERVIEW]

[0:25:38] HC: Thank you for listening to Impact AI. If you enjoyed this episode, please subscribe and share with a friend. If you’d like to learn more about computer vision applications for people and planetary health, you can sign up for my newsletter at pixelscientia.com/newsletter.

[END]