Can AI cure autoimmune diseases? This episode of Impact AI dives into the groundbreaking work of DeepCure, where artificial intelligence meets medicinal chemistry to tackle some of healthcare's most stubborn challenges. Co-founder and CEO Kfir Schreiber shares how his team uses advanced machine learning tools, physics simulations, and human expertise to design the next generation of small molecule drugs. From overcoming data limitations to fostering tight collaboration between machine learning scientists and chemists, this discussion illuminates the potential of AI-driven innovation in transforming patient outcomes. With a rheumatoid arthritis drug nearing clinical trials, DeepCure is poised to redefine the future of medicine. Tune in to discover how AI can accelerate drug discovery, overcome data challenges, and create life-changing therapies, as well as how these insights can inspire your own innovative pursuits!


Key Points:
  • How Kfir's background in computer science and applied math led him to found DeepCure.
  • Insight into DeepCure’s mission to leverage proprietary technology to create small molecule drugs for inflammation and autoimmunity.
  • Augmenting human expertise with AI: the role of machine learning in drug discovery.
  • Layers of using AI to analyze targets and design small molecules with optimized properties.
  • Challenges in small molecule datasets and how DeepCure develops tailored models.
  • The influence of molecule representations like SMILES on machine learning models.
  • Combining publicly available datasets with data generated in DeepCure’s automation lab.
  • Model validation techniques to address out-of-distribution challenges in small molecule data.
  • Collaboration between machine learning experts and chemists to refine drug discovery.
  • Recruiting top talent by highlighting DeepCure’s impactful mission in healthcare.
  • The process of onboarding machine learning developers with no prior chemistry knowledge.
  • Problem-solving advice for leaders of AI-powered startups: it’s not about the AI!
  • DeepCure’s future plans for clinical trials and expansion into other autoimmune diseases.

Quotes:

“Machine learning in our space is almost never a complete solution. It's a way to augment our chemists [and] our biologists [to] try to make them capable of solving problems that were unsolved before.” — Kfir Schreiber

“One of the best things about DeepCure [is the] very tight collaboration between the domain experts and our machine learning scientists.” — Kfir Schreiber

“Your average machine-learning scientist doesn't have chemistry intuition. We need this feedback and we need to integrate this feedback back into our models to make the predictions make sense.” — Kfir Schreiber

“Focus on the problem, focus on the value, and work your way backwards to the best tools to use.” — Kfir Schreiber


Links:

DeepCure
Kfir Schreiber on LinkedIn


Resources for Computer Vision Teams:

LinkedIn – Connect with Heather.
Computer Vision Insights Newsletter – A biweekly newsletter to help bring the latest machine learning and computer vision research to applications in people and planetary health.
Computer Vision Strategy Session – Not sure how to advance your computer vision project? Get unstuck with a clear set of next steps. Schedule a 1-hour strategy session now to advance your project.


Transcript:

[INTRODUCTION]

[0:00:03] HC: Welcome to Impact AI, brought to you by Pixel Scientia Labs. I’m your host, Heather Couture. On this podcast, I interview innovators and entrepreneurs about building a mission-driven, machine-learning-powered company. If you like what you hear, please subscribe to my newsletter to be notified about new episodes. Plus, follow the latest research in computer vision for people and planetary health. You can sign up at pixelscientia.com/newsletter.

[EPISODE]

[0:00:34] HC: Today, I’m joined by guest Kfir Schreiber, co-founder and CEO of DeepCure, to talk about developing therapies for immune diseases. Kfir, welcome to the show.

[0:00:43] KS: Thank you, Heather. Thank you for having me.

[0:00:45] HC: Kfir, could you share a bit about your background and how that led you to create DeepCure?

[0:00:49] KS: Of course, happy to. So, I come from a computer science and applied math background. I did my undergrad in Israel, then moved to Boston to join the MIT Media Lab, where I had the pleasure of meeting Professor Joseph Jacobson and Thras Karydis, both of whom were at the Media Lab at the time. Together, we did research at the intersection of machine learning and life science broadly. We worked on a variety of problems, including protein folding, protein-protein interaction prediction, and small molecule drug discovery.

After a few years at the lab, when we got some exciting results, we had the pleasure of showing them to some of our sponsors from big pharma companies. They got very excited about the potential of this technology, which we then decided to spin out as a startup company. Thras and I put our PhDs on hold and went on to start DeepCure.

[0:01:41] HC: What does DeepCure do and why is it important for healthcare?

[0:01:44] KS: So, DeepCure is a small molecule drug discovery company. We’re developing new therapeutics in the inflammation and autoimmunity space. We work on a few different therapeutic programs, targeting different indications. To do that, we use proprietary technology that we have built internally, which combines machine learning, physics-based simulations, and chemistry automation.

The whole idea of DeepCure is to take therapeutic targets, proteins that people in the industry have been interested in for a long time but did not have the right tools to address, and use our technology to turn them into new drugs that eventually reach patients.

[0:02:27] HC: What role does machine learning play in this technology?

[0:02:30] KS: A very big one. For us, it’s really about combining three different components very tightly. Machine learning is the first, and it is in almost everything we do: the way we understand small molecules, the way we design them, and the way we understand their interactions with proteins all rely on machine learning tools. But it’s not only machine learning. It’s machine learning combined with physics-based simulations, like molecular dynamics and quantum mechanical calculations, and, of course, strongly influenced by and tightly combined with our human experts.

I think one of the things that we have learned is that machine learning in our space is almost never a complete solution. It’s a way to augment our chemists and our biologists, to make them capable of solving problems that were unsolved before, but it’s almost never a black-box AI designing new drugs. It’s really this combination of physics simulations, machine learning, and human experts.

[0:03:33] HC: What kinds of things do you try to predict with machine learning?

[0:03:37] KS: Yes. So, we use machine learning in a few different layers. Going sequentially through the way we develop a new drug, the first step is understanding the biological target, the protein. We use machine learning approaches to scale up our physics-based simulations. Physics-based simulations are really useful, but they are very computationally heavy. So, we use a combination of ML and physics-based simulations to gain the throughput we need in our space and, eventually, to better predict where on the protein surface a small molecule might bind. That’s the first step: understanding the target.
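A minimal sketch of the general surrogate-model idea behind scaling up expensive simulations, under loudly stated assumptions: the “expensive simulation” is a hypothetical linear stand-in, each molecule is a made-up five-number feature vector, and ridge regression is chosen only for brevity. The episode does not describe DeepCure’s actual models, and a real surrogate would usually be nonlinear.

```python
import numpy as np


def expensive_simulation(features: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a costly physics-based calculation, such as a
    binding-energy estimate. Here it is a fixed linear function so the sketch
    runs instantly; a real simulation would be far slower."""
    true_weights = np.array([1.5, -2.0, 0.5, 0.0, 3.0])
    return features @ true_weights


def fit_ridge(X: np.ndarray, y: np.ndarray, lam: float = 1e-3) -> np.ndarray:
    """Closed-form ridge regression: w = (X^T X + lam * I)^-1 X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


rng = np.random.default_rng(0)

# 1. Run the expensive simulation on a small subset of molecules
#    (each row is a hypothetical feature vector for one molecule).
X_sim = rng.normal(size=(50, 5))
y_sim = expensive_simulation(X_sim)

# 2. Fit a cheap surrogate model to those results.
weights = fit_ridge(X_sim, y_sim)

# 3. Score a much larger virtual pool with the surrogate only.
X_pool = rng.normal(size=(10_000, 5))
surrogate_scores = X_pool @ weights

# 4. Keep the 20 best candidates (lowest predicted energy) for
#    follow-up with the full simulation.
shortlist = np.argsort(surrogate_scores)[:20]
```

The throughput gain comes from step 3: only the shortlist would go back through the full, slow calculation.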

The second step is designing the small molecule itself. We have developed a generative AI tool called MolGen that builds virtual libraries of small molecules, trying to design what would be the perfect drug. It’s almost never the perfect drug to begin with, but that’s the aspiration. That means we are predicting binding to a given protein. We’re predicting pharmacokinetic (PK) properties, such as metabolic stability and bioavailability. We are predicting the synthetic feasibility of a molecule: how hard it will be to synthesize and what the ideal synthetic route to make it would be.

Finally, we also include predictions of possible toxicities or side effects of that small molecule. All of these feed into the selection of a handful of molecules that will be made on our automation platform and tested in a variety of in vitro, and then in vivo, models of the disease.
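The step from many predicted properties to a handful of molecules for synthesis can be illustrated with a simple filter-then-rank scheme. This is a hypothetical sketch: the property names, the 0 to 1 scales, the toxicity cutoff, and the weights are all assumptions, and a weighted sum is only one common way to combine objectives. The episode does not say how DeepCure ranks candidates.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    binding: float              # predicted binding score, 0-1, higher is better
    metabolic_stability: float  # predicted, 0-1, higher is better
    synthetic_ease: float       # predicted ease of synthesis, 0-1, higher is better
    toxicity_risk: float        # predicted risk, 0-1, lower is better


# Hypothetical weights expressing how much each property matters.
WEIGHTS = {"binding": 0.4, "metabolic_stability": 0.2,
           "synthetic_ease": 0.2, "toxicity_risk": 0.2}


def score(c: Candidate) -> float:
    # Weighted sum; toxicity is inverted so that lower risk raises the score.
    return (WEIGHTS["binding"] * c.binding
            + WEIGHTS["metabolic_stability"] * c.metabolic_stability
            + WEIGHTS["synthetic_ease"] * c.synthetic_ease
            + WEIGHTS["toxicity_risk"] * (1.0 - c.toxicity_risk))


def select_for_synthesis(cands: list[Candidate], k: int = 3,
                         max_toxicity: float = 0.5) -> list[Candidate]:
    # Hard filter first (drop high predicted toxicity), then rank by score.
    safe = [c for c in cands if c.toxicity_risk <= max_toxicity]
    return sorted(safe, key=score, reverse=True)[:k]


candidates = [
    Candidate("mol_A", 0.9, 0.6, 0.7, 0.1),
    Candidate("mol_B", 0.95, 0.8, 0.9, 0.8),  # strong binder, but too toxic
    Candidate("mol_C", 0.7, 0.9, 0.8, 0.2),
    Candidate("mol_D", 0.5, 0.5, 0.5, 0.3),
]
picked = select_for_synthesis(candidates, k=2)
print([c.name for c in picked])  # ['mol_A', 'mol_C']
```

mol_B has the best predicted binding but is dropped by the hard toxicity filter, which is the kind of trade-off such a selection step has to encode.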

[0:05:19] HC: So, the data you’re working with is small molecule data. What does that data look like?

[0:05:25] KS: Oh, that’s a great question. One of the things that took us a while to understand, and that is very pronounced in our space, is that machine learning on small molecule data is very different from machine learning for vision or natural language processing. First of all, we deal with very small data sets: sometimes only a handful of known molecules, and in some of our programs, not even a single known molecule.

The second thing is that representing a small molecule for a model is non-trivial. For images, we think of pixels. For natural language processing, we think of different ways to tokenize the sentence or the text. For small molecules, it’s really not that easy. A small molecule is basically a 3D object: atoms arranged in space, with different bonds or relationships between them. Representing that for a machine learning model is non-trivial, so people have developed lots of different representations. We have our own proprietary representations, but all of these are very different from the way we do machine learning in other domains.

So, from the ground up, the way we deal with the data is very different, and the way we evaluate our models is very different. That calls for different types of models: typically not your biggest deep neural network, and definitely not a huge LLM, but sometimes much smaller models or techniques that are not used as often in other domains these days.

[0:07:02] HC: You’ve mentioned that there are multiple ways you could represent these molecules. Can you give a simple example?

[0:07:07] KS: Of course. One representation, maybe the most commonly used, is called SMILES. SMILES is a text-based representation of the molecule. You can think of it as each atom represented by one or two letters, and the connections between atoms by other characters. That is basically a 1D representation of the molecule. It’s not one-to-one, since the same molecule can be written as several different SMILES strings, but it’s a pretty good way to identify a given molecule. The challenge with SMILES is that, as I said, it’s a 1D representation, which means we lose all information about the 3D arrangement and the 3D dynamics of the molecule in real life.
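Because SMILES is a string, NLP-style models first split it into tokens. The sketch below uses a simplified regular expression in the style of tokenizers common in the literature; it is an illustration, not a full SMILES parser, and it does not check that the chemistry is valid.

```python
import re

# Simplified SMILES token pattern: bracket atoms like [NH4+] stay whole,
# two-letter halogens (Br, Cl) are tried before one-letter atoms,
# lowercase letters are aromatic atoms, and digits are ring-closure labels.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    tokens = SMILES_TOKEN.findall(smiles)
    # Guard against silently dropping characters the pattern doesn't cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not fully tokenize {smiles!r}")
    return tokens


print(tokenize_smiles("c1ccccc1Cl"))  # chlorobenzene
# ['c', '1', 'c', 'c', 'c', 'c', 'c', '1', 'Cl']
print(tokenize_smiles("CC(=O)O"))     # acetic acid
# ['C', 'C', '(', '=', 'O', ')', 'O']
```

Note that `CCO` and `OCC` both describe ethanol yet produce different token sequences, which is the non-uniqueness mentioned above; cheminformatics toolkits address this by generating a canonical SMILES for each molecule.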

Another way to represent small molecules is as graphs. You can think of each atom as a node and each bond as an edge in the molecular graph. People have also tried 3D representations and different types of matrix-based representations. So, there are different ways to do this. None of them is perfect, and each can be useful for different use cases.
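The graph view can be made concrete by writing a molecule’s atoms and bonds down by hand. This sketch uses acetic acid (SMILES `CC(=O)O`) with hydrogens left implicit; in practice a cheminformatics toolkit would build the graph from a SMILES string or structure file rather than by hand.

```python
from collections import defaultdict

# Acetic acid, CC(=O)O, as a graph of heavy atoms (hydrogens implicit).
# Nodes are atoms; edges are bonds, annotated with bond order.
atoms = ["C", "C", "O", "O"]                # node labels, indexed 0..3
bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]  # (atom_i, atom_j, bond_order)


def build_adjacency(n_atoms: int, bonds: list[tuple[int, int, int]]):
    """Undirected adjacency list: atom index -> list of (neighbour, bond_order)."""
    adj = defaultdict(list)
    for i, j, order in bonds:
        adj[i].append((j, order))
        adj[j].append((i, order))
    return {a: adj[a] for a in range(n_atoms)}


adjacency = build_adjacency(len(atoms), bonds)
for idx, neighbours in adjacency.items():
    desc = ", ".join(f"{atoms[j]}{j} (order {o})" for j, o in neighbours)
    print(f"{atoms[idx]}{idx}: {desc}")
# C0: C1 (order 1)
# C1: C0 (order 1), O2 (order 2), O3 (order 1)
# O2: C1 (order 2)
# O3: C1 (order 1)
```

The bond orders around the central carbon (1 + 2 + 1) sum to 4, matching carbon’s usual valence, a handy sanity check on a hand-built graph.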

[0:08:18] HC: I assume, depending on how you choose to represent the molecule, that might also influence the type of model that you would train based on it?

[0:08:26] KS: Absolutely. As you can imagine, with 1D text-based representations, people typically use models taken or adapted from NLP, natural language processing. With graph representations, there are a variety of graph convolutional networks and other architectures that people have used. A few years ago, people even tried three-dimensional convolutional neural networks on 3D representations. So yes, your representation will definitely influence the type of model, and even the type of metrics you’re going to apply to evaluate it.
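To connect graph representations with graph convolutional networks, here is one propagation step in the style of the GCN layer of Kipf and Welling, written in plain NumPy. The one-hot element features, the random weight matrix, and mean pooling are placeholder choices for illustration; a trained model would learn the weights and use richer atom and bond features.

```python
import numpy as np

# Acetic acid heavy-atom graph (C, C, O, O); bond orders ignored here.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)

# Node features: one-hot element type [is_C, is_O].
H = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [0, 1]], dtype=float)


def gcn_layer(A: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One graph-convolution step: add self-loops, symmetrically normalise
    the adjacency, aggregate neighbour features, project with W, apply ReLU."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)


rng = np.random.default_rng(0)
W = rng.normal(size=(2, 8))            # placeholder for learned weights
H1 = gcn_layer(A, H, W)                # per-atom embeddings, shape (4, 8)
molecule_embedding = H1.mean(axis=0)   # mean-pool atoms into one vector
```

Because the layer only mixes features along bonds and the final step averages over atoms, the molecule embedding does not depend on the order in which atoms are listed, a property that models reading SMILES strings character by character do not get for free.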

[0:09:03] HC: Even before we get to evaluating it, how do you gather this data? How do you get the data that you need in order to train these models, and to get the results that you need?

[0:09:12] KS: So, I think there are a few ways to do this. Most commonly, people start with publicly available data. There are a few different publicly available data sets, like ChEMBL and others, that provide information about molecules in general. Some of it is a property, like the solubility of a given molecule. Some of it might be the molecule’s activity against a given therapeutic target. One of the big challenges, as you can imagine, is that publicly available data sets come from many different labs and many different publications, under various conditions. They’re typically not the cleanest data sets you can work with.
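Public activity data typically arrives in mixed units and with repeated, sometimes conflicting, measurements. One common cleanup step, sketched here under assumptions, converts IC50 values to pIC50 (the negative base-10 logarithm of the molar IC50) and takes the median per molecule, flagging molecules whose sources disagree by more than a chosen margin. The records, identifiers, and the one-log-unit threshold are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median

# Hypothetical activity records gathered from different publications.
records = [
    {"mol_id": "MOL1", "ic50": 50.0, "unit": "nM"},
    {"mol_id": "MOL1", "ic50": 0.08, "unit": "uM"},
    {"mol_id": "MOL2", "ic50": 1.0, "unit": "uM"},
    {"mol_id": "MOL2", "ic50": 900.0, "unit": "uM"},  # suspicious outlier
]

UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}


def to_pic50(value: float, unit: str) -> float:
    """pIC50 = -log10(IC50 in mol/L); higher means more potent."""
    return -math.log10(value * UNIT_TO_MOLAR[unit])


def aggregate(records: list[dict], max_spread: float = 1.0) -> dict:
    by_mol = defaultdict(list)
    for r in records:
        by_mol[r["mol_id"]].append(to_pic50(r["ic50"], r["unit"]))
    summary = {}
    for mol_id, values in by_mol.items():
        spread = max(values) - min(values)
        summary[mol_id] = {
            "pic50": median(values),
            "n": len(values),
            # Flag molecules whose sources disagree by more than max_spread log units.
            "inconsistent": spread > max_spread,
        }
    return summary


summary = aggregate(records)
```

Here MOL1’s two reports (50 nM and 0.08 µM) agree to within about 0.2 log units, while MOL2’s span roughly three orders of magnitude and would be flagged for a closer look rather than silently averaged.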

A complementary approach is to generate your own data. Of course, that can be very expensive: synthesizing a new molecule can be an expensive process, and so is testing it. So, most commonly, people use some combination of the two. To handle this, we have built our automation lab, which is focused on synthesizing more molecules using robotics and testing them. That gives us more shots on goal, and it also generates the data sets we need to train and improve our models as we go.

[0:10:27] HC: And that data you’re able to generate, I assume that goes partway towards validating your models. What other techniques do you use to validate?

[0:10:36] KS: I think, like in most domains, the first step is to evaluate a model in silico. Before we even make any molecules, we put these models through pretty rigorous testing. One of the big challenges with small molecules is that the data is not uniformly distributed. The best way to think about it is this: when you train, say, a vision model on photos from the Internet, your data set pretty much represents the distribution of interest. It’s very unlikely that you’re going to be asked to predict on an image that is fundamentally different from what you’ve seen in your training set. That’s not the case with small molecules.

In small molecule discovery, our data is highly clustered and biased toward specific regions. We might have, say, 20 examples of very similar molecules, but the problem we’re trying to solve is predicting the properties of a molecule that is completely different, a novel one. There is no interest in predicting a molecule that is similar to what we already know: we already know about it, so we don’t need to discover it. Fundamentally, the challenge is out-of-distribution prediction accuracy. To address that, we have developed a variety of in silico metrics and ways to evaluate models that we keep proprietary at DeepCure. It starts with the way we split our data. A traditional random train/validation/test split is not going to work in small molecule drug discovery. You need to do something different.
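Why a random split fails can be illustrated with a group-based split, in which every molecule sharing a scaffold or cluster lands on the same side. This is a generic sketch, not DeepCure’s proprietary method: the group labels are assumed to be precomputed (for example, with a scaffold or clustering function from a cheminformatics toolkit), and the data is made up.

```python
import random
from collections import defaultdict


def group_split(mol_ids, groups, test_fraction=0.2, seed=0):
    """Put whole groups (e.g. shared scaffolds or similarity clusters) on one
    side of the split, so test molecules never share a group with training ones."""
    members = defaultdict(list)
    for mol_id, group in zip(mol_ids, groups):
        members[group].append(mol_id)

    group_keys = sorted(members)
    random.Random(seed).shuffle(group_keys)  # reproducible group order

    target_size = test_fraction * len(mol_ids)
    train, test = [], []
    for group in group_keys:
        # Fill the test set one whole group at a time until it reaches the target.
        if len(test) < target_size:
            test.extend(members[group])
        else:
            train.extend(members[group])
    return train, test


# Hypothetical data: ten molecules drawn from four scaffold families.
mol_ids = [f"m{i}" for i in range(10)]
groups = ["A", "A", "A", "A", "B", "B", "B", "C", "C", "D"]
train, test = group_split(mol_ids, groups)
```

A random split would scatter members of family A across train and test, so the test score would mostly measure interpolation within known families; grouping forces it to measure performance on families the model has never seen.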

Then there are the metrics you apply. They need to take into account how close the molecule is to the distribution of interest. Once we put a model, or a variety of models, through this kind of testing, we choose the one that will be progressed to production and used to design the molecules we’re eventually going to make. Of course, in each iteration, once we get the data, we evaluate the model again, try to improve it, fix whatever biases we identify in the process, and repeat.
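One simple, public way to make metrics account for distance from the training data is to report errors separately for test molecules that are near to and far from it. The sketch assumes fingerprints are already available as sets of bit indices (real ones would come from a toolkit, e.g. Morgan fingerprints); the 0.4 similarity threshold and all values are hypothetical.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity of two fingerprints stored as sets of 'on' bit indices."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_abs_error(errors: list[float]) -> float:
    return sum(errors) / len(errors) if errors else float("nan")


def error_by_novelty(train_fps, test_fps, y_true, y_pred, threshold=0.4):
    """Report test error separately for molecules close to the training set
    and for novel ones, using nearest-neighbour Tanimoto similarity."""
    near, far = [], []
    for fp, actual, predicted in zip(test_fps, y_true, y_pred):
        nearest = max(tanimoto(fp, train_fp) for train_fp in train_fps)
        bucket = near if nearest >= threshold else far
        bucket.append(abs(actual - predicted))
    return {"near_mae": mean_abs_error(near), "n_near": len(near),
            "far_mae": mean_abs_error(far), "n_far": len(far)}


# Hypothetical fingerprints (bit indices) and potency values.
train_fps = [{1, 2, 3, 4}, {2, 3, 4, 5}]
test_fps = [{1, 2, 3, 5}, {10, 11, 12}]
report = error_by_novelty(train_fps, test_fps, y_true=[6.0, 7.0], y_pred=[6.2, 5.0])
```

In this toy case the model looks accurate on the familiar molecule (error 0.2) but misses the novel one by 2.0; a single aggregate error would hide that gap.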

[0:12:43] HC: It sounds like understanding the molecules you’re working with and the data around them requires a fair amount of expertise that your average machine learning developer likely doesn’t start out with. How does your machine learning team collaborate with domain experts to be sure they’re training models in the most appropriate way?

[0:13:01] KS: I think this is probably one of the best things about DeepCure: the very tight collaboration between the domain experts and our machine learning scientists. At this point, most of our machine learning scientists understand chemistry better than I did when I started DeepCure, because of this tight integration of the teams. The way we work is in project teams, each focused on a specific drug discovery project. Whether you are a biologist, a chemist, or a machine learning scientist, you are working on solving a drug discovery problem, not a machine learning problem. Machine learning is a tool to do that.

Those teams work very closely together. They see each other every day and iterate on the designs of new molecules. The chemists might provide feedback on the types of molecules we got in a given iteration and on what they think is happening under the hood. The machine learning scientists then take that feedback and try to integrate it into the model or, more commonly, into the data used to train the model, to mitigate some of the biases we identify in the process. At each step of the discovery process, there is this back-and-forth between the domain expert, most commonly the medicinal chemist or the structural biologist, and the machine learning scientist, bouncing ideas around. This is what they see from the algorithm: “What do you think? Oh, you don’t like this? Okay, let me try and fix that.”

This type of back-and-forth is what enables us both to inject innovation into the process, proposing ideas that the human expert would probably not have thought of alone, and, at the same time, to use the feedback to improve our models as we go. What we’ve learned is that this is a must-have in this domain, exactly because your average machine learning scientist doesn’t have chemistry intuition. We need this feedback, and we need to integrate this feedback back into our models to make the predictions make sense. So, this is really how we do it at DeepCure.

[0:15:08] HC: Hiring for machine learning can be quite challenging due to the high demand for professionals in this field. What approaches to recruiting and onboarding have been most successful for your team?

[0:15:19] KS: I think one of the luxuries of working in drug discovery is that the impact and the mission are so compelling that good people are really attracted to this space. We found that hiring phenomenal machine learning scientists was always challenging, but not as challenging as I thought. People see the value in working on a problem that matters, one that can improve patients’ lives.

We at DeepCure work on autoimmune diseases, and we all either suffer from an autoimmune disease or know people who do. This ability to relate, to see the value of your work and how it is being integrated into the discovery of a new drug, is really attractive to people. And it was very helpful for us in attracting the best talent that we have on the team right now.

[0:16:14] HC: Once you do find the right candidate to hire, do your machine learning developers already have a bit of chemistry knowledge or do you have an onboarding process to help integrate them with the chemistry knowledge that’s on the team?

[0:16:25] KS: Most people start with no chemistry knowledge. When I started working on life science problems at MIT, I had no biology or chemistry background at all. I think there is also a bit of an advantage in being unbiased in your approach. At the same time, you have to be very humble to understand how much you don’t know about the space, and this is something we look for in candidates. Once we recruit and onboard them, the key is to put them next to an expert and have them work together for a while. This integration is essentially the internal training that happens at DeepCure. As a small startup, we don’t necessarily have the resources to put people through long training programs, but we do have the luxury of putting them next to the best people in the field, where they learn from each other.

[0:17:18] HC: So, they learn as they go.

[0:17:20] KS: Yes.

[0:17:21] HC: Is there any advice you could offer to other leaders of AI-powered startups?

[0:17:25] KS: I think the most important lesson I have learned since starting DeepCure is that it’s almost never about the AI. The AI is a phenomenal tool. There are so many things we can now do more efficiently or more innovatively than we could before. But the AI itself is not the goal. It’s not about having the coolest model or working with the shiniest new technology.

When we started, it was all about convolutional neural networks. Then it became about variational autoencoders, then transformers, and recently, of course, LLMs and GenAI. But the companies I’ve seen make an impact were the ones that really focused on the problem, the problem that matters for the user or customer, or in our case, the patient, and tried to understand the best tools to solve it.

As I said before, we commonly use machine learning tools that are a decade or two old, and that’s perfectly fine if it solves the problem and helps us get to the best possible drug, one that will provide real value to the patient. So that would be my advice: focus on the problem, focus on the value, and work your way backwards to the best tools to use.

[0:18:50] HC: Finally, where do you see the impact of DeepCure in three to five years?

[0:18:53] KS: For us, it’s all about the drugs in the pipeline. Our lead program is about to start clinical trials next year, so in five years, hopefully, we will be approaching approval of our first drug and getting it to market. We want to see DeepCure drugs helping patients. The first one will be in the rheumatoid arthritis space. Our second program is going after asthma. And my hope is that, in three to five years, we will be approaching the first approval and will have four or five other programs in the clinic.

[0:19:26] HC: This has been great, Kfir. I appreciate your insights today. I think this will be valuable to many listeners. Where can people find out more about you online?

[0:19:33] KS: People are welcome to visit our website, www.deepcure.com. Of course, my LinkedIn profile is available, and people are welcome to reach out. I’m always happy to talk to other entrepreneurs in this space or other spaces.

[0:19:48] HC: Perfect. Thanks for joining me today.

[0:19:50] KS: Thank you very much, Heather. It was a pleasure.

[0:19:52] HC: All right, everyone. Thanks for listening. I’m Heather Couture, and I hope you join me again next time for Impact AI.

[OUTRO]

[0:20:02] HC: Thank you for listening to Impact AI. If you enjoyed this episode, please subscribe and share with a friend. If you’d like to learn more about computer vision applications for people and planetary health, you can sign up for my newsletter at pixelscientia.com/newsletter.

[END]