Today on the Impact AI podcast, I am excited to host Victor Hanson-Smith, Head of Computational Biology at Verge Genomics. He joins me today to talk about unlocking new drugs for neurodegenerative diseases and Verge’s mission to make drug discovery cheaper and faster.

In our discussion, Victor tells us about Verge’s current ventures and the important part they play in developing new drugs, the role machine learning plays, and the types of data sets they work with. We also hear about the complexity behind “omics” data and how Verge validates its machine learning models. Victor talks passionately about the importance of team building, leadership, and company culture, and the vital role they play in building effective machine learning models. To hear his advice to other leaders of AI-powered startups, including the three races currently underway, tune in to today’s episode.


Key Points:
  • How Victor ended up at Verge Genomics as Head of Computational Biology.
  • How his experience with his father’s disease ignited a curiosity in him.
  • Verge Genomics’ ventures and why they are important in developing new drugs.
  • The role machine learning plays in their technology and approach.
  • Victor elaborates on the types of data they work with in the different models.
  • More about the Human Data Atlas and how it works.
  • He talks about the Verge Genomics model setup and functionality.
  • Different challenges they’ve encountered with human omics data and machine learning.
  • How they ensure building the most effective models using complex “omics” data.
  • Verge’s validation process for machine learning models.
  • How generative AI has influenced (or not influenced) advancements at Verge.
  • Advice from Victor to other leaders of AI-powered startups.
  • Where Victor sees the impact of Verge Genomics in three to five years.

Quotes:

“We like to say that Verge Genomics is a full-stack drug discovery and development company.” — Victor Hanson-Smith

“This revolution in systems biology has the potential for new treatments for countless human diseases and has the potential to make drug discovery cheaper and faster. Long-term, it might even transform our fundamental relationship with the concept of disease.” — Victor Hanson-Smith

“One of the key differentiators for Verge is that we base our discoveries in human data.” — Victor Hanson-Smith

“At Verge, we often say, to succeed in humans, we start in humans, and so we go direct to the source.” — Victor Hanson-Smith

“We believe that no single data set or a single piece of data is sufficient for the sorts of rigorous drug discovery we’re interested in. Rather, our platform combines lots of different data types and layers and we’re looking for signals that are consistent across those layers.” — Victor Hanson-Smith

“This problem of finding the right targets, I think, is existential and one of the most upstream problems for the drug discovery industry and that’s a problem right now I don’t think is crackable by generative AI but Verge is on the frontlines of getting us closer there.” — Victor Hanson-Smith


Links:

Victor Hanson-Smith on LinkedIn
Victor Hanson-Smith on Twitter
Verge Genomics
Converge Platform


Resources for Computer Vision Teams:

LinkedIn – Connect with Heather.
Computer Vision Insights Newsletter – A biweekly newsletter to help bring the latest machine learning and computer vision research to applications in people and planetary health.
Computer Vision Strategy Session – Not sure how to advance your computer vision project? Get unstuck with a clear set of next steps. Schedule a 1-hour strategy session now to advance your project.
Foundation Model Assessment – Foundation models are popping up everywhere – do you need one for your proprietary image dataset? Get a clear perspective on whether you can benefit from a domain-specific foundation model.


Transcript:

[INTRODUCTION]

[00:00:03] HC: Welcome to Impact AI, brought to you by Pixel Scientia Labs. I’m your host, Heather Couture. On this podcast, I interview innovators and entrepreneurs about building a mission-driven, machine-learning-powered company. If you like what you hear, please subscribe to my newsletter to be notified about new episodes. Plus, follow the latest research in computer vision for people and planetary health. You can sign up at pixelscientia.com/newsletter.

[INTERVIEW]

[0:00:34.3] HC: Today I’m joined by guest Victor Hanson-Smith, head of Computational Biology at Verge Genomics, to talk about drug discovery. Victor, welcome to the show.

[0:00:43.0] VHS: Hi Heather, thanks for having me on.

[0:00:44.2] HC: Victor, could you share a bit about your background and how that led you to Verge Genomics?

[0:00:49.5] VHS: Sure. So I’m currently the head of Computational Biology at a venture-backed drug discovery and development company named Verge Genomics, and the path that led me here is that I’m trained as a computer scientist. My Ph.D. is in computer science, and prior to Verge, I spent most of my career embedded in molecular and cell biology groups.

I was working with teams to combine computer science with molecular biology in order to unlock a deeper understanding of how genomes are accidentally architected by evolution, and this was such a rewarding body of work for me. It was really on the vanguard of human knowledge, but during this time, I also had an appetite to connect this knowledge and these discoveries with something more translational.

And Heather, I know I just met you, but I’ll tell you something sort of personal: during that time studying genome evolution, I was also experiencing firsthand my father’s journey through Parkinson’s disease. So I just want to take a breath to recognize that that experience was wrapped in heartbreak and grief, and it was very personal to me.

But that experience also ignited inside of me a curiosity about this wide class of human diseases that lack effective treatment. In a moment of revelation during that time, I realized that all of the tools and skills I had acquired studying genome architecture could be leveraged to study genome de-architecture, or in other words, how genomes break down to cause disease. So I was hungry to move into drug discovery, specifically for neurodegeneration, and this is where something unexpected happened.

Verge Genomics had just been founded, forged in the fire of Y Combinator, and Alice Zhang, one of the founders and the CEO of Verge, reached out to me to join the team as one of the first full-time computational biologists. That was a tremendous moment in my life; it was kind of like the universe handing me exactly what I wanted in that moment, and I was ready to take the leap.

I like to say that at that time, Verge really was just a couple of Macs and a folding table, and I got to work from day one architecting our discovery platform, which we’ll talk about more today. Seven and a half years later, we’ve grown from just a few laptops and a folding table to a full-fledged company with a portfolio of chemistry IP. We have a team of chemists in China, and we have thousands of mice running in the lab.

We have a stem cell lab, we have clinical-stage assets that are currently in clinical trial right now. So it’s been a tremendous journey building the company from those early days to where we are now.

[0:03:39.6] HC: So what does Verge do today, and why is this important in developing new drugs?

[0:03:44.4] VHS: Yeah. So we like to say that Verge Genomics is a full-stack drug discovery and development company. Over the past seven years, we’ve built a platform named Converge that drives new drugs from early discovery all the way to the clinic, and we’ve been able to do that three times faster and two times cheaper than our competitors.

This is important because, as many people listening to this podcast are aware, developing drugs is currently expensive. Bringing a new drug to market can cost anywhere from half a billion to billions, plural, of US dollars. But if we take a step back from Verge for a moment and paint a picture of where we are in history, I’d like to think that the biopharma industry is on the precipice of a phase transition.

If we had previously been sailing a wooden sailboat across the ocean, we’re about to move into a rocket ship, and I think what’s driving that phase transition is the revolution in systems biology that’s occurred over the last three decades. That revolution has not yet fully translated to the biopharma industry, but it’s about to.

So there are new technologies for sequencing DNA, RNA, and the epigenome, measuring protein abundance, advances in stem cell technology, advances in drug-protein biophysics, and dozens of other things; this technology is often referred to as systems biology. When you put all of this together, it’s enabling us to measure molecular mechanisms and molecular deficits in human cells in ways that weren’t possible 10 years ago, and I think this area of systems biology has maybe grown more in the past 20 years than any domain of human knowledge ever in history.

I don’t know if that’s true, but I’m curious if it is, and it’s kind of a provocative claim. So this revolution in systems biology has the potential for new treatments for countless human diseases and has the potential to make drug discovery cheaper and faster. Long-term, it might even transform our fundamental relationship with the concept of disease.

But that translation hasn’t fully happened and it’s because right now, drug discovery is locked behind this billion-dollar cost it takes to get a drug to market. So Verge Genomics has been addressing this problem, we’re building a new kind of drug discovery organization that can deliver new drugs faster and cheaper.

[0:06:16.5] HC: And what role does machine learning play in this technology?

[0:06:20.6] VHS: Yeah. So machine learning is fundamental and central to our approach. As a startup company, we’ve been focused on the problem of genetically complex diseases: diseases where there are a dozen or more genetic factors associated with the disease but no effective treatment. Our discovery platform, Converge, uses machine learning across the entire process, from early discovery all the way to the clinic.

So at the early discovery phase, we’re using machine learning to find new gene targets, therapeutic targets, that have an increased probability of actually working. At the pre-clinical stage, we’re using machine learning to select pre-clinical models that have the best possible match to the kinds of molecular deficits we observe in actual human tissue, and this increases our probability of success with pre-clinical validation.

Then at the clinical stage, we’re using machine learning to enrich our patient enrollment strategy. We’re using it to get the best insight into which patients or patient subpopulations will respond to our drug, so that we have an increased probability of success with those clinical endpoints.

[0:07:34.2] HC: And what types of data do you work with in developing these different types of models?

[0:07:38.0] VHS: Yeah, one of the key differentiators for Verge is that we base our discoveries in human data. I know that might sound absurdly reductionist, but the reason I say it is that one of the most common ways historically to initiate a drug discovery program has been to use nonhuman systems, primarily rodent models, but also human cell lines that are divorced from whole-body human biology.

These approaches have been useful historically to launch lots of drugs that are on the market, things you might even be taking in your daily life today. But when we turn our attention to these genetically complex diseases, like ALS, Parkinson’s, and Alzheimer’s, these approaches, these nonhuman and/or reductionist cell models, haven’t produced the outcomes we want.

They have led to diminished innovation in terms of actual drug development. So at Verge, we often say, to succeed in humans, we start in humans, and so we go direct to the source. That is, we spend a lot of energy and resources collecting human tissue from disease lesions of interest, and with that tissue, we do genomics, transcriptomics, proteomics, lots of “omics,” and this allows us to reveal new druggable disease mechanisms from that tissue.

I like to say that we’re sort of reverse-engineering disease from the primary tissue. At the earliest, most upstream discovery phase, just figuring out what we actually want to target with a drug, which is really my area of focus, we build a data structure called the Human Data Atlas. It’s an ever-growing data structure that we continue to build.

This is a database representation of everything we know about genome architecture: which genes interact with other genes, which genes regulate other genes, and which proteins interact with each other. We can then bring to that atlas specific tissue from a disease of interest, like ALS or Parkinson’s, and search the atlas for what we call disease signatures. These are sets of genes whose activity is dysregulated together, and they become a foundation for us to launch our drug discovery programs.

We are looking for drugs or genes that can reverse that disease signature back to healthy levels. So just to bring this full circle and answer your question, we’re working with lots of human data, lots of “omics” layers of data paired with clinical metadata, and then we’re turning that into a search problem to find new ways to treat complex diseases.
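The idea of searching for drugs or genes that reverse a disease signature can be illustrated with a small, hypothetical sketch. It assumes a disease signature is a mapping from gene to log fold change (disease versus healthy tissue) and that each candidate perturbation is summarized the same way; a candidate scores well when its effects anti-correlate with the disease changes on the shared genes. This is a generic connectivity-style reversal score chosen for illustration, not Verge’s actual method, and the gene and compound names are placeholders.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def reversal_score(signature: dict[str, float], perturbation: dict[str, float]) -> float:
    """Hypothetical score: +1 means the perturbation perfectly reverses the
    disease signature on shared genes, -1 means it mimics the disease."""
    genes = sorted(signature.keys() & perturbation.keys())
    if len(genes) < 2:
        return 0.0  # too little overlap to say anything
    return -pearson([signature[g] for g in genes], [perturbation[g] for g in genes])


# Placeholder data: log fold changes for a toy four-gene disease signature.
disease = {"GENE_A": 2.0, "GENE_B": 1.5, "GENE_C": -1.0, "GENE_D": -2.0}
candidates = {
    "compound_1": {"GENE_A": -1.8, "GENE_B": -1.2, "GENE_C": 0.9, "GENE_D": 1.7},
    "compound_2": {"GENE_A": 1.0, "GENE_B": 0.8, "GENE_C": -0.5, "GENE_D": -1.1},
}

# Rank candidates by how strongly they push the signature back toward healthy.
ranked = sorted(candidates, key=lambda c: reversal_score(disease, candidates[c]), reverse=True)
print(ranked)  # ['compound_1', 'compound_2']
```

Here compound_1 moves every signature gene in the opposite direction from disease, so it ranks first, while compound_2 mimics the disease and scores negative. A real pipeline would also need significance testing, network context from something like the atlas described above, and far more genes; the sketch only shows the sign flip at the heart of "reversing a signature."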

[0:10:10.2] HC: So the models that you train, they’re set up to be able to search as opposed to maybe a supervised model or some other type of machine learning model?

[0:10:18.3] VHS: It’s a combination of both, and I think it’s useful to think about the data in terms of disease-agnostic data and disease-specific data. The disease-agnostic data is this atlas of human biology; that part is largely unsupervised, and that’s where the searching happens.

The supervised ML is happening with the disease-specific data, where we’re building specific disease signatures and networks of genes that are dysregulated in a disease state. I’m not sure if that answers your question.

[0:10:47.9] HC: It does. And working with these different types of human omics data, what kinds of challenges do you encounter, especially as you go to train machine learning models with them?

[0:10:57.1] VHS: I imagine anyone listening to this who has worked with any sort of “omics” data can appreciate the amount of noise that can be present in those data sets and so we believe that no single data set or a single piece of data is sufficient for the sorts of rigorous drug discovery we’re interested in. Rather, our platform combines lots of different data types and layers and we’re looking for signals that are consistent across those layers.

So there are a few specific challenges I want to highlight. The first is just getting access to tissue. I’ll use Parkinson’s as an example. When we initiated our Parkinson’s discovery program, we went to NCBI GEO and EMBL’s ArrayExpress. These are public repositories with a lot of “omics” data from the substantia nigra, which is one of the regions where Parkinson’s manifests most obviously.

Our team incorporated literally every single data set we could get from these public repositories, and what we essentially identified were data gaps in the public record. Despite there being a wealth of data in the public repositories, we determined that most of it wasn’t usable for our needs. It was underpowered, poorly annotated, or lacked sufficient controls. So there’s this data gap, and what I’m leading up to is that when we move into a new discovery space or a new indication, we start by finding these data gaps.

These are tissue types or patient populations that aren’t represented in the public space but would be key for our discovery. So the first challenge is just identifying that gap, and the way to identify it is to comprehensively survey the public landscape. Once we’ve identified that data gap, that’s where we strategically deploy resources to acquire our own tissue. We have a network of brain banks and hospitals around the world that partner with us to give us access to tissue that’s relevant for what we’re studying.

The second challenge I want to highlight is identifying signal that’s actually true across patient cohorts, and this goes back to the noise problem I mentioned. In our experience, disease mechanisms we see in one patient cohort are not always reflected in another patient cohort, so something that’s really important to us is identifying signal that is actually preserved and recapitulated across lots of patient cohorts.

So all of our discovery work to date, whether it’s in ALS, Parkinson’s, or other indications, has been based on signals that we can robustly and significantly find recapitulated across lots of different patient populations. Again, that means processing more and more tissue, but it gives us confidence that the things we’re discovering are real signal rather than an artifact of one investigator or one lab.
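The requirement that a signal be recapitulated across cohorts can be sketched as a simple meta-analysis filter. This minimal example assumes each cohort has already produced a per-gene z-score for disease versus control; it keeps only genes measured in every cohort whose direction of change agrees everywhere, and combines the z-scores with unweighted Stouffer’s method (sum of z-scores divided by the square root of the number of cohorts). The 1.96 threshold (roughly two-sided p < 0.05), the gene names, and the numbers are illustrative assumptions, not Verge’s criteria.

```python
from math import sqrt


def consistent_genes(cohort_z: list[dict[str, float]], min_abs_z: float = 1.96) -> dict[str, float]:
    """Return gene -> combined Stouffer Z for genes that are measured in every
    cohort, change in the same direction in every cohort, and pass the
    combined threshold."""
    shared = set.intersection(*(set(c) for c in cohort_z))
    kept = {}
    for gene in sorted(shared):
        zs = [c[gene] for c in cohort_z]
        same_direction = all(z > 0 for z in zs) or all(z < 0 for z in zs)
        combined = sum(zs) / sqrt(len(zs))  # unweighted Stouffer's method
        if same_direction and abs(combined) >= min_abs_z:
            kept[gene] = combined
    return kept


# Hypothetical per-cohort z-scores (disease vs. control) for three genes.
cohorts = [
    {"GENE_A": 3.1, "GENE_B": 2.5, "GENE_C": 0.4},
    {"GENE_A": 2.7, "GENE_B": -1.9, "GENE_C": 0.6},
    {"GENE_A": 2.2, "GENE_B": 2.8, "GENE_C": 0.5},
]

result = consistent_genes(cohorts)
print(sorted(result))  # ['GENE_A']
```

GENE_B is strongly changed in two cohorts but flips direction in the third, so it is dropped; GENE_C is consistent but too weak overall. Weighting each cohort by its sample size is a common refinement of Stouffer’s method.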

[0:13:46.7] HC: So you mentioned some challenges that really require understanding the data quite deeply. How do your machine learning developers collaborate with domain experts to ensure that they build the most effective models with this complex “omics” data?

[0:13:59.4] VHS: Yeah, for sure there’s a component here where domain expertise is required for our platform to work, and that’s fundamental. Even though our platform is highly automated, it still requires you to come with some understanding of the disease you’re trying to treat, the tissues that are relevant, and the biology involved. But your question about ML developers collaborating with domain experts, I actually want to answer through the lens of leadership, team building, and maybe even company culture.

Many organizations in this space, I think, have teams that are relatively siloed, whether that’s a conscious choice or not. Perhaps you can even think of teams you’ve interacted with where the computational biology or data science team is siloed apart from the pre-clinical team or even the clinical team. These types of silos create conflict, and I don’t think they serve the science very well.

So in addition to all the things I’ve said so far about our platform, our science, and our business model, there is something else special going on at Verge with respect to our approach to leadership. I like to use a spaceship metaphor: I think of a startup company like ours as a rocket ship shooting into space, and if there’s something structurally unsound about the spaceship, like it’s shaking or chattering, all of that creates drag.

When a spaceship has drag, it takes more rocket fuel to actually reach orbit. It may not even be able to reach orbit. So when I think about a human organization like a startup company, things like team drama, conflict, or personal politics are all the shaking parts of the spaceship that cause drag, and so at Verge, we’ve invested maybe more than most other companies in building a team culture that reduces that kind of drama and drag.

That’s because we want to use every last drop of rocket fuel, metaphorically, to launch us as far into space as we can. A big part of this has to do with how teams interact with each other and the feedback culture we’ve built, and that is how ML developers collaborate with domain experts. It’s very hands-on, it’s very intimate, and those siloed walls are broken down. Let me make one more point on this: our specific approach has been to use the leadership curriculum of the Conscious Leadership Group.

One of the key leaders of that organization is Diana Chapman, and the curriculum involves a lot of work around accountability, radical candor, mindfulness, and other topics you may have been exposed to before. In my career working with dozens of different leadership coaches and programs, this particular Conscious Leadership Group curriculum has been the most impactful, not only in reducing team drama but in accelerating cross-team collaboration.

As a result, our deepest engineers, who lack any biology expertise, are intimately paired with the world’s best bench scientists, who are also in our organization.

[0:16:57.1] HC: Yeah, that continuous collaboration and setting up your team structure, I’ve heard from others just how essential that is as well. I see that with the teams I work with.

[0:17:06.2] VHS: Yeah, even if it’s an experiment. You know, I think Verge has culturally been an experiment, and we are currently getting external validation from our directors and advisers that we’re doing something different and it seems to be paying off. Right now, we’re a little more than 50 employees. I think as we move forward, the experiment is: can we continue to scale this culture to a hundred, two hundred, a thousand people?

I don’t know, and I am looking forward to seeing how that works out. I’m optimistic about how it will play out.

[0:17:33.2] HC: How do you go about validating your machine-learning models? Is there anything specific to the data type or the complexity in collecting data that’s required here?

[0:17:42.8] VHS: Sure. So I hope anybody who’s in the biopharma space can appreciate that the ultimate validation is clinical success, and by that metric, it’s an exciting time for us. We have a new drug for ALS, which is our lead program. It was discovered entirely from our platform. It’s a novel therapeutic window into ALS; it recently passed phase one, and it’s now entering phase two. So that’s the tip of the spear for us as an organization.

So far that is a validation we’re very excited about, but if we narrow the aperture on that question of validation, there are some other metrics and indicators that our ML models are validated. At the pre-clinical stage, we focus a lot on a metric called hit rate. We have an arsenal of pre-clinical phenotypes we think are relevant for the diseases we’re working in. If we just look at ALS, we’re getting a hit rate of almost 80% across the targets predicted by our platform, in terms of their ability to recover those phenotypes.

So that pre-clinical validation, and that phenotypic hit rate in particular, is another key metric. We can make this aperture even smaller: if I step away from all the wet-lab biology and all the human parts and just focus on data science benchmarks, something my team has thought a lot about in the last few years is using retrospective data on all known clinical programs in the US.

So we’ve done a lot of work of ingesting data from across lots of very diverse indications, hypertension, breast cancer, psoriasis, you name it, using our discovery platform to predict targets and then going to look into the public record of, “Well, what drugs have actually made it to the clinic? What drugs have failed for safety reasons? Could we have predicted those outcomes knowing what we know from human tissue?” So that’s a set of benchmarks I’m very curious about that we currently track.

I think the narrative coming out of that validation is so far really positive. We’re getting predictive power above industry standards. And then we can make the aperture the smallest possible for validating ML models, which gets into the mechanics of things like leave-one-out and fivefold cross-validation, SMOTE, and some of these other very AI-specific methods. I’ll just mention their names and leave it at that, because I don’t know if they’re as interesting to talk about.
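For readers less familiar with the cross-validation schemes mentioned here, the following is a minimal, self-contained sketch of k-fold cross-validation, where leave-one-out is the special case k = n. The toy threshold classifier and the data are hypothetical stand-ins for a real model and say nothing about Verge’s pipeline; SMOTE, an oversampling technique for imbalanced classes, is not shown.

```python
import random
from typing import Callable, Iterator


def kfold_indices(n: int, k: int, seed: int = 0) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, test) index lists for k-fold CV; k == n is leave-one-out."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # reproducible shuffle
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


def cross_val_accuracy(X, y, fit: Callable, predict: Callable, k: int, seed: int = 0) -> float:
    """Fit on each training split, score on the held-out fold, pool accuracy."""
    correct = 0
    for train, test in kfold_indices(len(X), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in test)
    return correct / len(X)


# Hypothetical toy model: a threshold halfway between the two class means.
def fit_threshold(xs, ys):
    zeros = [x for x, t in zip(xs, ys) if t == 0]
    ones = [x for x, t in zip(xs, ys) if t == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


X = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(cross_val_accuracy(X, y, fit_threshold, predict_threshold, k=5))       # fivefold: 1.0
print(cross_val_accuracy(X, y, fit_threshold, predict_threshold, k=len(X)))  # leave-one-out: 1.0
```

Every point is scored exactly once by a model that never saw it during fitting, which is what makes the pooled accuracy an honest estimate. In practice, scikit-learn’s `KFold` and `LeaveOneOut` handle the same bookkeeping, and any resampling such as SMOTE should be applied only inside the training folds so that synthetic samples never leak into the held-out data.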

[0:20:08.4] HC: AI has been in the headlines a lot lately with generative models, like large language models such as ChatGPT and text-to-image models. How have the latest generative AI advancements influenced what you’re working on, or have they?

[0:20:22.0] VHS: Yeah, I love that question. I think recently we’ve been asked flavors of this question a lot. So I think generative AI, things like ChatGPT, is good at generating new things where it has lots of data to pattern match. Generative AI is less good and in some cases, terrible at problems where it has no data to pattern match. An example that comes to mind is writing a birthday card.

If you were to ask ChatGPT or Google Bard or some other generative language model if you were to ask these systems to write you a birthday card, it will probably do a great job, and that’s because there are a lot of birthday cards from which it could train. If you were to ask a generative language model to find a new therapeutic mechanism for a human disease with no effective treatments, it would be less effective at that task and that’s because we just lack positive cases for pattern matching.

So this goes back to a point that I made earlier about there being a data gap. There simply is a data gap in what can be trained on to solve that really big hard problem. It’s a solvable problem is what I also want to highlight and I think Verge Genomics is building a bridge across that data gap. We’re on the vanguard of building a bridge across that data gap. So all that said, I do see a role today for generative AI in some spaces in the biopharma industry.

I think its current value is being most realized in protein biophysics, the molecular docking space. A prerequisite for that problem is having a gene target in mind, but once you have a particular protein you want to drug, that’s where I think a lot of AI is unlocking some novelty. One of the biggest problems we’re solving at Verge is upstream of that: just finding the right target in the first place.

Just to circle back on some things we said a few minutes ago, one of the reasons this quest for better targets is important, if we take Alzheimer’s as an example, is that in the past decade there have been multibillion-dollar failures, where teams developed the most exquisite chemistry with excellent blood-brain penetration. Just amazing chemistry work, but it was against the wrong target.

It was against a target that ultimately didn’t move the needle on disease phenotypes. So this problem of finding the right targets, I think, is existential and one of the most upstream problems for the drug discovery industry, and that’s a problem right now I don’t think is crackable by generative AI, but Verge is on the frontlines of getting us closer there.

[0:22:55.7] HC: Is there any advice you could offer to other leaders of AI-powered startups?

[0:22:59.9] VHS: When I think about AI-powered startups, I think there are three races currently underway. The first race is the algorithmic race, and this is to build the world’s greatest AI model. The second race is for training data, and I like to say that if you have really excellent training data, then the sophistication of your AI is in some ways irrelevant. This is the race Verge is running in; I’ll circle back to that in a moment.

The third race is what I would call the problem identification race: looking for these arbitrage opportunities, which I’ll come back to. But back to the first race, the algorithmic race: there are organizations right now investing the majority of their resources into building very sophisticated AI models. I think that is one good strategy, but it requires you to have really excellent input data, and not all problem domains are there yet.

So this is why I think, for the problem Verge is trying to solve, finding new treatments for complex diseases, the real action is in the training data race. This is the race Verge is running in, and I also think it’s the most interesting race for this particular industry. That’s why we are strategically investing lots of resources in acquiring proprietary tissue and better data. Now let me talk a little more about that third race, the problem identification race.

This is really advice for startup leaders in general. What I think makes a billion-dollar company is when you can use existing algorithms and existing data to answer an unsolved or under-met problem. So it’s this challenge of finding arbitrage opportunities, where there’s a question that either we haven’t solved yet or that nobody has even asked, but there exists some data set you could use to get leverage against that question, whether that’s drug discovery or any other topic.

So my advice to AI-powered startups is to ask, “Are you even operating in a place where you have that arbitrage, that ability to connect an under-met question with data availability?” When that exists, it’s like lightning striking, and that’s where I think some of the most value-creating, innovative companies of the last two decades have emerged: they had their hands on that arbitrage situation.

[0:25:25.3] HC: Finally, where do you see the impact of Verge Genomics in three to five years?

[0:25:29.7] VHS: Well, ambitiously, I think we want to become the next Genentech. I’m sort of smiling as I say that, but I think realistically, an achievable goal in the next three to five years would be for us to discover, develop, and shepherd three to four more new drugs into the clinic. Like I mentioned, our current program is in ALS. It’s about to enter phase two, and we have a lot of enthusiasm and excitement about that.

That program is the first proof of concept for our platform; it’s demonstrated that we can take this all-in-human concept from tissue all the way to the clinic. So I think we want to now repeat that process and launch a portfolio of drugs in parallel. In terms of impact, when we were founded eight years ago, I felt that Verge was relatively singular in this space, and in the past three to four years, there’s been an explosion of companies that look like Verge, which is certainly flattering, so I think this all-in-human approach is starting to get more traction in the industry.

People are recognizing the value it brings to the process and that it can reduce costs. So one impact of Verge Genomics may be that the way drugs are discovered will look fundamentally different in ten years than it does today, and I would like to think that the work we’re doing is playing a part in changing that landscape.

[0:26:54.1] HC: This has been great. Victor, your team at Verge is doing some really interesting work for neurodegenerative diseases. I expect that the insights you shared will be valuable to other AI companies. Where can people find out more about you online?

[0:27:05.9] VHS: They can go to vergegenomics.com and we have press releases and you can read about our leadership team, and we have multiple open positions if you are curious to join our team. One of the best pieces of startup advice I ever got was, always be hiring, so we’re always looking for people who are aligned and passionate about our mission to come join us.

[0:27:27.9] HC: Perfect, thanks for joining me today.

[0:27:29.5] VHS: Yeah, of course, Heather, it was my pleasure to talk with you.

[0:27:32.3] HC: All right everyone, thanks for listening. I’m Heather Couture and I hope you join me again next time for Impact AI.

[END OF INTERVIEW]

[0:27:41.8] HC: Thank you for listening to Impact AI. If you enjoyed this episode, please subscribe and share it with a friend, and if you’d like to learn more about computer vision applications for people and planetary health, you can sign up for my newsletter at pixelscientia.com/newsletter.

[END]