Imagine a world where radiology backlogs are a thing of the past, and AI seamlessly augments the expertise of radiologists. Today, I'm joined by Robert Bakos, Co-Founder and CTO of HOPPR, to discuss how his company is bringing this vision to life. HOPPR is pioneering foundation models for medical imaging that have the potential to transform healthcare. With access to over 15 million diverse imaging studies, HOPPR is developing multimodal AI models that tackle radiology’s most significant challenges: high imaging volumes, limited specialist availability, and the growing demand for rapid, accurate diagnostics.

In this episode, Robert offers insight into the rigorous process of training these models on complex data while ensuring they integrate seamlessly into medical workflows. From data partnerships to specialized clinical collaboration, HOPPR’s approach sets new standards in healthcare AI. To discover how foundation models like these are revolutionizing radiology and making healthcare more efficient, accessible, and equitable, be sure to tune in today!


Key Points:
  • Robert’s background in medical imaging and tech and how it led him to create HOPPR.
  • Ways that HOPPR’s AI models improve diagnostic speed and accuracy.
  • The significant data and compute resources required to build a foundation model like this.
  • Partnering with imaging organizations to collect diverse data across multiple modalities.
  • How HOPPR differentiates itself with ISO-compliant development and multimodal training.
  • The quantitative metrics and clinical review involved in validating its foundation model.
  • Key challenges in building this model, including data access, diversity, and secure handling.
  • Reasons that proper data diversity and balance are essential to reduce model bias.
  • How API integration makes HOPPR’s models easy to adopt into existing workflows.
  • The real-world clinical needs and input that go into building an AI product roadmap.
  • Robert’s take on what the future of foundation models for medical imaging looks like.
  • Valuable lessons on the importance of strong labeling, compute scalability, and more.
  • Practical, real-world advice for other leaders of AI-powered startups.
  • The broader impact in healthcare that HOPPR aims to make.

Quotes:

“Having clinical collaboration is super important. At HOPPR, our clinicians are an important part of our product development team – they're absolutely vital for helping us evaluate the performance of the model.” — Robert Bakos

“Because we are training across all these different modalities, getting access to this data can be challenging. Having great partnerships is critical for finding success in this space.” — Robert Bakos

“Make sure that you're addressing real problems. There are a lot of great ideas and cool things you can implement with AI, but at the end of the day, you want to make sure you can deliver value to your customers.” — Robert Bakos

“Foundation models – trained on a breadth of data – can make a positive impact on underserved areas around the world. With the volume of images growing so rapidly, constraints on radiologists, and burnout, it's important to leverage these models to make a big impact.” — Robert Bakos


Links:

Robert Bakos
HOPPR
Robert Bakos on LinkedIn


Resources for Computer Vision Teams:

LinkedIn – Connect with Heather.
Computer Vision Insights Newsletter – A biweekly newsletter to help bring the latest machine learning and computer vision research to applications in people and planetary health.
Computer Vision Strategy Session – Not sure how to advance your computer vision project? Get unstuck with a clear set of next steps. Schedule a 1-hour strategy session now to advance your project.


Transcript:

[INTRODUCTION]

[0:00:03] HC: Welcome to Impact AI, brought to you by Pixel Scientia Labs. I’m your host, Heather Couture. On this podcast, I interview innovators and entrepreneurs about building a mission-driven machine learning-powered company.

This episode is part of a mini-series about foundation models. Really, I should say domain-specific foundation models. Following the trends of language processing, domain-specific foundation models are enabling new possibilities for a variety of applications with different types of data, not just text or images. In this series, I hope to shed light on this paradigm shift, including why it’s important, what the challenges are, how it impacts your business, and where this trend is heading. Enjoy.

[INTERVIEW]

[0:00:50] HC: Today, I’m joined by guest, Robert Bakos, Co-Founder and CTO of HOPPR, to talk about a multi-modal foundation model for medical imaging. Robert, welcome to the show.

[0:01:00] RB: Thank you. Thanks, Heather, for having me.

[0:01:02] HC: Robert, could you share a bit about your background and how that led you to create HOPPR?

[0:01:05] RB: Sure. Yeah. I started my career as a software engineer. I worked in a couple of different industries before joining a company called Merge Healthcare, which was really my first foray into the medical imaging space. There, I was tapped to lead an innovation team that was building out an integration hub for the different RIS and PACS solutions that they had. That was a really great experience. I left Merge Healthcare for a company called Higi. This was a startup, and there, I joined Dr. Khan Siddiqui, who’s an entrepreneur and prominent radiologist. Together, we built out the product development team at Higi. We developed a digital health platform, a fleet of FDA Class II biometric screening kiosks, an IoT platform, and a number of other software solutions. During this time, I started getting into machine learning.

Ultimately, I became CTO at Higi. We had successful exits, and then Khan and I decided to work on our next collaboration. We were attending different radiology conferences. As a radiologist, he’s really in tune with the radiology community. We recognized some big gaps in the AI solutions being implemented in radiology and saw an opportunity to make a big impact. We collaborated with a few other team members and co-founded HOPPR.

[0:02:24] HC: What does HOPPR do? Why is it important for healthcare?

[0:02:27] RB: Yes. At HOPPR, we’re building foundation models for medical imaging. These are large vision models that are trained on millions of medical imaging studies. These are across modalities and patient populations and regions. Really, they’re used primarily to automate the process of identifying findings in images. Yeah. I mean, that’s primarily the goal.

This is critical for healthcare, especially in radiology right now, because there are some big challenges in radiology. Imaging volumes are growing really rapidly. I think there are almost 5 billion imaging studies performed globally each year. In the US alone, I think the statistic is that there’s been more than a 40% increase in imaging studies per radiologist since 2006. Really, there are not enough radiologists to keep up with this demand. A lot of studies are showing that almost half of radiologists are reporting symptoms of burnout. This means that patients are waiting longer to get their imaging studies read. I read an article recently highlighting how some of these studies were taking over a week, actually over two weeks in certain states.

This is slowing down patient care, and the strain it’s putting on radiologists has the potential to put patient safety at risk. These are big issues. There are AI solutions out there that are trying to move the needle on these things, but they tend to be point solutions. Getting access to this data is challenging. A lot of these small models have been trained on isolated data sets, which means they may not generalize well. Maybe they only work in one facility and not another. Then often, they don’t integrate seamlessly into the radiologist’s workflow. Adopting these models can be really painful. A generalized foundation model that’s trained across a breadth of imaging data really has the potential to address a lot of these issues in a more scalable way.

[0:04:24] HC: What does it take to build a foundation model like this? What scale of data and compute and so on are we talking about?

[0:04:31] RB: Yes. Yeah, building a foundation model is certainly a technical challenge. We’re aligning medical images with the language used in radiology reports and other supplemental data. This involves a lot of deep learning techniques and technologies, like vision transformers and LLMs. As you brought up, data is a significant barrier to entry. We have tools that allow us to collect and ingest hundreds of terabytes of data at a time. We have to de-identify that data and index it into a data warehouse; then we can use it in a variety of ways, for data balancing, training, validation, and building out different cohorts of data that we use for a variety of purposes.
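As a rough illustration of the de-identification step Robert describes, a minimal pass over DICOM headers might look like the sketch below, using pydicom. The tag list and blanking policy are simplified assumptions, not HOPPR’s actual tooling; a production pipeline would follow the DICOM PS3.15 confidentiality profiles.

    # Minimal DICOM de-identification sketch (illustrative only).
    from pathlib import Path

    import pydicom

    # Simplified subset of identifying tags; assumed for illustration,
    # not a complete PHI list.
    PHI_TAGS = ["PatientName", "PatientID", "PatientBirthDate",
                "PatientAddress", "ReferringPhysicianName"]

    def deidentify(src: Path, dst: Path) -> None:
        ds = pydicom.dcmread(src)
        for tag in PHI_TAGS:
            if tag in ds:
                ds.data_element(tag).value = ""  # blank direct identifiers
        ds.remove_private_tags()                 # drop vendor-private elements
        ds.save_as(dst)

    # Usage: deidentify(Path("raw/img001.dcm"), Path("deid/img001.dcm"))

The surviving non-identifying headers (modality, study description, and the like) are the kind of fields that can then be indexed into a data warehouse for cohort building.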

The model itself obviously requires a lot of additional effort on the technical side to implement. In addition to the raw data that we get, we have to be able to label and organize that data. Of course, you need expertise in computer vision and natural language processing, expertise in data engineering, folks who can manage scalable compute infrastructure, and of course, capital to utilize the GPU clusters that you need for training. GPUs are not cheap, as I’m sure you’re aware.

I think one of the biggest things to really call out in building a foundation model, at least in the medical imaging space, is that having clinical collaboration is super important. At HOPPR, our clinicians are a really important part of our product development team. They’re determining the quality of our data. They’re giving us context into how we need to leverage the data appropriately, and they’re absolutely vital for helping us evaluate the performance of the model. Ultimately, yeah, I think those are the key elements of building a foundation model.

[0:06:22] HC: Where do you get the data that’s needed to build this?

[0:06:26] RB: We collaborate with different data partners and data brokers to collect this data. We have some really great partnerships with organizations doing both inpatient and outpatient imaging. We’ve been able to ingest hundreds of terabytes of this data. Right now, we have, I think, over 15 million imaging studies in our repository, and we’re regularly importing more data. Our partnerships have been fantastic. Obviously, because we are training across all these different modalities, getting access to this data can be challenging, and having these great partnerships is critical for finding success in this space.

[0:07:10] HC: You’re working with multiple types of radiology images, is that right?

[0:07:14] RB: That’s correct. I mean, we’ve ingested millions of CTs, X-rays, and mammograms, and we’re pulling in tomosynthesis data. MRIs will be coming a little later next year. A number of different modalities, really. It’s based on the priorities that we have for delivering solutions to our customers.

[0:07:35] HC: There are a number of other foundation models out there. Research groups in academia keep publishing different ones for many different types of imaging modalities, radiology included. How is your foundation model different than the others that are available for medical imaging?

[0:07:50] RB: First, everything that we build is based on our ISO 13485-compliant quality management system. We’re building medical-grade models that are intended to be used within medical device software. It’s really critical to make sure that upfront, you’re designing your model and leveraging your data in a particular way. Even your platform infrastructure is designed and built under a compliant system that will ensure you’re meeting those medical device standards. That way, when customers are using our solution, they can leverage it and then bring their own solution to the FDA. That’s one critical differentiator, I think, for what we’re building.

The models themselves are trained on 16-bit, high-resolution images. These are multimodal models, trained on both image and text data, which we can fine-tune in a variety of ways to support our customers’ specific use cases. Those are some differentiators. We want to make sure that we’re building a foundation model that can be customized to meet our customers’ needs. A generalized foundation model is good at a lot of things, but once you fine-tune it, it can become great at very particular tasks. Offering solutions that can leverage the model in that way is a differentiator.
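To make the 16-bit point concrete: a common failure mode is quantizing medical images down to 8 bits for a standard vision pipeline, which throws away much of the dynamic range a radiologist relies on. Here is a hedged sketch of loading a 16-bit DICOM frame at full precision; the normalization policy is an illustrative assumption, and real preprocessing depends on modality, protocol, and windowing.

    # Sketch: load a 16-bit DICOM frame and normalize it for model input.
    import numpy as np
    import pydicom

    def load_frame(path: str) -> np.ndarray:
        ds = pydicom.dcmread(path)
        pixels = ds.pixel_array.astype(np.float32)  # often uint16 on disk
        # Apply the scanner's rescale transform when present
        # (e.g., CT raw values to Hounsfield units).
        slope = float(getattr(ds, "RescaleSlope", 1.0))
        intercept = float(getattr(ds, "RescaleIntercept", 0.0))
        pixels = pixels * slope + intercept
        # Normalize over the range actually used, rather than
        # clipping to 8 bits first.
        lo, hi = float(pixels.min()), float(pixels.max())
        return (pixels - lo) / max(hi - lo, 1e-6)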

I think one of our other biggest assets and differentiators in building this model is the breadth of our training data, which we supplement with strong labeling partners, so we can improve the quality of that data and iteratively improve our approach to the model. We’re using a number of techniques to improve the model as well, like contrastive learning, in-context learning, a variety of additional techniques to improve the model’s performance, and a handful of secret-sauce techniques that I probably can’t share too much about here. Yeah, ultimately, our clinician-led development, I think, really ensures that the models we’re building will meet the high standards of healthcare. We think that we have a number of differentiators there.

[0:09:57] HC: A foundation model can be used for a variety of different tasks. That generalizable and adaptable aspect is fundamental there. How do you know that your foundation model is good? How do you validate it for all the different things it could be used for?

[0:10:10] RB: Yeah. We evaluate our model using a combination of quantitative and qualitative measurements. On the quantitative side, we’re looking at metrics like sensitivity, specificity, and F1 scores, the standard machine learning metrics, for different types of findings across different modalities and features. If you’re familiar with large language model evaluation, we also use a number of NLP scores, like BLEU, ROUGE, and METEOR. These are generally used to evaluate text similarity between model-generated text and ground-truth text.

These tend to be imperfect scores, because a lot of them are based on text similarity. What happens is that the text you generate might get a high score even when it says the opposite of the ground truth. You may have said, “There is no cardiomegaly in this study,” while the image actually shows cardiomegaly, and you still get a high score because the text contains the same words. These metrics don’t really understand concepts like negation. There are a lot of challenges in evaluating output. These metrics provide directional accuracy. Then, of course, you have to take it a level deeper and work more directly with the model and its results with your clinical team, right?

Of course, we’re leveraging our clinicians to perform the clinical evaluation of the models using our diverse data sets. They evaluate the models against different types of cases. Then, when we fine-tune our models for customers, we also collaborate with them to meet their specified performance metrics. We’ll use our own internal validation data, and we’ll collaborate with them on their test data sets to determine the optimal metrics. Generally, we have a breadth of different metrics that we leverage to determine the quality of the foundation model.
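For readers who want to see the negation failure Robert describes, here is a self-contained sketch. The unigram overlap below stands in for BLEU/ROUGE-style similarity and is deliberately naive; it is not how HOPPR scores its models.

    # Sketch: surface-level text similarity is blind to negation.

    def sensitivity_specificity(y_true, y_pred):
        # Standard classification metrics for a binary finding.
        tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
        tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
        fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
        fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
        return tp / (tp + fn), tn / (tn + fp)

    def unigram_overlap(candidate: str, reference: str) -> float:
        # Fraction of reference words that appear in the candidate.
        c = set(candidate.lower().split())
        r = set(reference.lower().split())
        return len(c & r) / len(r)

    reference = "there is cardiomegaly in this study"
    generated = "there is no cardiomegaly in this study"
    print(unigram_overlap(generated, reference))
    # Prints 1.0: every reference word appears, yet the clinical meaning
    # is the opposite. This is why clinician review is layered on top.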

[0:12:14] HC: What are some of the challenges you’ve encountered in building a foundation model?

[0:12:17] RB: Access to data is one of the big challenges, obviously. I wouldn’t say this is a challenge for every foundation model. I know everyone’s heard that ChatGPT is trained on much of the Internet, which is somewhat more easily accessible. In the medical space, getting access to data is obviously much more of a challenge. Especially with images, this data is often stored in PACS systems that were not really designed to export data. These were intended to store medical image data. Often, they’re just not designed to provide access to this data in a scalable way. Getting access to it is painful.

Often, we have to ship devices to our data partners, who collaborate with us to load the data onto those devices; they then get shipped back, and we unload the data. It’s a fairly tedious task. Identifying the data sets ahead of time is also a challenge. These imaging centers and hospital systems don’t really index the information in a way that makes it easy to identify what’s in the data. You know that there’s an X-ray for a particular patient, and you might have access to their labs and some general information, but you don’t know that the X-ray showed a fracture of the third rib unless you actually analyze the radiology reports.

A lot of this information is buried in radiology reports, which are often free-text reports. You’ve got to build tooling to parse the text out of those reports, normalize it, and index it in a certain way. Those are all interesting challenges. Of course, there are privacy and security concerns around handling this data as well. You have to have the right policies in place and the right security controls. These are really large files. They’re expensive to store. They’re painful to move. Data generally is a big pain point in this space. I would say another big challenge is data diversity across different institutions. In many cases, the data can be really unbalanced. For example, data coming out of an outpatient imaging center may be more heavily weighted toward unremarkable, or normal, studies.

You have to really get an understanding of the composition of this complex data so that you can utilize it properly in training. Building out tooling to balance this data effectively, and making sure the model sees enough of the right cases and not too many of the wrong cases that could prevent it from generalizing properly, are critical things. Then, there are other variables outside of just the findings. You need to make sure you’re balancing across things like different scanner manufacturers, scanner versions, patient populations, and imaging protocols. There are a lot of variables that come into play in managing how you want to leverage this data in an effective manner.
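As one hedged illustration of that kind of balancing tooling, the sketch below caps each (finding, manufacturer) group in a study index so that unremarkable studies from a single scanner cannot dominate a training cohort. The column names and group cap are assumptions made for illustration, not HOPPR’s pipeline.

    # Sketch: cap per-group counts in a study index with pandas.
    import pandas as pd

    def balance(index: pd.DataFrame, per_group: int,
                seed: int = 0) -> pd.DataFrame:
        # Stratify by finding and scanner manufacturer, then sample
        # at most per_group studies from each stratum.
        return (index
                .groupby(["finding", "manufacturer"], group_keys=False)
                .apply(lambda g: g.sample(min(len(g), per_group),
                                          random_state=seed)))

    # Hypothetical usage:
    # index = pd.read_parquet("study_index.parquet")
    # train_index = balance(index, per_group=5000)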

I would say a third challenge here is working mostly with unstructured data. Again, that can be painful. We’ve had to fine-tune our own custom language models to effectively extract the key findings and index those in our data warehouse, so that we can use them for our purposes. Then finally, it can be challenging to minimize hallucinations and errors. When you’re training on report text and other additional text, the model is basically learning what it should say. Often, it will hallucinate. Building out technical approaches to eliminating those hallucinations and errors is critical, things like reinforcement learning with feedback from our clinicians and direct preference optimization. There are a number of techniques to do this, but all of these things are important to leverage. They present some interesting challenges in building foundation models.
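Direct preference optimization, which Robert mentions, comes from Rafailov et al. (2023) and has a compact objective. Here is a hedged PyTorch sketch, assuming per-report log-probabilities have already been computed for the policy and a frozen reference model, with clinicians supplying which of two generated reports is preferred; this is illustrative, not HOPPR’s code.

    # Sketch of the DPO objective applied to report preferences.
    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logp: torch.Tensor,
                 policy_rejected_logp: torch.Tensor,
                 ref_chosen_logp: torch.Tensor,
                 ref_rejected_logp: torch.Tensor,
                 beta: float = 0.1) -> torch.Tensor:
        # Margin by which the policy prefers the clinician-chosen report
        # over the rejected (e.g., hallucinated) one, measured relative
        # to the frozen reference model.
        logits = ((policy_chosen_logp - policy_rejected_logp)
                  - (ref_chosen_logp - ref_rejected_logp))
        return -F.logsigmoid(beta * logits).mean()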

[0:16:24] HC: In some ways, it’s similar to everything else machine learning related, in that it’s frequently the data that’s far more challenging than the machine learning algorithms themselves.

[0:16:34] RB: Absolutely. Yeah, yeah. I mean, the technical innovations are moving really rapidly. There are new model architectures and frameworks coming out all the time. But ultimately, curating the data set properly, balancing it properly, and adding annotations that augment that data are super critical for success in this space.

[0:17:02] HC: Yeah, I’m particularly intrigued by your focus on data diversity. This is important for all types of machine learning to reduce bias. For foundation models, there’s been a trend towards larger data sets and larger models, just throw everything you have at it, with a lot less focus on the diversity of the data that goes into it. It’s great to hear that you’re putting some emphasis on that and thinking carefully about what data goes into your models.

[0:17:28] RB: Right. I think it’s somewhat unique in this space because, as I mentioned, the diversity of the data is very different in medical imaging. If you’re training across the Internet, it’s much less likely that you’re going to stumble on 50% of your data being the same article, or something like that. When you’re working with medical image data, you have these high percentages of data where the radiologists are saying there’s nothing remarkable in it; essentially, it’s the same. Sometimes we have that in fairly large numbers. Really being able to deep-dive into it and understand what you have and how to leverage it is the name of the game.

[0:18:10] HC: How are you commercializing your foundation model?

[0:18:12] RB: Yeah. We’ve really positioned our solution as a component of a medical device. We’ve made our models accessible through an API integration, in a similar manner to GPT with OpenAI. This allows us to host these models and allows our customers to embed them in existing applications within their existing workflow. We believe that this represents the optimal approach for commercialization because it allows our customers to really focus on building the applications that they care about. We can focus on the model itself. We have the quality management that supports their efforts. We can provide the necessary documentation for any 510(k) filings. We can help them fine-tune the model for their specific use cases. They can really focus on the workflow or the application that they’re building. That, we think, is a really key differentiator for us and something that is really critical for our customers’ success.
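The API-first approach Robert describes would feel familiar to anyone who has called a hosted LLM. Purely as a hypothetical sketch: the endpoint, payload fields, and auth scheme below are invented for illustration and are not HOPPR’s documented API.

    # Hypothetical sketch of calling a hosted imaging-model API.
    # Endpoint, fields, and auth are placeholders, not a real API.
    import base64

    import requests

    API_URL = "https://api.example.com/v1/findings"  # placeholder
    API_KEY = "YOUR_KEY_HERE"                        # placeholder

    def request_findings(dicom_path: str) -> dict:
        with open(dicom_path, "rb") as f:
            study_b64 = base64.b64encode(f.read()).decode("ascii")
        resp = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"study": study_b64, "modality": "CR"},
            timeout=60,
        )
        resp.raise_for_status()
        return resp.json()  # e.g., {"findings": [...]}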

We’re also building out great partnerships with a variety of development partners to collaborate on fine-tuning the models. We even have our own model-hosting infrastructure. Basically, we have a medical-grade inferencing service that allows us to host our models and third-party models through our platform. We’ll have the ability to collaborate with partners to fine-tune those models through the platform. Ultimately, our goal is to provide broad accessibility and make a positive impact on patient care. We think that our API-based mechanism is going to be the optimal approach there.

[0:19:52] HC: Now, training a single foundation model, of course, is not the end of the story. It’s just the beginning of it. How do you go about developing a product roadmap for a foundation model? Is there anything you do differently than you would for a roadmap for another type of AI product?

[0:20:06] RB: Well, you start really with the clinical needs. Ultimately, you want to make sure that you’re addressing real problems. There’s a lot of great ideas and cool things you can implement with AI, but at the end of the day, you really want to make sure you can deliver value to your customers. Whether that’s speeding up workflow for your providers, maybe you’re improving accuracy, maybe you’re reducing burnout, right? There’s a lot of different approaches. Identifying what’s the right use case with clinical input, collaborating with your partners to determine what’s the highest criticality from their perspective is super important.

From there, you’re going to prioritize the tasks that align with those goals. You’re going to make sure you’re collecting the right data for when you want to deliver; again, because of the lead time with data, that can be challenging. If you’re going to deliver a different modality at some point in the future, let’s say, Q1 of 2025, you want to make sure that you have the train rolling on getting the data that you need, and that you’re collaborating with your data partners to start indexing that data. Make sure you have enough of the data to address the problem, and prepare for the model training.

Often, we have to reserve different GPU clusters ahead of time, depending on the throughput that we need to support. Of course, there’s preparing the validation data sets, making sure we have gold-standard data sets that we can use to evaluate the model, and reviewing compliance regulations. Maybe different modalities or different use cases have some different constraints there that we need to address. Preparing all these things upfront is really critical for building out the roadmap. It’s not purely just, when do I have the ability, the time, to address it?

It’s very different from software development in that regard. Obviously, lining up the data, lining up the compute, lining up the clinicians is a critical aspect of it. That collaboration process with engineering, with clinical, even with our commercial teams and our partners is a critical aspect of building up that product roadmap.

[0:22:20] HC: What do you think the future of foundation models for medical imaging looks like?

[0:22:24] RB: I think that foundation models will be integrated into everyday medical workflows. These foundation models will allow for diagnostic insights across different imaging types and different data sources. I expect EHR data, pathology data, and a variety of additional demographic information, data that isn’t even accessible to providers today, will eventually be incorporated. This will really help support and personalize healthcare. By combining this data with a patient’s clinical history, you can really provide more precise treatment and diagnosis. I think that this is really going to help make a positive impact on closing care gaps, especially in underserved locations.

I don’t know how accurate this statistic is, but what I heard recently is that there’s an extremely low number of pediatric radiologists serving all of Africa. I heard the number is eight. I don’t know if that’s actually true, but it was a shocking number to me. That makes you realize that foundation models that have been trained on a breadth of data can really make a positive impact on underserved areas around the world. With the volume of images growing so rapidly, the constraints on radiologists, and the burnout, it’s really important to leverage these types of models to make a big impact. I see them making a huge impact on medicine. I think that adoption is going to grow really rapidly here as folks start to see the value.

Obviously, regulation will have an impact on the pace of adoption on that front, but I think that it’s going to happen. Doing it responsibly is going to be critical. I do think that it will be something that really becomes a part of all medical workflows, not just medical imaging.

[0:24:20] HC: Are there any lessons you’ve learned in developing foundation models that could be applied more broadly to other data types?

[0:24:25] RB: I would say, making sure you have a deep understanding of your data composition is super critical, and so is leveraging the right labels at different stages of training. There are definitely scenarios where we’ve had to do labeling to optimize the performance of the models. Radiology reports are what I would call weak labels that often do not contain the full picture. Sometimes these reports assume that the future radiologist already has the history of the patient, so they may not mention every little thing in every radiology report. That means that you often have to dig a little deeper, making sure that the quality of your data is really where it needs to be to build the model in the way that you need. I would say that labeling and that evaluation are absolutely critical.

Feedback loops with subject matter experts are essential for building reliable AI models. Finally, I would say, finding the right compute partners to support scalable training is super important. GPU access is a challenge, and finding the right compute partners that you can collaborate with, who give you the flexibility that you need for training, especially when you’re training at scale with millions of images, is a big factor in your success.

[0:25:46] HC: Thinking more broadly about your role as a founder of HOPPR, is there any advice you could offer to other leaders of AI-powered startups?

[0:25:54] RB: I would say, focus on solving real-world end-user problems. Work with your customers to make sure that the thing you’re building is going to address their need. Ensuring that your solution is going to integrate seamlessly into existing workflows is really critical as well. Folks won’t adopt solutions that require a lot of custom integration, or anything like that. You really want to make sure you can bring your solution into the world in as easy a way as possible for those customers to adopt it.

Make sure you build partnerships. That may be with regulatory bodies, clinical experts, data providers, or research partners. Those partnerships are absolutely critical in achieving success. I think the last thing is probably, if you’re considering working in a regulated environment, just make sure you do your homework upfront. Collaborate with experts early. You don’t want to get too far down the path of building a product, only to find that you have to document that process from the ground up. You don’t want to have to unwind things. Focus upfront on the steps necessary to be compliant with the regulations in the space you’re building a solution in.

[0:27:06] HC: Finally, where do you see the impact of HOPPR in three to five years?

[0:27:10] RB: I think HOPPR is going to be a leader in AI-generated medical imaging solutions. We’re going to be expanding across different imaging modalities over the next few years. We’re expecting to be deeper into clinical system integrations. We’ll be having a robust feedback loop with our partners, continually driving improvements there, and then supporting broader diagnostics and potentially more personalized treatment decisions with our foundation models and our platform generally. Ultimately, I think that we’ll be making a really huge contribution to more efficient and equitable health care.

[0:27:46] HC: This has been great, Robert. I appreciate your insights today. I think this will be valuable to many listeners. Where can people find out more about you online?

[0:27:54] RB: Well, you could find us at hoppr.ai. That’s H-O-P-P-R.ai. Of course, connect with me on LinkedIn.

[0:28:01] HC: Perfect. Thanks for joining me today.

[0:28:03] RB: My pleasure. Thanks for having me, Heather.

[0:28:05] HC: All right, everyone. Thanks for listening. I’m Heather Couture, and I hope you join me again next time for Impact AI.

[END OF INTERVIEW]

[0:28:15] HC: Thank you for listening to Impact AI. If you enjoyed this episode, please subscribe and share with a friend. If you’d like to learn more about computer vision applications for people and planetary health, you can sign up for my newsletter at pixelscientia.com/newsletter.

[END]