How can foundation models accelerate breakthroughs in precision medicine? In today’s episode of Impact AI, we explore this question with returning guest, Julianna Ianni, Vice President of AI Research and Development at Proscia, a company revolutionizing pathology through cutting-edge technology. Join us as we explore how their platform, Concentriq, and its new Embeddings feature are transforming AI model development, making pathology-driven insights faster and more accessible than ever before. You’ll also learn how Proscia is shaping the future of precision medicine and discover practical insights for leveraging AI to advance healthcare. Whether you're curious about pathology, AI, or innovations in precision medicine, this episode offers invaluable takeaways you won’t want to miss!


Key Points:
  • An overview of Julianna’s biomedical engineering background and Proscia's mission.
  • Insight into Proscia’s Concentriq platform, aiding more than two million diagnoses annually.
  • Ways that Concentriq Embeddings streamlines AI development by eliminating data friction.
  • How Concentriq Embeddings make model creation 13x faster than traditional methods.
  • Why Proscia integrates external foundation models for versatility and superior performance.
  • Flexible and efficient: how Concentriq lets users test, swap, and select models with ease.
  • Types of solutions built using these embeddings, including rapid biomarker detection.
  • Tackling AI challenges like reducing overfitting and addressing bias in medical applications.
  • Lessons from pathology: simplifying complex workflows for faster AI adoption in other fields.
  • A look at the future of foundation models for pathology and Julianna’s advice for innovators.

Quotes:

“With the rise of foundation models that are pathology-specific and more powerful than the models of yesterday, the ability to extract embeddings efficiently became even more important for us.” — Julianna Ianni

“The pathology world didn't need another hit movie. It needed a streaming service.” — Julianna Ianni

“[Continue] to innovate and [understand] what's out there. There's a lot of change in the [pathology] field right now – You're going to make plans and then you're going to need to remake those plans because things are changing so quickly.” — Julianna Ianni

“ChatGPT didn't pervade our culture because it's fantastic technology. It pervaded our culture because the fantastic technology was easy to use. Pathology should be that easy. Our aim is to drive it there.” — Julianna Ianni


Links:

Proscia
Julianna Ianni on LinkedIn
Julianna Ianni on X
Julianna Ianni on Google Scholar
Concentriq Embeddings
Concentriq Embeddings internal case study
Proscia AI Toolkit
Zero-Shot Tumor Detection Example
Previous episode of Impact AI: Data-Driven Pathology with Coleman Stavish and Julianna Ianni from Proscia


Resources for Computer Vision Teams:

LinkedIn – Connect with Heather.
Computer Vision Insights Newsletter – A biweekly newsletter to help bring the latest machine learning and computer vision research to applications in people and planetary health.
Computer Vision Strategy Session – Not sure how to advance your computer vision project? Get unstuck with a clear set of next steps. Schedule a 1 hour strategy session now to advance your project.


Transcript:

[INTRODUCTION]

[0:00:03] HC: Welcome to Impact AI, brought to you by Pixel Scientia Labs. I’m your host, Heather Couture. On this podcast, I interview innovators and entrepreneurs about building a mission-driven machine learning-powered company. This episode is part of a mini-series about foundation models. Really, I should say domain-specific foundation models. Following the trends of language processing, domain-specific foundation models are enabling new possibilities for a variety of applications with different types of data, not just text or images. In this series, I hope to shed light on this paradigm shift, including why it’s important, what the challenges are, how it can impact your business, and where this trend is heading. Enjoy.

[INTERVIEW]

[0:00:47] HC: Today, I’m joined by guest Julianna Ianni, Vice President of AI Research and Development at Proscia, to talk about foundation model embeddings for pathology. Julianna is the first repeat guest on Impact AI. There have been some interesting developments at Proscia since we last talked more than two years ago that I hope to learn more about today. So, Julianna, welcome to the show.

[0:01:07] JI: Thank you, Heather. Thank you for having me again. Honored to be a repeat guest.

[0:01:11] HC: Julianna, could you share a bit about your background and how that led you to Proscia?

[0:01:16] JI: Sure. I always had a general desire to help people and an interest in technology, which kind of led me down this path. I studied biomedical engineering for my undergraduate degree at Vanderbilt University, and I found myself doing an internship in Biomedical Engineering, [Inaudible 0:01:34]. I learned a ton there. I was just both soaking up and banging my head on textbooks on Bayesian statistics. But I was really just in awe of the power of medical data and the idea that it was hiding really amazing secrets if we could just manipulate it the right way.

Then naturally, I gravitated towards what I saw as the field with the most data and the most important secrets, medical imaging. I earned my PhD in Biomedical Engineering doing MRI research. I was developing algorithms for image reconstruction and using some basic machine learning techniques to predict some patient-specific parameters for scanning and acquiring the images. Meanwhile, I was seeing this thing called deep learning really take off. I could see how powerful it was.

I wanted to be using it, and I wanted to be part of it. Towards the end of my PhD, I was looking for a place where I could apply my skills in medical imaging towards that. That’s how I came across Proscia, and somehow seven years have passed. Proscia’s grown, and I’ve had the opportunity to grow and lead the AI R&D team.

[0:02:48] HC: For those who haven’t listened to the previous episode, can you tell me more about what Proscia does and why it’s important?

[0:02:54] JI: Proscia is transforming pathology into a data-driven discipline and enabling the use of AI to advance precision medicine. Pathology impacts both life sciences research and diagnosis. Pathology data provides one of the most detailed and direct profiles of diseases like cancer. It’s uniquely positioned to impact personalized care. It helps bring individualized therapies to market and match them to the right patients. That’s why our work is important. We’re driving this transformation with Concentriq, which is our core platform for pathology operations.

It supports everything from the early discovery stages of drug development all the way through patient diagnosis and treatment. There were 2.4 million patients diagnosed on Concentriq last year, and it’s used by the pharma companies behind 34 of the top 50 most prescribed drugs. These are important. Then on top of Concentriq, we also deliver a real-world data offering to fuel life sciences R&D and our precision medicine AI portfolio, which is made up of over 120 research and diagnostic applications.

Then finally, most relevant to today’s discussion, we introduced Concentriq Embeddings in October of last year to bring foundation models to the platform and really accelerate AI development. We provide a robust set of offerings to really enable our customers to capitalize on the information pathology contributes to precision medicine –

[0:04:31] HC: The new development approach that really caught my attention is Concentriq Embeddings, which you mentioned. Could you tell us more about this and, in particular, what inspired its development?

[0:04:42] JI: Yeah. As far as what inspired its development, we make sure we’re our own best customers. We’re doing a lot of AI research and development here at Proscia, and we make extensive use of Concentriq to do that. This was really the thing that was missing in our own development. We were so often extracting pixels from Concentriq, and we can do that efficiently, but most of the time we were turning around and converting those pixels to features, converting them to embeddings using an encoder model.

It was like, why go through these extra steps that are so nonstandard? With the rise of foundation models that are pathology-specific and more powerful than the models of yesterday, the ability to extract embeddings efficiently became even more important for us. When I proposed the idea for Concentriq Embeddings to the team, they kind of lit up and, of course, found 20 ways to make it better. I’m pretty sure that Kyriakos started developing it in his sleep. I’ve just never seen so much enthusiasm from the team. We all knew we really needed this and that Concentriq users needed this. It’s really something that we use all the time in our development, and it kind of makes us fly compared to how we used to develop.

[0:05:59] HC: How has it transformed your development? Is this a case of reducing model development time so you can get things done a whole lot faster, and your customers can as well?

[0:06:08] JI: Yeah, yeah, primarily. Data scientists and researchers face so many hurdles when developing AI models for pathology. There are challenges like managing a wide variety of image formats, dealing with the massive size of whole slide images, managing and downloading hundreds of thousands of large whole slide images, and navigating complex infrastructure and MLOps. These issues take up the majority of their time and slow down innovation.

Concentriq Embeddings eliminates these challenges by simplifying the entire process. It provides users with exactly what they need to start building: high-quality embeddings that are ready for use. I really like the way Vaughn Spurrier put this. He’s our AI Research Team Lead at Proscia. He said, “I don’t have to touch a slide. I don’t have to think about a slide again.” Which is really crazy for those of us working in pathology. I think there’s quite a learning curve for folks coming from other fields.

In some ways, you’re really never done learning some of the complexities of managing whole side images, but that’s just a small part of the magic of concentric embeddings. How it works is a simple API call to concentric embeddings and users can request embeddings from any set of images, specifying the resolution that they want their images to be filed up and specifying one of six currently available models. In return, they receive the embeddings that are prepped and ready to fuel downstream implications, such as model to detect tumor, for example. Concentric embeddings is about accelerating AI by just removing most of that friction.

[0:07:55] HC: If I recall correctly, of the six foundation models that you’ve included, none were developed by Proscia; they come from other companies. Why include their models instead of developing your own?

[0:08:05] JI: Yeah. Good question. Of course, that’s something that we considered as it became obvious how powerful foundation models can be. My thinking was that the pathology world didn’t need another hit movie. It needed a streaming service: a method to efficiently deliver a whole swath of movies, let the user choose the one that fits their current needs, and get it without walking to Blockbuster, right? It needed to be easy. To translate that a bit, we thought there were plenty of movies to choose from.

Many foundation models have come out over the past year. Many of them are trained with a very large amount of data. Many of them are pathology-specific, and, in the most critical view, it’s model paper after model paper claiming a couple of percentage points of improvement over the last one. I just saw another one of these literally right before this recording. It’s the same thing that happened with popular large language models.

A lot of the research sometimes goes into achieving a tiny improvement over the last model. At some point, we have to ask, what is that improvement buying? Is it real when you think of the infinite range of downstream tasks? I think it is important. Sometimes a few percentage points of difference matters, depending on what you’re doing. Some models are, of course, better for one task than another. But when we looked at where we could make the most impact, it was just obvious to me and to my team that the common struggles of implementing these models, the walking to Blockbuster, if you will, were far more of a barrier to use than model performance.

[0:09:51] HC: How do you decide which foundation models to support in this platform?

[0:09:54] JI: There’s no formula for that. I think that we chose to support as many models as we could. We knew that our customers needed a wide variety of models. Since we don’t believe there’s like one model to rule them all, sometimes it’s helpful to be able to trade out models. One model might work better for a different project. It’s been shown also that an ensemble of embeddings from different models can sometimes offer a performance boost.

Concentriq Embeddings really gives our users a leg up to take advantage of both of those things. Despite me saying it wasn’t the biggest barrier to entry, I know folks are interested in some of the newer models, which are constantly coming out. We have six models available now and are quickly adding new ones. Concentriq Embeddings also lets you easily try out new models without having to learn how to apply each new model or swap out a lot of your pipeline every time, so you have the best advantage to build strong models. Being able to stay a little model-agnostic in terms of your code and the platform is a big thing Concentriq Embeddings lets you do.
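As a rough illustration of the ensembling idea mentioned above, the sketch below concatenates precomputed embeddings from two models before fitting a simple classifier. The array shapes, dimensions, and random placeholder data are assumptions standing in for embeddings already retrieved from the platform.

    # Illustrative sketch: ensemble embeddings from two foundation models by
    # concatenation, then train a downstream classifier on the combined features.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    n_tiles = 1000
    emb_model_a = np.random.rand(n_tiles, 768)      # embeddings from model A (assumed dim)
    emb_model_b = np.random.rand(n_tiles, 1024)     # embeddings from model B (assumed dim)
    labels = np.random.randint(0, 2, size=n_tiles)  # e.g., tumor vs. healthy tile labels

    # Concatenate per-tile embeddings so the classifier sees both representations.
    ensemble_features = np.concatenate([emb_model_a, emb_model_b], axis=1)

    clf = LogisticRegression(max_iter=1000)
    scores = cross_val_score(clf, ensemble_features, labels, cv=5, scoring="roc_auc")
    print(f"Ensembled embeddings, mean AUC: {scores.mean():.3f}")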

[0:11:06] HC: If you’re starting a new project or a new task where you’re trying to distinguish two different groups of patients. Out of all these six foundation models and perhaps more to come, how would you decide which one to use for a particular project? Do you have to try them all or is there a more systematic strategy?

[0:11:22] JI: It’s really easy to try them all with concentric embeddings. That’s one thing that you can do for sure. I know there’s been a lot of papers about – in pathology, specifically about benchmarking these models. I’m not sure what the one creator takeaway is, because everybody says they have the state of the art, right? But there are a couple models that come out ahead on some tasks rather than others. So, sometimes you might have a sense which one might work, but embeddings make it really easy to try multiple models if you need to do that. I know my team is thinking about what other ways there might be to do model selection more efficiently.

[0:12:03] HC: What types of solutions have you built using these embeddings?

[0:12:07] JI: Yeah. We’ve done a lot now. I think I can talk about it and some, I can’t. The first thing we did with concentric embeddings was a case study that’s published on our website. The team compared the time it took to generate embeddings using concentric embeddings versus just doing this without it on just a [Inaudible 0:12:26] workstation. It was 13 times faster with concentric. They then went on to build baby breast cancer biomarker detection models for the publicly available impressed data set in just 24 hours.

Basically, they weren’t able to assess each combination of biomarker and foundation model really rapidly. That was a cool proof of concept. The impressed data set is actually super small, though, for a pathology data set. That efficiency gain really grows with increasing data sets price. We’ve also created a few other basic examples of what you can do with this. Maybe my favorite so far is a zero-shot classification that is using concentric embeddings and the pathology vision language foundation model called PLIP, P-L-I-P.

My colleague, Corey Chivers, was really quickly able to do tumor detection with the embeddings he retrieved from Concentriq for the CAMELYON dataset. What he did was he just crafted a few keywords that indicate tumor and healthy tissue. PLIP did a really decent job at detecting tumor without any training. That enabled him to create a really nice tumor region map. You can find that and all our other example use cases of Concentriq Embeddings in the Proscia AI Toolkit on GitHub, where we’ve open-sourced a bunch of Python notebooks and a wrapper for Concentriq Embeddings, plus some tooling for using it with Concentriq. Those are also available on our website in more of a blog format.
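A minimal sketch of this kind of CLIP-style zero-shot classification is below. It assumes the publicly released PLIP checkpoint on Hugging Face and illustrative prompt wording; see the linked Proscia AI Toolkit example for the actual implementation.

    # Sketch of CLIP-style zero-shot tumor detection with PLIP, along the lines
    # of the example described above. The model ID, tile path, and prompts are
    # assumptions for illustration.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("vinid/plip")        # assumed public PLIP checkpoint
    processor = CLIPProcessor.from_pretrained("vinid/plip")

    prompts = ["an H&E image of tumor tissue", "an H&E image of normal tissue"]
    tile = Image.open("tile.png")  # a tile from a whole slide image (placeholder path)

    inputs = processor(text=prompts, images=tile, return_tensors="pt", padding=True)
    with torch.no_grad():
        image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        text_emb = model.get_text_features(
            input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
        )

    # Cosine similarity between the tile embedding and each prompt embedding;
    # the higher-scoring prompt is the zero-shot prediction, with no training.
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    scores = (image_emb @ text_emb.T).softmax(dim=-1)
    print(dict(zip(prompts, scores.squeeze().tolist())))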

[0:14:06] HC: I’ll link to that stuff in the show notes. I think that’s a great resource for listeners.

[0:14:10] JI: Awesome.

[0:14:11] HC: How do foundation model embeddings like this affect typical concerns with deep learning like bias and overfitting? Does it ease this problem? Does it make it worse? Does it change it at all?

[0:14:22] JI: I’m sure you have some thoughts on this too. But I think using foundation model embeddings, definitely something can help reduce the chances of encountering overfitting in your downstream model development. For one thing, these models are trained with a lot of data. We know that more data, if it’s figured enough, can mitigate overfitting. This should translate to some degree of embeddings to downstream models, if all other things are one of equal. When you’re using foundation model embeddings too, the compression that embeddings provide allows your downstream of all few parameters to work with.

If you’re training with a sufficient amount of data, the risk of overfitting is lower. However, working the other direction, these foundation models are getting bigger and bigger. Some embedding layers are larger than others. Both things can increase the chances of overfitting theoretically. When do we start to fit the other way? I don’t know. It’s also not impossible to overfit a downstream model, even using like the most ideal in things. But overall, I think that the chances of overfitting when making use of embeddings is lower than training most end-to-end models under typical conditions.

As far as bias in downstream models, I think it’s totally possible to create a biased model using foundation model embeddings. I wouldn’t think it’s easier or harder one way or another to do that. When making use of embeddings, you still need to carefully curate your dataset, ensure you have a good balance of labeled cases, and look out for any potential confounds, etc. So, I don’t think embeddings take that away.

[0:16:03] HC: We definitely still have to be careful, but that’s always true with machine learning. Bias, in medical applications in particular, is something we have to watch out for.

[0:16:11] JI: Yeah. No magic wand for that yet.

[0:16:13] HC: Are there any lessons you’ve learned in working with foundation models that could be applied more broadly to other data types?

[0:16:18] JI: Yeah. I feel like I’ve learned a lot of lessons from those other fields. I always have. Having enough pathology data to build these kinds of models is also relatively new compared to other fields. I feel like it’s hard to come away with a completely new lesson, but we certainly can come up with some that are underused elsewhere. I’d definitely highlight this idea that building a tool that makes it easier to use models is sometimes the way to go.

Pathology is notoriously difficult in terms of the pre-processing needed, data size, non-uniformity of data types, etc., but it’s not the only field in which the data wrangling, infrastructure, MLOps, whatever, make up the majority of the work. These things are often overlooked.

[0:17:09] HC: What does the future of foundation models for pathology look like?

[0:17:14] JI: I think we’re going to keep seeing more models come out very rapidly with improvements and what’s out there now with more data, more data diversity. I think the field is latching on to anything current models are missing, but beyond that, we’re going to see an increase in amount of multi-modal models with new modalities in the mix. I think we’ll see a lot of really mind-blowing applications.

[0:17:38] HC: Is there any advice you could offer to other leaders of AI powered startups?

[0:17:42] JI: Yeah. That’s a such a good question. I think the top of my mind right now is being continuing to innovate and understanding what’s out there. There’s a lot of change in the field right now, and I don’t think that’s going to go away for a good while. You’re going to make some plans, and then you’re going to need to remake those plans, because things are just changing so quickly. Yeah, I think that’s something that rang very true for concentric embeddings. I don’t think this is what we thought we were going to be building several years ago, but it was the obvious thing to build now.

[0:18:16] HC: Finally, where do you see the impact of Proscia in three to five years?

[0:18:19] JI: That’s like 20 in AI years. I think there’s a lot we can accomplish in that time. We’re excited to see more and more labs using our platform, as I mentioned. I think that trend will continue and concentric will be the difference between labs being able to keep up with the diagnosing the sheer volume of patients coming through versus not, but that’s the basics. We plan a lot more than that to be both the data fuel and the platform that powers a large portion of lab research, enabling organizations to actually use the transformative power of AI to build the very nuanced solutions that they require.

The question I ask is, what should it look like in the future, this process of finding, using, and employing your pathology data with AI to craft the next life-enabling drug? This is such an overused example, but I think it should be as easy as using ChatGPT. I don’t mean that it should be a chatbot experience. I mean, it should just be as easy as using ChatGPT. ChatGPT didn’t pervade our culture because it’s fantastic technology. It pervaded our culture because the fantastic technology was easy to use. Pathology should be that easy, and our aim is to drive it there. We want to help unlock those life-enabling therapies for more patients and make sure that they can be connected to the right ones.

[0:19:49] HC: This has been great. Julianna, I appreciate your insights today. I think this will be valuable to many listeners. Where can people find out more about you online?

[0:19:57] JI: The easiest way is going to Proscia’s website, www.proscia.com. You can find pretty much everything we talked about today there. Also, definitely check out our Proscia AI Toolkit. We’ll include a link for that in the show notes.

[0:20:11] HC: Perfect. Thanks for joining me today.

[0:20:13] JI: Thank you very much for having me.

[0:20:15] HC: All right, everyone. Thanks for listening. I’m Heather Couture, and I hope you join me again next time for Impact AI.

[OUTRO]

[0:20:25] HC: Thank you for listening to Impact AI. If you enjoyed this episode, please subscribe and share with a friend. If you’d like to learn more about computer vision applications for people and planetary health, you can sign up for my newsletter at pixelscientia.com/newsletter.

[END]