What happens when you combine AI with digital pathology? In this episode, Dmitry Nechaev, Chief AI Scientist and co-founder of HistAI, joins me to discuss the complexity of building foundation models specifically for digital pathology. Dmitry has a strong background in machine learning and experience in high-resolution image analysis. At HistAI, he leads the development of cutting-edge AI models tailored for pathology.

HistAI, a digital pathology company, focuses on developing AI-driven solutions that assist pathologists in analyzing complex tissue samples faster and more accurately. In our conversation, we unpack the development and application of foundation models for digital pathology. Dmitry explains why conventional models trained on natural images often struggle with pathology data and how HistAI’s models address this gap. Learn about the technical challenges of training these models and the steps for managing massive datasets, selecting the correct training methods, and optimizing for high-speed performance. Join me and explore how AI is transforming digital pathology workflows with Dmitry Nechaev!


Key Points:
  • Background about Dmitry, his path to HistAI, and his role at the company.
  • What whole slide images are and the challenges of working with them.
  • How AI can streamline diagnostics and reduce the workload for pathologists.
  • Why foundation models are a core component of HistAI’s technology.
  • The scale of data and compute power required to build foundation models.
  • Outline of the different approaches to building a foundation model.
  • Privacy aspects of building models based on medical data.
  • Challenges Dmitry has faced developing HistAI’s foundation model.
  • Hear what makes HistAI’s foundation model different from other models.
  • Learn about his approach to benchmarking and improving a model.
  • Explore how foundation models are leveraged in HistAI’s technology.
  • The future of foundation models and his lessons from developing them.
  • Final takeaways and how to access HistAI’s open-source models.

Quotes:

“Regular foundation models are trained on natural images and I'd say they are not good at generalizing to pathological data.” — Dmitry Nechaev

“In short, [a foundational model] requires a lot of data and a lot of [compute power].” — Dmitry Nechaev

“Public benchmarks [are] a really good thing.” — Dmitry Nechaev

“Our foundation models are fully open-source. We don't really try to sell them. In a sense, they are kind of useless by themselves, since you need to train something on top of them, so we don't try to profit from these models.” — Dmitry Nechaev

“The best lesson is that you need quality data to get a quality model.” — Dmitry Nechaev

“We [at HistAI] don't want AI technologies to be a privilege of the richest countries. We want that to be available around the world.” — Dmitry Nechaev


Links:

Dmitry Nechaev on LinkedIn
Dmitry Nechaev on GitHub
HistAI
CELLDX
Hibou on Hugging Face


Resources for Computer Vision Teams:

LinkedIn – Connect with Heather.
Computer Vision Insights Newsletter – A biweekly newsletter to help bring the latest machine learning and computer vision research to applications in people and planetary health.
Computer Vision Strategy Session – Not sure how to advance your computer vision project? Get unstuck with a clear set of next steps. Schedule a 1 hour strategy session now to advance your project.


Transcript:

[INTRO]

[00:00:03] HC: Welcome to Impact AI, brought to you by Pixel Scientia Labs. I’m your host, Heather Couture. On this podcast, I interview innovators and entrepreneurs about building a mission-driven machine learning-powered company.

This episode is part of a miniseries about foundation models. Really, I should say domain-specific foundation models. Following the trend set by language processing, domain-specific foundation models are enabling new possibilities for a variety of applications with different types of data, not just text or images. In this series, I hope to shed light on this paradigm shift, including why it’s important, what the challenges are, how it impacts your business, and where this trend is heading. Enjoy.

[INTERVIEW]

[00:00:49] HC: Today, I’m joined by guest Dmitry Nechaev, Chief AI Scientist at HistAI, to talk about foundation models for digital pathology. Dmitry, welcome to the show.

[00:00:59] DN: Thanks, Heather.

[00:01:00] HC: Dmitry, could you share a bit about your background and how that led you to HistAI?

[00:01:04] DN: Yes. My background is mostly in machine learning. I’ve been a passionate machine learning engineer for the past five years, I think. Before working on HistAI, I had experience in satellite imagery, which is in a sense related to what we do now, since those images are also quite high in resolution.

As for how I joined HistAI, I actually didn’t join. I joined the previous venture of our CEO, Alex. That didn’t work out, and we transitioned to HistAI. I’m one of the co-founding members of the company, and since its foundation, I have led all the AI-related development at HistAI.

[00:02:04] HC: Tell me about HistAI. What do you do there?

[00:02:07] DN: HistAI is a company focused on digital pathology and, as the name suggests, on artificial intelligence applications in digital pathology. Basically, we developed a platform called CELLDX. Its goal is to be a convenient platform for pathologists to examine whole slide images. The platform is also enhanced with AI algorithms to help accelerate the diagnostic process. That’s basically what we do: the platform and the AI algorithms.

[00:02:53] HC: For those in the audience who aren’t familiar, could you explain more about what whole slide images are and what some of the challenges are in working with them?

[00:03:01] DN: Yes, sure. In pathology, basically, when there is a potential cancer, a medical professional takes a biopsy, which basically means cutting out a small piece of tissue. This tissue is then embedded in a paraffin block, sliced into very thin sections, and stained with special stains. The old-school way is to then just look at these slices through a microscope. But nowadays, in digital pathology, these tissue samples are scanned using special scanners that have a very high resolution.

Then, basically, these scans are called whole slide images. The main challenge in working with them is their sheer size. These images can get up to 100,000 by 100,000 pixels. Basically, no regular software can even open them, which makes it a challenge to even look at them. They require special software, and their size also makes it a challenge to create AI algorithms for them.
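To make that concrete, here is a minimal sketch, assuming the openslide-python library and a hypothetical file name, of how a gigapixel slide is typically tiled into patches instead of being loaded all at once:

    # A minimal sketch of tiling a whole slide image; "slide.svs" is a
    # hypothetical file, and openslide-python must be installed.
    import openslide

    slide = openslide.OpenSlide("slide.svs")
    width, height = slide.dimensions  # level-0 size, e.g. ~100,000 x 100,000
    patch_size = 224  # a typical vision transformer input size

    for x in range(0, width - patch_size + 1, patch_size):
        for y in range(0, height - patch_size + 1, patch_size):
            # read_region loads only this tile, so the full gigapixel
            # image never has to fit in memory at once
            tile = slide.read_region((x, y), 0, (patch_size, patch_size))
            tile = tile.convert("RGB")  # drop the alpha channel
            # ...filter background tiles here and feed the rest to a model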

[00:04:35] HC: What types of problems do you help solve for pathologists using AI?

[00:04:39] DN: I’d say the main problem is actually time-saving. Using AI, pathologists can get a potential diagnosis, with regions of interest highlighted to look at, in minutes, without the need to manually examine the whole slide. Then, in the future, we hope to develop an even smarter AI. It would be an agentic AI, like a virtual assistant for pathologists, with a lot of capabilities: it could highlight certain areas, or you could ask it questions. In this way, we hope to accelerate the process of diagnosing cancers.

[00:05:35] HC: My understanding is a core piece of your solution here is a foundation model that enables you to build some of these tools. Why is it necessary to build a foundation model?

[00:05:45] DN: Yes. Okay. Basically, what is a foundation model? Nowadays, the state-of-the-art approach to solving computer vision tasks is to have a foundation model. It’s usually a vision transformer. This model was trained on a lot of images, which makes it a good feature extractor. Then, when you need to solve a specific task, you can build smaller modules on top of the foundation model to solve that task. You won’t need to train the whole model, which is quite large, and you’ll get pretty good results. I’d say even better results than you could have gotten if you had just trained a model from scratch.
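As a rough illustration of this pattern, not HistAI’s actual code, a small trainable head on a frozen backbone might look like the following in PyTorch; the backbone stands in for any pretrained vision transformer assumed to return one embedding per image:

    import torch
    import torch.nn as nn

    class LinearProbe(nn.Module):
        # A small trainable head on top of a frozen foundation model.
        def __init__(self, backbone: nn.Module, embed_dim: int, num_classes: int):
            super().__init__()
            self.backbone = backbone
            for p in self.backbone.parameters():
                p.requires_grad = False  # the large model stays frozen
            self.head = nn.Linear(embed_dim, num_classes)  # only this trains

        def forward(self, images: torch.Tensor) -> torch.Tensor:
            with torch.no_grad():
                features = self.backbone(images)  # assumed: (batch, embed_dim)
            return self.head(features)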

In general, most foundation models are trained on natural images, on ImageNet or some specialized datasets. But those are still just regular images, and the problem is that in pathology, the images the model sees are quite different from what we see on a day-to-day basis. Regular foundation models trained on natural images, I’d say, are not good at generalizing to pathological data.

For example, say you train a classifier to predict melanomas: is there melanoma in the slide? It would work well on your dataset. But then if you switch to another dataset, say one taken by another scanner or from another lab, the model built on natural-image features won’t work so well. A foundation model that was trained specifically on pathology data will generalize quite well to these kinds of data shifts and small biases in the data.

[00:08:07] HC: It sounds like there’s a few different reasons why it was helpful to build a foundation model. From what you said, it sounds like it’s faster to get a model up and running for a new task because you already have a model that knows how to extract features from pathology images. Your model is more accurate in the end. Like you said, it’s more generalizable. It works better for images from a different scanner collected from a different lab, different things like that.

What did it take to build this foundation model? What kind of scale of data and compute? What type of algorithms? What are the pieces that went into this?

[00:08:42] DN: Yes. In short, it requires a lot of data and a lot of compute. About scale: we have a dataset of 1.1 million whole slide images, and actually only a fraction of it, around one-sixth of the dataset, went into training our foundation models. As I said before, whole slide images are quite large, so they aren’t fed into foundation models directly. They are split into smaller patches, and the model is trained on these patches. So 1.1 million whole slide images is actually on the scale of billions, if not trillions, of patches.

Specifically, we trained two versions, Hibou-L and Hibou-B, which are based on the vision transformer large and base architectures. The larger model was trained on 1.2 billion of these patches, and the smaller one was trained, I think, on 500 million. Training a foundation model also requires a lot of compute. We used 32 A100 GPUs to train the large model. The base model actually required only eight A100s, but that’s still not a small amount of compute.

Then, how are foundation models trained? There are sort of three main approaches. The first one: say you have a large dataset that is annotated. Then you can just train the model for a classification task. Another approach is what’s called vision-language contrastive learning, where you have image-text pairs, and you can train a model to align images with their descriptions. An example of that kind of model would be CLIP by OpenAI.
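That vision-language approach boils down to a symmetric contrastive objective; here is a generic sketch of the CLIP-style loss, not OpenAI’s implementation:

    import torch
    import torch.nn.functional as F

    def clip_style_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                        temperature: float = 0.07) -> torch.Tensor:
        # Normalize so the dot product is cosine similarity.
        img_emb = F.normalize(img_emb, dim=-1)
        txt_emb = F.normalize(txt_emb, dim=-1)
        # Matching image-text pairs sit on the diagonal of this matrix.
        logits = img_emb @ txt_emb.T / temperature
        targets = torch.arange(len(img_emb), device=img_emb.device)
        # Symmetric cross-entropy: images pick their texts and vice versa.
        return (F.cross_entropy(logits, targets)
                + F.cross_entropy(logits.T, targets)) / 2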

Then the third approach would be self-supervised learning. This approach is good for digital pathology specifically because it doesn’t require any labeled data, neither labels nor text descriptions. It learns good representations just from unlabeled data. Without going into a lot of detail about how it works: basically, the model tries to predict its own predictions during training, but while seeing only a fraction of the data.

Say a student model gets only some parts of an image, and the other parts are masked. The teacher model sees the whole image, and the student tries to predict what the teacher model would output. But student and teacher are basically the same model; the teacher is just a moving average of the student’s weights.
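That “average of weights” is an exponential moving average. In DINO-style training, the teacher update looks roughly like this sketch, with an illustrative momentum value:

    import torch

    @torch.no_grad()
    def update_teacher(student: torch.nn.Module, teacher: torch.nn.Module,
                       momentum: float = 0.996) -> None:
        # The teacher gets no gradient updates; its weights track an
        # exponential moving average of the student's after each step.
        for s, t in zip(student.parameters(), teacher.parameters()):
            t.mul_(momentum).add_(s, alpha=1.0 - momentum)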

There are a lot of methods for self-supervised training, but the most recent and the best one is DINOv2. You can actually observe that it’s the most recent and the best: a lot of foundation models for pathology have been trained in recent months, and they all use DINOv2, because no better method has been developed for now.

[00:12:46] HC: Did you experiment with any other methods? Or was it clear that DINOv2 was the way to go based on the research?

[00:12:52] DN: Basically, we considered some other methods. Actually, before HistAI, I had tested some self-supervised methods based on contrastive learning, and in my tests, from experience, they didn’t work that well. And in the general papers comparing the most recent self-supervised methods, DINOv2 was a clear winner in that regard.

[00:13:27] HC: When you’re training a foundation model based on medical data, are there any other special considerations? Do you need to think about privacy or regulatory approval, things like that?

[00:13:38] DN: In that regard, I can say that HistAI is a HIPAA-compliant company, so all the data is anonymized. But in terms of training the foundation model specifically, since it only ever sees images, some random patches, I think it’s quite hard to extract any personal data from that. I’d say that is true because of the method we used, self-supervised learning. If, let’s say, we had used text descriptions in training, that would be another story.

[00:14:23] HC: What are some of the challenges you’ve encountered in building a foundation model?

[00:14:27] DN: I’d say the biggest challenge was just setting it all up: a lot of data has to be prepared, and this data then needs to be loaded really fast during training. You need basically a cluster of GPUs. The challenge is mostly technical.

[00:14:57] HC: Largely around the data pipeline, it sounds like.

[00:15:00] DN: Yes, yes, yes. It’s a lot about data, as always in machine learning.

[00:15:06] HC: How are your Hibou foundation models different than others that are out there for digital pathology?

[00:15:11] DN: The first differentiator is that we didn’t go for the largest possible architectures. Our models are still quite compact, you could say, and the base model runs really fast. In that sense, we gave a lot of consideration to production later on, not just getting the absolute best results. It’s sort of a case of diminishing returns: when you train larger and larger models, you need more compute and more data, and you get a stronger model, but the gains are very small.

We are really focused on the models being practical, because a lot of other models trained in recent months are billion-plus parameters in scale and require a lot more compute and a lot more memory to run. Even our large model, Hibou-L, is only 300 million parameters, and the base model is 86 million parameters. It runs an order of magnitude faster than the larger models while retaining much of the quality of those huge models.

Also, when we released our models, we decided to go fully open-source with the Apache 2.0 license. I think before us, that wasn’t really a common practice. Models were either closed source, or they had licenses that were quite restrictive.

[00:16:57] HC: How do you know whether your foundation model is good? How do you benchmark it against other models?

[00:17:02] DN: Our model is trained on a private dataset, so we can basically test it on public data without being afraid that it has seen this data in training, that there are leaks or something. During training, we just ran benchmarks on various public datasets that are common and appear in other papers; the developers of other models use these same benchmarks. We use these standard benchmarks to check whether the model’s performance has stopped improving and training should be stopped, and then also to compare our model to other models. Public benchmarks are a really good thing.
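One common recipe for such benchmarks, sketched here generically rather than as HistAI’s exact protocol, is to extract embeddings with the frozen model and fit a simple classifier on the public dataset’s labels:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    def linear_probe_score(train_feats: np.ndarray, train_labels: np.ndarray,
                           test_feats: np.ndarray, test_labels: np.ndarray) -> float:
        # The features are embeddings from the frozen foundation model;
        # the labels come from a public benchmark it never trained on.
        clf = LogisticRegression(max_iter=1000)
        clf.fit(train_feats, train_labels)
        return accuracy_score(test_labels, clf.predict(test_feats))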

Then, also, when we train new models in-house for other, specialized tasks, we sometimes test how Hibou compares to some other public models. In a sense: did we achieve the desired results?

[00:18:29] HC: How do you identify areas for enhancing your model?

[00:18:32] DN: I’d say it’s just about tasks. If we see that on a certain task the model doesn’t perform as well as other models, that’s a good starting point. But to be honest, we don’t plan on really changing or retraining these models anymore, since it’s quite expensive. I think we’ve hit the best possible quality for them. The only real, reasonable way to improve these models is to scale them up in parameters, but that would make them not so practical for production.

[00:19:22] HC: How are you currently using foundation models in HistAI, and how do you plan to use them in the future?

[00:19:28] DN: Yes. Currently, they are used everywhere. I mentioned the CELLDX platform before. On that platform, there are AI algorithms. When we first developed them, we used general foundation models that were trained on natural images. But since we trained Hibou, we have moved all our models over to our foundation model as the backbone for all the algorithms.

For example, there are classification models that classify different conditions like melanomas, mitoses, or other cancers like basal cell carcinoma or squamous cell carcinoma. And we have segmentation models that segment images and find tumors in these images, or other tissue types like the epidermis, for example. Yes. The model is used everywhere.

In the future, we plan to use it in basically all the models we develop. I can tease one: we are currently developing a quite large model focused on pan-tissue segmentation. It would be a big deal for this model to work, and we utilize Hibou as the backbone. It is really helpful in this regard.

[00:21:11] HC: How are you commercializing the foundation models?

[00:21:14] DN: Actually, we don’t. Our foundation models are fully open-source. We don’t really try to sell them. In a sense, they are kind of useless by themselves, since you need to train something on top of them, so we don’t try to profit from these models. Our main goal is actually to push the industry a bit, to make it grow. At HistAI, we are more focused on developing CELLDX to be the simplest, the easiest, actually the best way for pathologists to work with whole slide images. In terms of foundation models, we just give them to the community as open source.

[00:22:12] HC: Does this open-source license allow for commercial use, or is this more for research purposes?

[00:22:17] DN: It actually allows for commercial use. It’s Apache 2.0. It’s a very permissive license.

[00:22:24] HC: What do you think the future of foundation models for histology looks like? Where do you think this is heading?

[00:22:29] DN: In terms of just pure vision models, I think there actually isn’t much improvement left, at least for the currently most popular approach with patch-level models. I’d say the developments in the industry are toward whole slide image models, to basically be able to feed the entire slide into the model. These models are multilevel: they include patch-level models that extract features, and these features are then used as input to a slide-level model that aggregates over the whole slide.

It basically just repeats the architecture of the default vision transformer a second time; the second time, its inputs are not very small patches but features. That type of foundation model is way more complex to develop, because working with whole slide images requires a lot more compute and memory and so on. It’s a very hard technical challenge to train a large-scale foundation model that works on whole slide images directly.
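A toy version of that second-level model, with illustrative dimensions, might look like this: a small transformer that aggregates precomputed patch embeddings into one slide-level prediction:

    import torch
    import torch.nn as nn

    class SlideAggregator(nn.Module):
        # Aggregates patch-level embeddings into a slide-level prediction.
        def __init__(self, embed_dim: int = 768, num_classes: int = 2):
            super().__init__()
            self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
            layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=8,
                                               batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(embed_dim, num_classes)

        def forward(self, patch_embeddings: torch.Tensor) -> torch.Tensor:
            # patch_embeddings: (batch, num_patches, embed_dim), produced
            # beforehand by a patch-level foundation model.
            cls = self.cls_token.expand(patch_embeddings.size(0), -1, -1)
            tokens = torch.cat([cls, patch_embeddings], dim=1)
            encoded = self.encoder(tokens)
            return self.head(encoded[:, 0])  # classify from the CLS token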

Then there are also developments in vision-language models. Currently, they work on patches as well: they allow generating descriptions of patches, or using patches as inputs alongside text. And there are developments toward using whole slide images as inputs to vision-language models. I think that’s still work in progress and requires a lot more research to execute. To summarize, I’d say everybody nowadays is trying to make a whole slide image foundation model, not just a patch-level image foundation model.

[00:25:24] HC: Are there any lessons you’ve learned in developing foundation models that could be applied more broadly to other data types?

[00:25:29] DN: Yes. I’d say the best lesson is that you need quality data to get a quality model.

[00:25:39] HC: What do you mean by quality data?

[00:25:41] DN: Yes. It’s very common empirical knowledge in machine learning that you need data of good quality. If you train a model on data that is not that clean, or not good in some other sense, then basically bad things go in, bad things go out. Your model will work very poorly; performance will be very suboptimal if you train it on data that isn’t the best.

For foundation models, here is an example. When you train your model, you need some data preparation, a way to select patches. You don’t want a lot of background in your training data, and you don’t want non-tissue patches, which can happen on whole slide images: there might be pen marks, something written on top of the tissue, or something circled. You don’t really want that kind of data to get into your models. I think it’s just a general principle that applies to machine learning as a whole: you need good data to get good models.
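A simple, widely used filter of the kind he alludes to, sketched here with arbitrary thresholds, keeps patches with enough stain saturation and discards mostly-white background (pen marks would need an additional hue-based check):

    import numpy as np
    from PIL import Image

    def is_tissue(patch: Image.Image, sat_threshold: int = 20,
                  min_tissue_fraction: float = 0.5) -> bool:
        # Background on stained slides is close to white, i.e. low
        # saturation; tissue has noticeably higher saturation.
        saturation = np.array(patch.convert("HSV"))[..., 1]
        return (saturation > sat_threshold).mean() >= min_tissue_fraction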

[00:27:12] HC: Finally, where do you see the impact of HistAI in three to five years?

[00:27:16] DN: I think at HistAI, we’d like to develop an instrument that helps assess a case very comprehensively and reliably, even for cases that are quite rare. We don’t want AI technologies to be a privilege of the richest countries. We want them to be available around the world. Yes. That’s what we are aiming to achieve.

[00:27:55] HC: This has been great, Dmitry. I appreciate your insights today. I think this will be valuable to many listeners. Where can people find out more about you online?

[00:28:03] DN: There’s our company site, hist.ai, quite simple. Then our models are published on Hugging Face, so just search for Hibou, spelled H-I-B-O-U, and you’ll find the models. They are fully open-source, Apache 2.0, so you can use them for anything. You can also try our platform, CELLDX. It has a trial period and a free version, which actually lets you test all the AI widgets available on the platform.
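For reference, loading the models typically follows the standard Hugging Face pattern; the model ID below is assumed from the Hibou release, so check the model card for exact usage:

    # A sketch of loading Hibou via transformers; the "histai/hibou-b"
    # ID and trust_remote_code flag follow the usual Hugging Face
    # pattern and should be verified against the model card.
    from transformers import AutoImageProcessor, AutoModel
    from PIL import Image

    processor = AutoImageProcessor.from_pretrained("histai/hibou-b",
                                                   trust_remote_code=True)
    model = AutoModel.from_pretrained("histai/hibou-b",
                                      trust_remote_code=True)

    patch = Image.open("patch.png").convert("RGB")  # hypothetical patch
    inputs = processor(images=patch, return_tensors="pt")
    outputs = model(**inputs)  # embeddings for downstream task heads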

[00:28:51] HC: Perfect. I’ll link to all of those in the show notes. Thanks for joining me today.

[00:28:55] DN: Yes. Thank you, Heather.

[00:28:57] HC: All right, everyone. Thanks for listening. I’m Heather Couture, and I hope you join me again next time for Impact AI.

[OUTRO]

[00:29:07] HC: Thank you for listening to Impact AI. If you enjoyed this episode, please subscribe and share with a friend. If you’d like to learn more about computer vision applications for people and planetary health, you can sign up for my newsletter at pixelscientia.com/newsletter.

[END]