In this episode, I sit down with Jean-Baptiste Schiratti, Medical Imaging Group Lead and Lead Research Scientist at Owkin, to discuss the application of self-supervised learning in drug development and diagnostics. Owkin is a groundbreaking AI biotechnology company revolutionizing the field of medical research and treatment. It aims to bridge the gap between complex biological understanding and the development of innovative treatments. In our conversation, we discuss his background, Owkin's mission, and the importance of AI in healthcare. We delve into self-supervised learning, its benefits, and its application in pathology. Gain insights into the significance of data diversity and computational resources in training self-supervised models and the development of multimodal foundation models. He also shares the impact Owkin aims to achieve in the coming years and the next hurdle for self-supervised learning.

Key Points:
  • Introducing Jean-Baptiste Schiratti, his background, and path to Owkin.
  • Details about Owkin, its mission, and why its work is significant.
  • The application of self-supervised learning in drug development and diagnostics.
  • Examples of the different applications of self-supervised learning.
  • Discover the process behind training self-supervised models for pathology.
  • Explore the various benefits of using self-supervised learning.
  • His approach for structuring the data used for self-supervised learning.
  • Unpack the potential impact of self-supervised AI models on pathology.
  • Gain insights into the next frontier of foundation model development.
  • He shares his hopes for the future impact of Owkin.


“To be able to train efficiently, computer vision backbones, you actually need to have a lot of compute and that can be very costly.” — Jean-Baptiste Schiratti

“There are some models that are indeed particular to specific types of tissue or specific sub-types of cancers and also the models can have different architectures and different sizes, they come in different flavors.” — Jean-Baptiste Schiratti

“The more diverse the [training] data is, the better.” — Jean-Baptiste Schiratti

“I’m convinced that the foundation models will play a very important role in digital pathology and I think this is already happening.” — Jean-Baptiste Schiratti


Jean-Baptiste Schiratti on LinkedIn
Jean-Baptiste Schiratti on X

Resources for Computer Vision Teams:

LinkedIn – Connect with Heather.
Computer Vision Insights Newsletter – A biweekly newsletter to help bring the latest machine learning and computer vision research to applications in people and planetary health.
Computer Vision Strategy Session – Not sure how to advance your computer vision project? Get unstuck with a clear set of next steps. Schedule a 1 hour strategy session now to advance your project.
Foundation Model Assessment – Foundation models are popping up everywhere – do you need one for your proprietary image dataset? Get a clear perspective on whether you can benefit from a domain-specific foundation model.



[00:00:03] HC: Welcome to Impact AI, brought to you by Pixel Scientia Labs. I’m your host, Heather Couture. On this podcast, I interview innovators and entrepreneurs about building a mission-driven, machine-learning-powered company. If you like what you hear, please subscribe to my newsletter to be notified about new episodes. Plus, follow the latest research in computer vision for people and planetary health. You can sign up at


[0:00:34.4] HC: Today, I’m joined by guest Jean-Baptiste Schiratti, medical imaging group lead and lead research scientist at Owkin, to talk about drug development and diagnostics using self-supervised learning. Jean-Baptiste, welcome to the show.

[0:00:47.6] JBS: Welcome and thank you for having me, Heather.

[0:00:49.5] HC: Jean-Baptiste, could you share a bit about your background and how that led you to Owkin?

[0:00:53.6] JBS: Sure. So, I have academic training in applied mathematics. I received my Ph.D. degree from the University of Paris-Saclay in Paris; it was prepared both at Ecole Polytechnique and at the Brain & Spine Institute in a Paris hospital. During this Ph.D., I developed statistical methods to model the spatiotemporal progression of neurodegenerative diseases such as Alzheimer’s disease.

And shortly after my Ph.D., I did a postdoc at Telecom ParisTech, a French engineering school, and worked on automatic detection of epileptic seizures from EEG recordings. Shortly after this postdoc, in September 2019, I joined Owkin. So, I joined Owkin first as a data scientist in a team that was called the radiology team.

In the beginning of 2021, the radiology team was merged with the histology team into a new team called the Medical Imaging Group, and this is the moment when I was given the opportunity to co-lead this R&D team. I still occupy this position.

[0:01:53.5] HC: So, what does Owkin do and why is this important for healthcare?

[0:01:57.4] JBS: So, Owkin is a techbio that uses artificial intelligence to accelerate precision [inaudible 0:02:02.1] by discovering and developing new drugs. We recently reached unicorn status, and we were also named to the World Economic Forum’s Unicorn Community, working with the Forum’s Centre for Health and Healthcare. So, Owkin was founded in 2016 with the mission to understand complex biology through AI.

And so, to do this, we identify precision therapeutics, de-risk and accelerate clinical trials, and develop diagnostics using AI trained on world-class patient data through privacy-enhancing technologies. So, all this to finally say that it allows us to provide healthcare professionals with the cutting-edge tools that they need to improve patient care and provide the right treatments to the patients.

[0:02:48.2] HC: In reading a number of papers published by Owkin, a common approach I’ve seen you use is self-supervised learning. For the audience who may not be familiar, what is self-supervised learning, and when is it useful?

[0:02:58.7] JBS: So, I’ve put together a definition. I would say that self-supervised learning refers to a set of machine learning methods and algorithms that can be used to learn powerful representations from unlabeled data. That’s my definition, but you may come across other definitions. In self-supervised learning, an AI model uses what is called a pretext task. There are different pretext tasks that you can find.

So, for instance, a jigsaw puzzle or colorization. You know, they are there to identify patterns and learn relevant representations for what we call downstream tasks, so tasks with labeled data. So, in computer vision recently, there has been spectacular progress on self-supervised learning, especially on the ImageNet database, and we’ve seen that these methods have been closing the gap with fully supervised learning. So, most of the self-supervised learning methods now have quite impressive results on natural images. One of the most recent self-supervised learning methods that you may have heard of is DINOv2 from Facebook AI Research.
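To make the idea of a pretext task concrete, here is a minimal sketch of the jigsaw puzzle example mentioned above, using NumPy. The function name `make_jigsaw_example` and its parameters are illustrative assumptions, not something from the conversation. This is also not the objective used by iBOT or DINOv2; it only shows how a training label can be generated from the image itself, with no human annotation.

```python
import itertools

import numpy as np


def make_jigsaw_example(image, grid=2, rng=None):
    """Cut an image into grid x grid tiles, shuffle them, and return
    (shuffled_tiles, label), where label indexes the permutation used.

    The label comes for free from the shuffle itself, which is what makes
    this a self-supervised (pretext) task. Hypothetical helper for illustration.
    """
    if rng is None:
        rng = np.random.default_rng()
    height, width = image.shape[:2]
    tile_h, tile_w = height // grid, width // grid
    # Tiles in row-major order: top-left, top-right, bottom-left, ...
    tiles = [
        image[r * tile_h:(r + 1) * tile_h, c * tile_w:(c + 1) * tile_w]
        for r in range(grid)
        for c in range(grid)
    ]
    # Every possible ordering of the tiles is one class (24 classes for 2 x 2).
    permutations = list(itertools.permutations(range(grid * grid)))
    label = int(rng.integers(len(permutations)))
    shuffled = np.stack([tiles[i] for i in permutations[label]])
    return shuffled, label


# Toy 8 x 8 "image"; a model would be trained to predict `label` from `tiles`.
image = np.arange(64).reshape(8, 8)
tiles, label = make_jigsaw_example(image, grid=2, rng=np.random.default_rng(0))
print(tiles.shape, label)
```

A network trained to recover `label` from the shuffled tiles has to learn something about image structure, and its intermediate features are what get reused for downstream tasks.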

[0:03:59.9] HC: So, what are some situations in which you might choose to use self-supervised learning instead of a fully supervised model from the beginning?

[0:04:06.1] JBS: So, that’s a good question. In general, self-supervised learning, at least in our experience at Owkin, generally provides more robust and more generalizable representations than fully supervised learning. So, with fully supervised learning on labeled data, the model will be able to learn representations, but it’s likely that those representations will generalize poorly to other downstream tasks.

Whereas, with self-supervised learning, we’ve seen that we can actually learn powerful representations from unlabeled data and usually, if the model is trained on a lot of data, those representations generalize well to a variety of downstream tasks. So, it means that if we have a wide or large set of downstream tasks, then those representations usually provide, with linear probing, good performance for almost all those downstream tasks.
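One common way to check how well frozen representations transfer to a downstream task is linear probing: the backbone is left untouched and only a small linear classifier is trained on its embeddings. The sketch below is a hedged illustration in plain NumPy. The two Gaussian blobs stand in for embeddings from a pre-trained backbone, and the function names and hyperparameters are assumptions chosen for the example.

```python
import numpy as np


def fit_linear_probe(features, labels, lr=0.1, epochs=500, l2=1e-3):
    """Fit a logistic-regression head on frozen features (binary labels).

    Only `w` and `b` are learned; the features themselves never change,
    which is what distinguishes probing from fine-tuning.
    """
    n_samples, n_dims = features.shape
    w = np.zeros(n_dims)
    b = 0.0
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        error = probs - labels
        w -= lr * (features.T @ error / n_samples + l2 * w)
        b -= lr * error.mean()
    return w, b


def predict(features, w, b):
    """Return hard 0/1 predictions from the linear head."""
    return (features @ w + b > 0).astype(int)


rng = np.random.default_rng(0)
# Synthetic stand-in for embeddings of two classes (assumption for the demo).
class0 = rng.normal(-1.0, 1.0, size=(200, 16))
class1 = rng.normal(+1.0, 1.0, size=(200, 16))
features = np.vstack([class0, class1])
labels = np.concatenate([np.zeros(200), np.ones(200)])

w, b = fit_linear_probe(features, labels)
accuracy = (predict(features, w, b) == labels).mean()
print(f"linear probe accuracy: {accuracy:.3f}")
```

In practice the probe would be evaluated on held-out data, and the same frozen embeddings would be reused across many downstream tasks.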

[0:04:55.7] HC: What does it take to train one of these self-supervised models for pathology?

[0:05:00.5] JBS: So, it takes both, obviously, compute and data. So, as we said, the data does not need to be labeled, and it’s also important to note that most self-supervised learning algorithms, such as DINOv2 or iBOT, are usually quite computationally intensive. They usually require a lot of compute because they usually require quite large batch sizes, and so in the end, to be able to train a computer vision backbone using self-supervised learning methods, you usually need to have a lot of unlabeled data but also a lot of GPUs.

I’ll talk a bit later about that if you want, but yes, usually, we need to have both: a lot of unlabeled data and a lot of GPUs.

[0:05:41.3] HC: Are you able to put some numbers on some of those? So for example, some of the self-supervised models that you’ve trained at Owkin, how long did you need to train for, how many images did you need to use?

[0:05:51.2] JBS: Yeah, sure. Your question is, I think, a good opportunity to note that in July 2023, so last year, Owkin open-sourced a vision transformer that was pre-trained using a self-supervised learning algorithm called iBOT. So, this model that we’ve open-sourced is called Phikon. This is a ViT-Base, so a vision transformer base, and it has been pre-trained on images from the TCGA database, so The Cancer Genome Atlas database.

And so, to put some very specific numbers on the training of Phikon: it was pre-trained using iBOT on a dataset of 40 million patches, or small regions, extracted from whole slide images. It was trained on 32 GPUs, so 32 NVIDIA V100 GPUs with 32 gigabytes. So, that’s actually a lot of compute. It was trained for a total of 1,200 GPU hours. So, that’s roughly a week of training for an estimated cost of several thousand dollars.
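As a rough back-of-the-envelope check on figures like these, the short sketch below converts GPU hours into a cost estimate. The GPU count and GPU hours come from the conversation; the hourly price is a purely hypothetical assumption, since no rate was given, and the function name is illustrative.

```python
def training_budget(gpu_hours, num_gpus, usd_per_gpu_hour):
    """Return (compute hours per GPU, total cost in USD).

    hours_per_gpu is an idealized figure that assumes all GPUs run in
    parallel for the whole job; elapsed calendar time can be longer once
    queueing, restarts, and evaluation runs are included.
    """
    hours_per_gpu = gpu_hours / num_gpus
    total_cost = gpu_hours * usd_per_gpu_hour
    return hours_per_gpu, total_cost


# 1,200 GPU hours on 32 GPUs, at an assumed (hypothetical) 3 USD per GPU hour.
hours_per_gpu, total_cost = training_budget(1200, 32, usd_per_gpu_hour=3.0)
print(f"{hours_per_gpu:.1f} compute hours per GPU, about ${total_cost:,.0f}")
```

With that assumed rate, the total lands in the range of several thousand dollars, which is the order of magnitude quoted above.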

[0:06:51.2] HC: So, a ton of data and by most scales, a ton of compute as well. I’m sure there are some companies who have this kind of compute available but certainly not everybody does.

[0:07:01.1] JBS: Yeah, that’s an issue, so to be able to train efficiently, computer vision backbones, you actually need to have a lot of compute and that can be very costly as you said.

[0:07:10.5] HC: One of the benefits of self-supervised learning that you’ve already mentioned is, generalizability. Are there other benefits from using this type of model instead of an alternative?

[0:07:19.3] JBS: So, one of the main benefits is, yeah, generalizability. Also, I think that in some cases the representations, or the embeddings learned by the model, can be more interpretable or maybe easier to work with, but that depends probably on the downstream task. So, I think that the main advantage is the generalizability, yes.

[0:07:41.2] HC: And how have you used these self-supervised models? Were they used in building some of Owkin’s recent products?

[0:07:46.9] JBS: Yes. So, I’ve talked about Phikon, the model that Owkin open-sourced in July 2023. Obviously, we have other models that have been pre-trained using self-supervised learning. Some of them are currently used in two of our diagnostic tools. One of the tools is called RlapsRisk, which is a prognostic tool to aid pathologists and oncologists in predicting relapse in patients suffering from breast cancer.

And we have another one that is called MSIntuit, which assists pathologists in pre-screening colorectal cancer patients for microsatellite instability. So, both of these diagnostic tools developed at Owkin actually rely on models that have been pre-trained using self-supervised learning.

[0:08:27.9] HC: Do they rely on the same broad model or are there models that are more specific to a particular type of tissue?

[0:08:34.0] JBS: There are some models that are indeed particular to specific types of tissue or specific sub-types of cancers and also the models can have different architectures and different sizes, they come in different flavors.

[0:08:47.7] HC: We talked a little bit about what you need to train one of these self-supervised models and in particular, a ton of data. Do you have any guidance on how to structure that dataset? Does it really need to be diverse across all the different cancer types, or would a smaller dataset of a particular cancer type be more helpful for certain applications? Any thoughts on best practices there?

[0:09:08.5] JBS: Yeah, so that’s a very good question. I’m not convinced that there is a one-size-fits-all answer to this question. So, I mean, I am not convinced that there is a recipe that would work in all cases. So, the main way to prepare data, especially in the context of digital pathology, is just to take the whole slide images, detect the regions of the images which contain matter, and then just split them into patches.

So, then you end up with a very large dataset that contains hundreds of thousands or millions of patches, and this is what you will be pre-training your model on. However, looking at recent methods like DINOv2, for instance, I think that data curation plays a crucial role in training or pre-training with self-supervised learning, and so my personal opinion on this is that data curation is probably an important step.

And regarding the preparation of the data, the more diverse the data is, the better. So, the broader, I mean, the more cancer indications we have, probably the better to train an efficient model to compute embeddings of pathology images.
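The preparation step described above (take a whole slide image, find the regions that contain matter, and split them into patches) can be sketched as follows. This is a minimal illustration on a synthetic NumPy array, not Owkin’s pipeline: the function name, the brightness threshold, and the minimum tissue fraction are all assumptions. Real whole slide images are far larger and are usually read region by region with a dedicated library rather than loaded as one array.

```python
import numpy as np


def extract_tissue_patches(slide, patch_size=224, min_tissue=0.5,
                           background_threshold=220):
    """Split an H x W x 3 uint8 RGB image into non-overlapping patches and
    keep those where enough pixels look like tissue.

    Tissue detection here is a simple assumption: slide background is close
    to white, so any pixel whose mean intensity is below the threshold is
    counted as tissue. Returns a list of (row, col, patch) tuples.
    """
    tissue_mask = slide.mean(axis=2) < background_threshold
    height, width = tissue_mask.shape
    patches = []
    for row in range(0, height - patch_size + 1, patch_size):
        for col in range(0, width - patch_size + 1, patch_size):
            window = tissue_mask[row:row + patch_size, col:col + patch_size]
            if window.mean() >= min_tissue:
                patch = slide[row:row + patch_size, col:col + patch_size]
                patches.append((row, col, patch))
    return patches


# Synthetic "slide": white background with one stained square of tissue.
slide = np.full((448, 448, 3), 255, dtype=np.uint8)
slide[0:224, 0:224] = (180, 100, 160)

patches = extract_tissue_patches(slide)
print(len(patches), [(row, col) for row, col, _ in patches])
```

Run over many slides, a loop like this is what yields the hundreds of thousands or millions of patches used for pre-training.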

[0:10:12.7] HC: So, generally, the broader the better, but there could be certain use cases where a different type of structure for the pretraining data might be a better choice, is that right?

[0:10:22.1] JBS: Yes, exactly.

[0:10:22.9] HC: Where do you see the future of self-supervised learning? And in some cases, these are called foundation models, I don’t know whether you are using that term or if it is more self-supervised in terms of talking about what you’re doing at Owkin. What do you think the future looks like with these models for pathology?

[0:10:36.2] JBS: So, as far as I’m concerned, I’m convinced that the foundation models will play a very important role in digital pathology and I think this is already happening. So, for instance, soon after we released our Phikon model, so our ViT-Base pre-trained on the different cancer indications from TCGA, we have seen some very impactful papers coming out with different foundation models for histology.

So, for instance, there was this new UNI model and the Virchow model also. In the past months, we have seen the development of some very large and very impactful foundation models for digital pathology and I think this is very likely to continue in the near future. However, as of today, most of these models are not public, and so when Owkin decided to open-source Phikon, I think we had a really strong and positive impact on the community.

Because we have seen that this impact is important for people working in the field of digital pathology, that is something that is quite, how can I say it? I mean, we are glad to have open-sourced this model, and also I want to mention that recently Owkin has won the Kaggle UBC-OCEAN competition thanks to this Phikon model. So, that is another way to showcase the impact that this model can have on the community.

[0:11:48.8] HC: I certainly agree, open sourcing this stuff makes a dramatic difference in the field and enabling others whether in academia or within smaller companies who don’t have the resources to train massive models but just by open sourcing it, you’re able to help push the field forward. So, I applaud you guys for that.

[0:12:06.1] JBS: Yes, exactly.

[0:12:07.0] HC: In terms of these future models, what do you think they’ll look like a year or two from now? Are they just going to be bigger or will there be other types of changes to them?

[0:12:16.5] JBS: So, it’s likely that they are going to be bigger and bigger, and also trained on more and more data. So, for instance, the Virchow model that was developed by Paige and Microsoft is trained on way more data than our Phikon model, orders of magnitude larger. But I think also that the next frontier for foundation models is multimodality.

So, I’m expecting that in the near future, foundation models will have to tackle this problem of multimodality and not just be domain-specific large vision models for digital pathology, for instance. By multimodality, I’m thinking of, for instance, combining multi-omics, like spatial transcriptomics, with digital pathology and clinical data. So, I think that we can expect that in the near future, large language models could be used to integrate different data modalities at different scales.

So, for instance, DNA, spatial transcriptomics opportunities with digital pathology and clinical data, and so that’s I think a very interesting challenge for foundation models and for large vision or large language models in the near future. Related to this, I wanted to mention that in June 2023, Owkin announced the MOSAIC study. So, it’s a $50 million project to create the largest multi-omic spatial atlas in oncology.

And so, with MOSAIC, Owkin will cover several cancer indications, collecting tumor samples from more than one thousand patients per indication. With each tumor sample, we’ll be able to generate spatial transcriptomics using, for instance, [inaudible 0:13:51.5], also single-cell data using [inaudible 0:13:54.6], digital pathology, clinical data, and other data types. So, eventually, I think that the developments in foundation models plus such initiatives are actually wonderful opportunities for multimodal self-supervised learning.

[0:14:11.4] HC: With that dataset, it sounds like you already have plans for a multimodal foundation model in your future.

[0:14:17.2] JBS: Yes, something like this, yes.

[0:14:19.7] HC: Are there any lessons you’ve learned in developing self-supervised approaches that could be applied more broadly to other types of imagery?

[0:14:26.4] JBS: Yes, indeed. So, I think some of the lessons can be generalized or can apply to other types of imagery. So, for instance, I think that a lot of experiments are needed to study the effect of data curation, the backbone architecture, and some hyperparameters on the pre-training, and I think that’s true also for other types of imagery. In particular, I am thinking about developments in self-supervised learning for radiology data.

And I think that there are a lot of ideas in common between what we do at Owkin in digital pathology and what other people do for 2D or 3D radiology data.

[0:15:05.2] HC: And finally, where do you see the impact of Owkin in three to five years?

[0:15:08.5] JBS: I think, hopefully, that in three to five years, Owkin will have continued to advance cures for cancer patients and discover actionable targets in specific subtypes of cancer, and I am sure that Owkin will continue to benefit from advances in self-supervised learning and apply this to digital pathology. As I’ve said earlier, we’ve open-sourced our Phikon model, our ViT-Base model that is pre-trained on several cancer indications from TCGA.

Hopefully, in the near future, we’ll open-source other models with the idea, too, of helping the community.

[0:15:41.8] HC: This has been great, Jean-Baptiste. I appreciate your insights today. I think this will be valuable to many listeners. Where can people find out more about you online?

[0:15:49.5] JBS: So, I have a LinkedIn page as well as a Twitter page and I usually post some news, not on a regular basis but from time to time on my LinkedIn page. So, LinkedIn I’d say is a good place.

[0:16:01.7] HC: I will include a link to that in the show notes as well as a link to Owkin’s website. I believe that’s

[0:16:08.7] JBS: Exactly.

[0:16:09.3] HC: Thanks for joining me today.

[0:16:10.6] JBS: Thank you very much, Heather.

[0:16:12.3] HC: All right everyone, thanks for listening. I’m Heather Couture and I hope you join me again next time for Impact AI.


[0:16:22.0] HC: Thank you for listening to Impact AI. If you enjoyed this episode, please subscribe and share with a friend. And if you’d like to learn more about computer vision applications for people and planetary health, you can sign up for my newsletter at