What role can artificial intelligence play in detecting breast cancer earlier, when it’s most treatable? In this episode of Impact AI, we hear from Nico Karssemeijer, Chief Science Officer of ScreenPoint Medical, about how his team is using AI to transform breast cancer screening. Drawing on more than four decades of experience in medical imaging, Nico shares how ScreenPoint’s AI tools assist radiologists by analyzing mammograms, highlighting suspicious areas, and even learning from years of patient data. The conversation explores what it takes to build trustworthy medical AI, overcome challenges with data diversity and device bias, and the importance of clinical validation. To find out how AI is being integrated into real-world healthcare to improve outcomes (and what goes into building a successful AI-powered medical company), tune in today!
Key Points:
- What led Nico to turn decades of research into a breast imaging AI startup.
- How ScreenPoint uses AI to support radiologists in early detection.
- Challenges of working with diverse data from different imaging devices.
- The importance of training models with clean, representative data.
- Strategies for reducing bias across vendors and populations.
- How independent, real-world validation drives trust and clinical adoption.
- Finding a balance between model accuracy and explainability.
- Why domain expertise is crucial for building a successful AI-powered startup.
- Driving adoption in medical AI through clinical partnerships and rigorous trials.
Quotes:
“It’s amazing how much more information you can get out of the mammograms [using AI]. That surprises me all the time.” — Nico Karssemeijer
“You can't just say, ‘This mammogram is abnormal,’ because then [the radiologists] are puzzled. … The algorithm is getting so good that it identifies areas the radiologists would probably not see by themselves. … You have to … mark the area in the exam where a lesion is found.” — Nico Karssemeijer
“It's incredibly important to have enough domain expertise when you start a company, because it's easy to fail because you don't understand well enough what the customer wants [or] where the field is going.” — Nico Karssemeijer
Links:
Nico Karssemeijer
ScreenPoint Medical
Nico Karssemeijer on LinkedIn
Nico Karssemeijer on Google Scholar
LinkedIn – Connect with Heather.
Computer Vision Insights Newsletter – A biweekly newsletter to help bring the latest machine learning and computer vision research to applications in people and planetary health.
Computer Vision Strategy Session – Not sure how to advance your computer vision project? Get unstuck with a clear set of next steps. Schedule a 1 hour strategy session now to advance your project.
[INTRODUCTION]
[00:00:03] HC: Welcome to Impact AI, brought to you by Pixel Scientia Labs. I’m your host, Heather Couture. On this podcast, I interview innovators and entrepreneurs about building a mission-driven, machine-learning-powered company. If you like what you hear, please subscribe to my newsletter to be notified about new episodes. Plus, follow the latest research in computer vision for people and planetary health. You can sign up at pixelscientia.com/newsletter.
[INTERVIEW]
[0:00:34] HC: Today, I’m joined by guest Nico Karssemeijer, Chief Science Officer of ScreenPoint Medical, to talk about breast cancer screening. Nico, welcome to the show.
[0:00:43] NK: Yeah, thank you.
[0:00:45] HC: Nico, could you share a bit about your background and how that led you to ScreenPoint Medical?
[0:00:49] NK: Yes, my background is in physics; I’m a physicist, trained in the Netherlands, and I moved into medical imaging quite early in my career. I did a PhD in medical image analysis at Radboud University here in Nijmegen, the Netherlands, and I worked there in the radiology department for about 40 years. I built up a group working on machine learning with various applications, but the major application was breast imaging.
I was a full professor there until my retirement a couple of years ago. In 2014, I started ScreenPoint as a spin-off of the university, with some students who came along with me, and that became quite successful. I gradually moved over to the company and was its CEO for the first eight years. I’m still with the company as a consultant, helping to guide the development of AI for breast screening.
[0:01:42] HC: What does ScreenPoint Medical do and why is it important for healthcare?
[0:01:46] NK: Yes, we develop applications to support radiologists with early detection and diagnosis of breast cancer. The current focus is early detection: we support radiologists with our AI application to help find cancers in mammograms earlier. That’s been quite successful. We commercialized a product after a couple of years of development, and we have a large presence in the US and in Europe.
[0:02:13] HC: Tell me more about the AI part of this. How do you use AI to screen mammograms?
[0:02:19] NK: Yeah. Basically, a mammogram usually consists of four images, two images of each breast. As you probably know, the breast is compressed and an X-ray image is taken. These are 2D images, but increasingly they are also acquired in three dimensions. These are pretty high data volumes, and the images have high resolution. During the screening process, radiologists have to find early cancers, which is one of the most challenging tasks in radiology. It’s usually done by specialized radiologists, and they spend a lot of time on screening because it’s high volume and it’s tiring, especially if you work with 3D, as is common these days in the US. It’s one of the earliest applications that scientists like myself went into: automating that process and helping radiologists find cancer early, first by pointing out areas of concern, but also by classifying them, and so on. That’s where we use AI.
[0:03:20] HC: How does the solution integrate into a radiologist’s daily job? Is it, like you said, highlighting a region that they should look at? How does it affect their regular job?
[0:03:31] NK: It indeed highlights areas that may be suspicious. It also ranks exams, because radiologists typically read screening mammograms in batches. Not always, but that’s very common. They may read a batch of 100 or 200 screening mammograms, and we can rank those by looking at what’s suspicious in the images. They can use that ranking during reading, sorting the cases in the worklist, which helps them get better outcomes.
It’s basically highlighting, first, the areas that they should definitely look at. And with the classifications we provide, it also helps them decide whether they should follow up on a certain case or not. That’s the most difficult part of the screening process: deciding when something is concerning enough to follow up with additional imaging, calling a woman back for additional tests, basically.
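To make the ranking idea concrete, here is a minimal sketch of worklist sorting by an AI suspicion score. The `Exam` structure and its fields are hypothetical placeholders, not ScreenPoint’s actual interface.

```python
from dataclasses import dataclass

@dataclass
class Exam:
    """One screening exam with a model-assigned suspicion score (hypothetical schema)."""
    exam_id: str
    suspicion_score: float  # e.g., a calibrated probability of malignancy in [0, 1]

def rank_worklist(exams: list[Exam]) -> list[Exam]:
    """Sort a reading batch so the most suspicious exams come first."""
    return sorted(exams, key=lambda e: e.suspicion_score, reverse=True)

# A toy batch of three screening exams.
batch = [Exam("A-001", 0.02), Exam("A-002", 0.87), Exam("A-003", 0.31)]
for exam in rank_worklist(batch):
    print(exam.exam_id, exam.suspicion_score)  # A-002, then A-003, then A-001
```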
[0:04:29] HC: In order to train a machine learning model to support you in this, how do you gather and annotate data?
[0:04:36] NK: Yeah. Gathering data is, in this case, not incredibly complicated, because so many mammograms are made in screening centers throughout the world. In the US alone, there are about 40 million screening mammograms per year. It’s a matter of establishing good collaborations with clinical partners who are willing to share data. We sometimes also buy data.
The challenge is to make sure that the data we collect is diverse enough, especially in terms of different modality types. There are many vendors that sell mammography units, and these units are not identical. There are different models, different software implementations. It’s a field that’s always changing; devices are improved. We need to make sure that the data set we use to train, and also to validate, stays up to date.
Practices are also different in Europe, the US, and South America. We have to make sure the data is representative. That’s the challenging part, and we do that, of course, by working with many different partners who have multiple sources of data. Then there’s the annotation process. As the application gets better, it’s very important to make the annotations as accurate as possible so that we don’t have much label noise.
Radiologists sometimes overlook breast cancer, and it may then be found one screening round later, one to three years on. So when we collect data, it’s very important for us to be sure that exams labeled normal really are normal, by checking whether there is a follow-up screening exam, to make sure that the training data without cancer truly is without cancer. That’s difficult, because some cancers develop very slowly, so they can go undetected for quite a while. But by making sure we have long enough follow-up, we take care that we have clean normals without subtle signs of cancer that we don’t know about.
And then for the abnormals, it’s actually not so difficult, because when there’s a patient with breast cancer, there’s always a report; there’s been a biopsy. We know where the cancer is, so we can annotate the region, its characteristics, and so on.
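As a rough illustration of the follow-up check described above, the sketch below accepts an exam as a “clean normal” only if a later cancer-free screening exists at least a minimum interval afterwards. The record schema and the two-year threshold are assumptions for illustration, not ScreenPoint’s actual curation rules.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ScreeningRecord:
    """One screening exam for a patient (hypothetical schema)."""
    patient_id: str
    exam_date: date
    cancer_diagnosed: bool  # cancer confirmed at or shortly after this exam

def is_clean_normal(record: ScreeningRecord,
                    history: list[ScreeningRecord],
                    min_followup_days: int = 730) -> bool:
    """Keep a normal exam for training only if a later cancer-free exam
    exists at least `min_followup_days` afterwards for the same patient."""
    if record.cancer_diagnosed:
        return False
    for later in history:
        if later.patient_id != record.patient_id:
            continue
        gap = (later.exam_date - record.exam_date).days
        if gap >= min_followup_days and not later.cancer_diagnosed:
            return True
    return False

# Example: a 2019 exam with a cancer-free 2021 follow-up qualifies as a clean normal.
history = [ScreeningRecord("P1", date(2019, 5, 1), False),
           ScreeningRecord("P1", date(2021, 6, 1), False)]
print(is_clean_normal(history[0], history))  # True
```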
[0:06:53] HC: The abnormal cases are always validated by biopsy, but then you have to go back and figure out where on the mammogram the lesion was, is that right?
[0:07:02] NK: Yes, yes, that’s right. If you have the biopsy report and you understand mammograms, it’s not so complicated. We have people in-house who do the early phase. They have been trained so that, with the report, they can identify and mark the areas for most of the abnormalities. And in difficult cases, we have experts, we have radiologists, to help us out.
[0:07:26] HC: What kinds of challenges do you encounter working with mammograms and training models on them?
[0:07:31] NK: Initially, once you have the data labeled, which is the more time-consuming part, it is quite straightforward to build a model that works reasonably well. But as you go further, the incremental improvements become more and more important, because there is, of course, competition in this field, which is quite strong.
It’s also a fact that many radiologists are very good. Doing better than the radiologist is in itself already a challenge. We are now past that point, so on average our application works better than radiologists, and that’s also needed for adoption. But we still keep improving it, because it’s amazing how much more information you can get out of the mammograms. That surprises me all the time. We can keep on improving, and I think the end is not in sight. That is also because of these enormous databases we have collected, which include all the previous mammograms of breast cancer patients. We don’t only have the examples of the cancers at the point when the radiologist detected them; we also have the whole history.
We can then use those earlier images for training too, to find out whether we could have detected these cancers one, two, or three years earlier. That’s quite an advantage: because we are building a screening application, we have access to images of the cancers while they are developing but not yet detected. We have a huge opportunity there to improve, and that’s what’s happening right now.
I think one of the big challenges, as you already mentioned, is that the devices on the market are changing. They’re upgraded with new software, better algorithms for image processing. There’s a huge variety of mammograms in the field. There’s also the whole development toward 3D mammography, where you actually have a stack of slices instead of a 2D image. There’s quite a variety in acquisition devices, depending on the physics of the machine: you can do this with a narrow angle or a wide angle, with different types of detectors. Then the images are processed to enhance them visually for the radiologists. That huge variety of data in the field can confuse the algorithm if you don’t train it in a good manner; you can easily create biases toward one vendor or one type of processing.
We use a lot of tricks to make sure these biases are minimized in our application. It’s not so much that cancers go undetected; rather, the biases affect the scoring system we use. We provide a score per mammogram, which is a kind of probability that cancer is present in a certain region. We want these scores to be calibrated. You can see them as risk factors, and they should not vary across manufacturers. If you imaged the same woman with three devices, ideally we would get the same number out, and that is quite challenging.
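One plausible way to make scores comparable across manufacturers, sketched below, is to fit a separate calibration map per vendor on labeled validation data so that a given output corresponds to a similar cancer probability everywhere. This uses scikit-learn’s isotonic regression as a stand-in and illustrates score calibration in general, not ScreenPoint’s actual method.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def fit_per_vendor_calibrators(scores, labels, vendors):
    """Fit one monotonic calibration map per vendor so raw model scores
    translate to comparable cancer probabilities across manufacturers."""
    scores, labels, vendors = map(np.asarray, (scores, labels, vendors))
    calibrators = {}
    for v in np.unique(vendors):
        mask = vendors == v
        cal = IsotonicRegression(out_of_bounds="clip")  # clip scores outside the fitted range
        cal.fit(scores[mask], labels[mask])
        calibrators[v] = cal
    return calibrators

def calibrated_score(calibrators, vendor, raw_score):
    """Map a raw score to a calibrated probability using its vendor's map."""
    return float(calibrators[vendor].predict([raw_score])[0])

# Toy example with two hypothetical vendors whose raw score scales differ.
rng = np.random.default_rng(0)
scores = rng.uniform(0, 1, 200)
vendors = np.where(np.arange(200) % 2 == 0, "vendorA", "vendorB")
labels = (scores + rng.normal(0, 0.2, 200) > 0.7).astype(int)
cals = fit_per_vendor_calibrators(scores, labels, vendors)
print(calibrated_score(cals, "vendorA", 0.5), calibrated_score(cals, "vendorB", 0.5))
```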
[0:10:37] HC: How do you handle this diversity across different devices? Can you train a single universal model, or do you need specialized models for a particular manufacturer, for example?
[0:10:48] NK: No, we don’t. We train one model. It’s very important to always balance everything as much as possible in the training. We use augmentations that we create ourselves to mimic the variations we see in the field, and by doing this, we can create more balanced training. There’s also a lot of image processing and technology that we have developed to enable these balanced trainings.
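A minimal sketch of the balancing idea, under the assumption that each training sample is tagged with a vendor label: batches draw uniformly across vendors, and a simple gamma-and-gain jitter stands in for the kind of homemade augmentations that mimic differences in vendor image processing. ScreenPoint’s actual augmentations and pipeline are proprietary; this only illustrates the pattern.

```python
import random
from collections import defaultdict
import numpy as np

def vendor_balanced_batches(samples, batch_size, seed=0):
    """Yield training batches that draw uniformly across vendors.
    `samples` is a list of (image, label, vendor) tuples (hypothetical layout)."""
    rng = random.Random(seed)
    by_vendor = defaultdict(list)
    for s in samples:
        by_vendor[s[2]].append(s)
    vendors = sorted(by_vendor)
    while True:
        # Pick a vendor uniformly for every slot, then a random sample from it.
        yield [rng.choice(by_vendor[rng.choice(vendors)]) for _ in range(batch_size)]

def intensity_jitter(image, rng):
    """Crude stand-in for vendor-processing augmentation: random gamma
    and gain applied to a float image scaled to [0, 1]."""
    gamma = rng.uniform(0.8, 1.25)
    gain = rng.uniform(0.9, 1.1)
    return np.clip(gain * image ** gamma, 0.0, 1.0)

# Toy usage: two vendors, tiny fake images.
rng = np.random.default_rng(0)
samples = [(rng.uniform(0, 1, (4, 4)), 0, "vendorA") for _ in range(10)] + \
          [(rng.uniform(0, 1, (4, 4)), 1, "vendorB") for _ in range(10)]
batch = next(vendor_balanced_batches(samples, batch_size=4))
augmented = [intensity_jitter(img, rng) for img, _, _ in batch]
```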
[0:11:11] HC: Tell me more about the bias factor that you mentioned here. Do you mainly see potential bias with respect to the imaging device, or does it come up with respect to other factors, for example, racial bias?
[0:11:25] NK: I think a lot of people are afraid that there is racial bias. We don’t actually see much of it, and we think the biases lie much more in the different vendors and the different types of machines than in race. But still, we have to be aware of it. So far, we haven’t seen much of it, but as the accuracy and also the precision of the algorithms increase, of course, we need to take that into account.
It’s not so much that cancers look different in women of different ethnic backgrounds, but there is definitely data showing that the distribution of the various types of cancer can differ across ethnic subgroups. Potentially, that could create a bias. Once we know these differences in distribution, we could actually take them into account in the algorithm, provided we know the ethnicity of the patients. I don’t think any company is doing that at the moment, but I believe it may come in the future, especially because applications are now also being developed to determine an individual’s risk of developing breast cancer, to replace other risk assessment tests, including genetic tests. A lot of research has been done, and is still being done, showing that in these images, both mammograms and pathology images, a lot of information related to the genotype is represented, and we can extract risk factors from them that are very powerful.
Once you get into that field, you’re really producing numbers that say something about the absolute risk that an individual woman will develop breast cancer within the next five to ten years. When we compute such numbers, we can take ethnicity into account once we have the data. But in reality, ethnicity is not always known. In the US, it’s fairly common to record it, but in Europe, it’s typically not done. This is still an open field where a lot of work can be done.
[0:13:34] HC: It sounds like it’s really about understanding the heterogeneity of the image acquisition procedures and devices, as well as the heterogeneity in biology and genetics, in order to figure out where you need to capture diversity in your data set or compensate with extra image augmentation for different devices, things like that.
[0:13:56] NK: Yeah.
[0:13:57] HC: How do you validate your machine learning models?
[0:13:59] NK: Of course, we have separate data sets. We have validation sets that we use during training. But in the company, we also have a special team that takes care of incoming data. They keep part of the new data apart, and it’s refreshed over time, because we want the test data to reflect how the application will perform in the field. It has to be representative of the newest devices as well. So we make sure we have a diverse test set that’s representative of what’s currently in the market, and that data is never seen; the R&D team doesn’t even have access to it.
Once a new release of the algorithm is planned, and we typically release a new algorithm every year, it goes to the validation team, and they run a lot of standalone performance tests across the whole variety of vendors. We also look at ethnicity, to the extent that we have that data, and check for biases and stability. What comes out is a very long standalone performance report that covers the detection of different types of cancers and the different features.
Our system also uses prior mammograms, because we compare against prior mammograms in order to detect cancer in a new mammogram, which is what radiologists do too. All the modules we use are also, to some extent, individually tested. It’s a big report, and we have to do it for our own confidence that the device works when we install it and start shipping it to customers. But of course, we also need to submit that data to the FDA for review, for regulatory purposes, before we can release a new algorithm.
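As a small illustration of this kind of stratified standalone check, the snippet below computes a detection AUC per subgroup (vendor, or ethnicity where recorded) on a sequestered test set. The data layout is assumed, and a real performance report would cover many more endpoints than a single AUC.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_by_subgroup(scores, labels, groups):
    """Compute ROC AUC separately for each subgroup (e.g., vendor) on a
    held-out test set, as a basic bias and stability check."""
    scores, labels, groups = map(np.asarray, (scores, labels, groups))
    report = {}
    for g in np.unique(groups):
        mask = groups == g
        if len(np.unique(labels[mask])) < 2:
            report[g] = None  # AUC is undefined without both classes present
        else:
            report[g] = roc_auc_score(labels[mask], scores[mask])
    return report

# Toy check across two hypothetical vendors.
rng = np.random.default_rng(1)
labels = rng.integers(0, 2, 400)
scores = np.clip(labels * 0.4 + rng.uniform(0, 0.6, 400), 0, 1)
vendors = np.where(np.arange(400) % 2 == 0, "vendorA", "vendorB")
print(auc_by_subgroup(scores, labels, vendors))
```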
[0:15:43] HC: How do you think about the balance between model accuracy and explainability? For radiologists, is it important to understand why a model predicts the way it does?
[0:15:53] NK: Not really why it predicts the way it does, but they do want explainability to some extent. You can’t just say, “This mammogram is abnormal,” because then they’re puzzled. The algorithm is getting so good that it also identifies areas that the radiologists would probably not see by themselves. That happens. So you have to at least mark the area in the exam where a lesion is found. That’s the minimum. You cannot just say an exam is abnormal.
The explainability is, in the first place, providing the location of the abnormalities. In the second place, we also describe the abnormalities. You have soft tissue lesions, you have calcified areas, and there are a couple of other types of abnormalities as well. That way, the radiologist is informed about what the algorithm saw, in their own language. I don’t think radiologists are really interested in how the algorithm works, but they do want to know, for instance, what kind of input it uses.
We have an application that can also include previous mammograms in the analysis, so that it looks not only at the new exam but also at the history of the patient. When we do that, they want to know it: what kind of information did you use, in addition to this exam, to come to the conclusion? That’s the type of explainability we know our customers are interested in.
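To ground what such output might look like, here is a hypothetical finding structure carrying the location and lesion-type description a radiologist would see. The field names and type list are illustrative, not ScreenPoint’s actual output format.

```python
from dataclasses import dataclass

# Illustrative lesion categories; real reporting vocabularies are richer.
LESION_TYPES = ("soft_tissue_lesion", "calcifications", "architectural_distortion")

@dataclass
class Finding:
    """One marked region in an exam, described in the radiologist's language."""
    image_index: int   # which of the (typically four) images
    region: tuple      # (x, y, width, height) in pixels
    lesion_type: str   # one of LESION_TYPES
    score: float       # calibrated suspicion score for this region
    used_priors: bool  # whether prior mammograms informed this finding

finding = Finding(image_index=2, region=(410, 220, 64, 64),
                  lesion_type="soft_tissue_lesion", score=0.78, used_priors=True)
print(finding)
```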
[0:17:21] HC: Is there any advice you could offer to other leaders of AI-powered startups?
[0:17:25] NK: Yeah, I think it’s quite important when you do a startup to check for yourself, “Okay, what’s really needed to make this a success?” I really believe in teamwork. It’s important to ask, “Do we have all the knowledge, experience, and expertise in-house to do this?” And when you don’t, look for someone who can help.
I’ve been involved in a number of startups in the past. I’m a founder of Volpara, which has now become quite a big company and has been acquired by Lunit, a company in South Korea. With that experience, it’s easier, of course, to do a successful startup, because the first time is always more difficult. But when you don’t have that, or you know you’re missing certain capacities, look for people who can help: a co-founder, a consultant, or whatever. The other thing is that it’s incredibly important to have enough domain expertise when you start a company, because it’s easy to fail because you don’t understand well enough what the customer wants or where the field is going. I think these are common pitfalls.
[0:18:31] HC: And finally, where do you see the impact of ScreenPoint Medical in three to five years?
[0:18:37] NK: We already see quite a big impact. Maybe I should have mentioned this when you asked about validation: the most important thing for us is really the clinical validation. From the start, we have been working with clinical partners, and we have been able to work with the best clinical partners in the field, both in Europe and in the US, the leaders in the field. They are early adopters of new technology, and they have helped us shape the application, giving critical comments in the early phases. Subsequently, they have also done magnificent studies once they started working with the system in their practices, demonstrating its effectiveness.
I think the biggest success so far is the randomized controlled trial that was done with our application in Sweden, in Malmö, by Kristina Lång, and published in the Lancet. That was a major success for the company. It was also the first randomized controlled trial in this field, at least in breast imaging AI, and it helped convince a lot of radiologists: “Okay, it’s now time to start adopting this technique.”
So my advice is definitely, if you’re working in the medical field, to look for the best clinicians who can help you, especially with providing comments in the early phases, but also with doing the validation.
[0:20:03] HC: This has been great, Nico. I appreciate your insights today. I think this will be valuable to many listeners. Where can people find out more about you online?
[0:20:10] NK: Yeah, just go to the ScreenPoint website. I’m not very active on LinkedIn, but I have a couple of hundred scientific publications, so I think people can find a lot about my background and what I’ve been doing in the past when they just search the web.
[0:20:26] HC: Perfect. Thanks for joining me today.
[0:20:27] NK: Yeah, you’re welcome.
[0:20:29] HC: All right, everyone. Thanks for listening. I’m Heather Couture, and I hope you join me again next time for Impact AI.
[OUTRO]
[0:20:39] HC: Thank you for listening to Impact AI. If you enjoyed this episode, please subscribe and share with a friend. If you’d like to learn more about computer vision applications for people and planetary health, you can sign up for my newsletter at pixelscientia.com/newsletter.
[END]