Detecting Gastrointestinal Cancers Earlier with Marcel Gehrung from Cyted

The role of AI in cancer detection grows more significant with each passing week. During this conversation, I welcome Marcel Gehrung, CEO and Co-Founder of Cyted, to discuss detecting gastrointestinal cancer. You’ll learn how Cyted leverages machine learning to diagnose Barrett’s Esophagus in upper GI samples. Marcel reveals some of the challenges he has faced at Cyted related to the limited autonomy an algorithm can realistically provide, and annotating data for training and validation. Hear how the company is responding to changes in AI, and why hiring for technical roles at Cyted has not been difficult, thanks to its location. You’ll hear Marcel’s perspective on hiring specialized generalists and some of his advice for leaders at AI-powered startups.

Key Points:

Introducing cofounder and CEO at Cyted, Marcel Gehrung.
His path to focusing on gastrointestinal cancer detection.
How the technology at Cyted works to diagnose Barrett’s Esophagus.
What Barrett’s is and who is most susceptible to it.
The role of machine learning in detecting cancer in upper GI samples.
Navigating the challenge of how much autonomy an algorithm can provide.
Annotating data for training and validation.
How Cyted is responding to changes in AI.
Hiring for technical roles at Cyted.
Onboarding challenges due to the verticality of technology Cyted works with.
Why Marcel advocates for hiring specialized generalists.
Marcel’s advice for leaders of AI-powered startups.
Where he sees the impact of Cyted in three to five years.

Quotes:

“We’re essentially leveraging the best of both worlds. We’re working with cytoscreeners, which we also have on our staff to generate the initial annotations, and then we have someone who looks at it and then reclassifies if necessary.” — Marcel Gehrung

“The more ability the candidates have to horizontally integrate different types of knowledge from across the company or across the technology of the sector, the better.” — Marcel Gehrung

“Getting carried away just happens so easily, particularly when we follow the various news outlets in the world that overwhelm us with new exciting ideas and functions of that technology.” — Marcel Gehrung

Links:

Marcel Gehrung on LinkedIn
Marcel Gehrung on Twitter
Cyted

Resources for Computer Vision Teams:

LinkedIn – Connect with Heather.
Computer Vision Insights Newsletter – A biweekly newsletter to help bring the latest machine learning and computer vision research to applications in people and planetary health.
Computer Vision Strategy Session – Not sure how to advance your computer vision project? Get unstuck with a clear set of next steps. Schedule a 1-hour strategy session now to advance your project.

Transcript:

[INTRODUCTION]

[00:00:03] HC: Welcome to Impact AI, brought to you by Pixel Scientia Labs. I’m your host, Heather Couture. On this podcast, I interview innovators and entrepreneurs about building a mission-driven, machine-learning-powered company. If you like what you hear, please subscribe to my newsletter to be notified about new episodes. Plus, follow the latest research in computer vision for people and planetary health. You can sign up at pixelscientia.com/newsletter.

[INTERVIEW]

[0:00:34.3] HC: Today, I’m joined by guest Marcel Gehrung, cofounder and CEO of Cyted, to talk about detecting gastrointestinal cancer. Marcel, welcome to the show.

[0:00:43.8] MG: Thanks, Heather, for having me.

[0:00:45.4] HC: Marcel, could you share a bit about your background and how that led you to create Cyted?

[0:00:48.3] MG: Yeah, sure. So I started out, I would say, as a fairly broad, generalist scientist with an undergrad in materials science. So not very healthcare related, but via the medical physics route I got into medical imaging during my master’s, and after that I brought in another passion that had been separate from my academic work until then, which was computer science.

So until then, I always looked at problems in my academic background from a very scientific perspective. I focused on a lot of physics-related science in the context of medical imaging, but from my teenage years, I was always very passionate about computer science and tinkering around with programming. You know, I built my first home page around the age of 14 and always had some kind of passion and enthusiasm around that, but never really figured out how to bring those two worlds together for academic purposes.

I think bioinformatics already existed at that time, but because I was also interested in basic natural science during school, I never really made the leap in the early days to bring the two worlds together. Then I came to Cambridge in 2017 and started a Ph.D. working on machine learning applications to different types of medical imaging, ranging from anatomical imaging such as MRI and CT all the way over to [inaudible 0:02:05.7] pathology. In that Ph.D., I particularly doubled down on the digital pathology element and started working on cytology samples from the upper GI tract to find early precursor conditions to esophageal cancer, which is one of the most lethal cancers in the Western world.

And it was quite interesting, because very early on in my Ph.D., I was looking at fairly broad problems in applying machine learning to medical images, and some of those were quite technical. But, and I think we’re going to get to that maybe a bit later, there’s always this big question: it’s not only whether you find the best, or even a possible, solution, but also whether you find the best application area in which an approach can solve a particular problem.

That’s always the dissonance between trying to solve certain technical questions and trying to solve a relevant clinical problem. In the early days of my Ph.D., I met one of the core people who eventually became a cofounder of Cyted, Rebecca Fitzgerald, a professor of cancer prevention in Cambridge. She has been working on cytology samples collected using a capsule sponge from the upper GI tract to find all types of cellular abnormalities.

And as you can imagine, the maturity of that technology at the time gave a very open platform and a springboard to build a company in that space, working on different types of computer vision applications on cytology data. So this was a slightly long-winded account of how I got from very broad natural science into starting a company in that space. At the time, each next step seemed an obvious one; maybe in hindsight, it looks a bit like making a couple of 90-degree turns along the way.

[0:03:44.1] HC: So what does Cyted do and why is this important for cancer detection?

[0:03:46.6] MG: So we’re working on one very particular platform, which uses minimally invasive cell collection from the upper GI tract with an ingestible capsule, and then takes the cells collected by that capsule and applies different types of biomarkers to the collected tissue. Our primary application right now is using this technology for the diagnosis of Barrett’s Esophagus.

So Barrett’s is a precursor condition of esophageal cancer and has a population prevalence in the Western world of around one to one and a half percent. So it’s a pretty high-prevalence disease, particularly in patients who are suffering from long-term heartburn and reflux symptoms. Unlike in the lower GI tract, where we have FIT testing and colonoscopies, which give a very clear pathway for how people get into the intervention of endoscopy,

there’s a gap here for the upper GI tract, because there’s nothing like a FIT test for it. You’re either considered for an endoscopy, which, as we all know, is fairly invasive and resource intensive, but also pretty unfriendly for the patient and takes quite a bit of time out of the patient’s life, not only on the day the procedure is performed but also through the mental anxiety that builds up around an invasive procedure like this. What we have been doing basically sits in the care pathway just before that.

So take patients who are suffering from long-term heartburn or reflux symptoms, which I think just in the US are around 50 million individuals, so quite a substantial number. You can imagine that you can’t subject all of those people to an endoscopy, because we simply don’t have a healthcare system that is built for that purpose.

So what we’re doing is basically offering that first stage to look after these patients, so that we don’t miss the ones who have early precancerous disease, who already have early cancer, or who potentially even have advanced esophageal cancer, which is, as I mentioned earlier, one of the cancers with the highest mortality in the Western world.

So all of the things we do are basically about how we give patients less invasive, more patient-friendly, and also cost-effective and value-based entry points into the upper GI care pathway when they are at risk of developing cancer. That’s the core part of what we do. We’re also doing some exciting work in the inflammatory diseases space currently, but that’s more in the pipeline, and there will be more on this over the course of this year and next year, certainly.

[0:06:10.1] HC: What role does machine learning play in your technology? How does it help you detect cancer from these upper GI samples?

[0:06:16.9] MG: Yeah, so we started addressing that question right in the early days, when I was doing my Ph.D., thinking about the type of tissue we’re collecting and the way it’s being processed. There’s quite a lot of molecular, sort of liquid-type work we’re doing right now on these samples, but in its conventional form, the assay is performed by taking the cells and treating them like a histology slide of a pathology sample.

So it’s not like a cytology smear. It actually appears slightly different under the microscope or on the scanner, because the cells are preprocessed to form a cohesive clot. One of the things we have been able to demonstrate quite successfully is how you can use different types of computer vision approaches to highlight the relevant areas or the relevant cells for pathologists, so they can make a faster and more accurate diagnosis.

And as we all know, and I think particularly the listeners of this podcast, there’s always the challenge of how much autonomy an algorithm provides. Is it something that just draws a green box around an area where the pathologist should focus their attention? Is it something that already precomposes a report to some extent, which can then be checked for errors by a pathologist? Or is it, in its third stage or third class, a fully autonomous system?

We have demonstrated the feasibility of, let’s for these purposes call it, a class two system: something that can certainly generate reports for the majority of samples without missing any disease, obviously at the risk of overcalling slightly. So it always needs a pathologist to double-check it. More recently, we have started to use computer vision and AI technologies on these images to become better at discovering new biomarkers for the indications we’re currently working on.

So not necessarily Barrett’s in itself, but also looking at morphology and at the presence of other cell types, which our pathologists wouldn’t really look for because they have a very biased eye: in our case, they look for Barrett’s Esophagus. So we now have this massive real-world biobank and data bank of images, which we’re using for that purpose, and we have also started to go beyond computer vision technologies in how we interrogate the data. So these are pretty exciting times, with a very rich, real-world dataset that we’re able to work on.

[0:08:28.7] HC: So is this based on supervised machine learning algorithms, and if so, how do you annotate your data in order to be able to train and validate them?

[0:08:38.1] MG: Yeah, certainly. I think we’re using technologies across the spectrum now. We have done some very interesting work with Microsoft over the last year on unsupervised approaches to solve problems similar to the ones we’re looking at, but practically, as I just mentioned, we started with supervised approaches.

And one of the things we thought was pretty challenging in the beginning, and it took us quite a while to figure out, is, as you pointed out, how we annotate that data, particularly cytology data, which probably has some additional complexity compared to a basic biopsy or any other histology data.

One of the things we have been able to achieve is a prescreening model similar to what people have seen with cervical cancer smears. For cervical smears, you basically have non-pathologists, here in the UK you call them cytoscreeners, and we have been able to leverage them for making annotations on some of the real-world data as we go along. Those annotations are then confirmed by a pathologist, which in our case overcomes one of the main bottlenecks, namely the cost of using a full-time pathologist for these purposes.

So we’re essentially leveraging the best of both worlds. We’re working with cytoscreeners, whom we also have on our staff, to generate the initial annotations, and then we have someone who looks at them and reclassifies them if necessary. Overall, our reclassification rates for these annotations are pretty low, and pathologists can usually confirm them pretty swiftly, since they have been made by other members of the team who are specifically trained on this type of sample.

[0:10:14.3] HC: What other kinds of challenges do you encounter in working with cytology images and with training machine learning algorithms based on them?

[0:10:21.4] MG: I think generally, on cytology images, and we played around with this in the past, the z-dimension is a big challenge. Because we don’t use conventional cytology images, it is not as important for us: we clot the cells and then treat them as a histology sample. We cut sections a few microns thick, which essentially takes the z-dimension out of the equation. So for us, the biggest challenge for cytology images has been overcome by the way the sample is preprocessed, which is also how we’ve done it in the clinical trials for a long period of time now.

I think the other one is something which is often overlooked. With pathology images, we can always assume that there is some kind of spatial dependency between the area you’re currently looking at and the one next to it, because in tissue architecture there is inevitably some relationship between, let’s say, the lower layers of a certain epithelium and the upper layers of that epithelium. In our case, that spatial relationship is completely taken away.

That’s because we’re scraping cells from the upper GI tract, and they are then processed and mixed up. So I think it’s quite challenging for pathologists, and for people who work with these samples from a technical perspective, to get used to how the sample is composed, particularly because it looks a bit like a biopsy, yet neighboring cells in those samples might have had no spatial relationship with one another where they originally came from.

So two cells which sit side by side in the sample were not necessarily side by side in the esophagus; they just happen to be next to each other because of the way the sample was processed. So I think it’s pretty important, when annotations are being made, to keep in mind that just because you see something in one area, it doesn’t mean you won’t find the same tissue again in a completely different corner of the sample, simply because there’s no spatial dependency between individual areas of the image.

[0:12:08.1] HC: So really, you focus on the local cell morphology and ignore spatial relationships, which you wouldn’t do in histopathology, but here they would confuse the model if you tried to model them.

[0:12:20.0] MG: Exactly. Particularly if you do any kind of microenvironment-related work, say if you want to look at immune cell infiltration. I mean, whether there’s any relationship between an immune cell you’re seeing in a particular neighborhood and the cells around it, or any reason why that immune cell is in that place, is, as you pointed out already, uncertain, because that immune cell might have just been mixed into the sample and landed there during embedding.

[0:12:42.4] HC: Machine learning is advancing quite rapidly right now. There are new advancements hitting the headlines more frequently than ever before. Are there any new developments in computer vision or AI more broadly that you’re particularly excited about and perhaps could see a potential use case for Cyted?

[0:12:58.2] MG: Yeah, thinking about that question, I thought I should come up with a way to avoid the words “large language model”, but I probably won’t get around that. I mean, like everyone else, we are fairly actively monitoring what’s going on in that space, and we have also been trying to keep out of the hype cycle, at least for now, mostly because, at our current maturity stage as a company, there are quite a few distinct things in the pipeline ahead of us.

So it would be pretty easy to get sidetracked by something that might not actually solve a proper unmet clinical need at the end of the day and then just causes more problems than it solves in the near-term future. One of the things we’re actually interested in there, and we have some cool stuff in the pipeline, including a collaboration with some of the big players in the space, is understanding how we can use omics data more generally and figuring out a way to interrogate it using natural language, overcoming some of the more conventional analysis problems with this type of data.

I think there is quite a long way to go there, because we are seeing some excellent progress on generating natural language from natural language prompts, but one of the things that could be very interesting for our space is when we get to a degree of multimodality.

Where we can put in different data types on one end, almost without knowing the actual mapping function that’s going on inside the network architecture itself, use natural language to instruct it how to analyze that data, and get something out in a different modality on the output side of the architecture.

So we are working on it. As I said, I tried to avoid saying anything about natural language processing, but I think it is impossible to do that these days. As for potential use cases, to your question, there is an almost unlimited number of them that focus on, in our case, building better diagnostics to either catch patients very early in their disease development or get them into the right treatment pathway.

I would say that, for the pipeline, making sure that we’re properly addressing clinical need is top priority number one for us. So I think we’re going to sit on that for a bit longer and observe what’s happening before we risk confusing or distracting ourselves with too many tangents or sidetracks.

[0:15:19.2] HC: You’re definitely right that there is a lot going on there. The multimodal space in particular, I think, needs to evolve a bit more before it can really affect the medical space, because researchers are still figuring it out with language, images, speech, and other forms of data. When you come into the medical space, there is generally less data to train on, so you have to figure out exactly how it’s going to work there, make sure it can be robust, and answer all those important questions.

[0:15:47.4] MG: Yeah, exactly.

[0:15:48.6] HC: So hiring for machine learning can be quite challenging due to the high demand for professionals in this field. What approaches to recruiting and onboarding have been most successful for your team?

[0:15:58.4] MG: So we have, I think, a great geographic location here in Cambridge in the UK, where there are a few very, very strong universities either next door or essentially down the road, and being embedded in that ecosystem has really helped us. When we look for bioinformatics talent, there are a couple of institutes within a 30-minute commute from here, which always have interested graduate or postgraduate students who are looking to make the jump into industry.

I think we have not really struggled that much with technical roles here in Cambridge, probably also because we’re slightly sheltered and isolated from some of the macro headwinds when it comes to recruitment challenges, and from what happened to tech in a broader sense and to a lot of biotech companies. That being said, and without sidetracking the question too much, we do struggle to hire in other areas, you know?

Life scientists for the lab, for example, are always in big shortage for us. Tech, as I said, is something we are quite fortunate with here in the UK, particularly in the Cambridge-Oxford-London ecosystem, where there are always companies locally and a big supply of candidates coming out of academia who are looking to make the jump, who are pretty eager to explore, and who often want to stay local before going down another road that forces them to relocate.

So yeah, we have seen some challenges, but maybe not in the areas where most people would typically expect them.

[0:17:29.7] HC: That’s good to hear. It’s great to be near a big university when you have that steady stream of candidates. So for machine learning candidates that you bring on, have they worked with medical data before, or is there a learning curve, and you have to get them onboarded and adapted to that type of data?

[0:17:46.9] MG: So we always try to find the perfect mix of background in a candidate, and, before anyone jumps on my answer here, a perfect mix is not ten years of experience in bioinformatics, having worked with all sorts of omics data from genomics to proteomics, and having done X, Y, and Z.

So one of the things that is almost sufficient to convince us that candidates are able to excel is seeing, during the interview stage, whether they have an inclination and a passion to pick up some of the concepts around the data we’re working with pretty quickly. When that’s a given, there are very few questions we need to ask to understand whether that candidate will get up to speed really quickly. You know, there are different sorts of challenges, not really coding challenges but different interview challenges, that we give candidates when we’re hiring for these roles.

So far, I think that has been pretty successful. I think you’re hinting at one other point there, which is what happens once those candidates are recruited into a role and how they’re onboarded. Sometimes it’s pretty challenging for us to do that in the most streamlined way possible, mostly because the company spans such a vertical range of technology.

As I said, everything from basic R&D over to the device element to running a lab, and doing all of it in a commercial environment, can be pretty overwhelming when you first join. So we’ve been trying to develop materials and our own workshops for getting up to speed, like seminars and onboarding days for new joiners, which I think we run once every month, to overcome that as much as possible and to focus their attention on the things they actually need to know, because it’s otherwise so easy to get sidetracked in the beginning.

But in our case, it’s a mixture of letting them immerse themselves, which obviously requires the basic character trait of being curious and passionate about the new thing they’re getting into, together with some pretty directed education, you know: “Here are seven papers you should certainly know the summary of, and here is a code repository you should spend a bit of time on, looking at some of the examples in there, to really get up to speed quickly.”

So a synthesis of those typical approaches is what we use, and that also varies across roles, depending on whether someone is in more of a research role or more of an engineering role. But I think we have developed a pretty good spectrum to make sure that people land in the right place and can then organically evolve from there.

[0:20:16.7] HC: One of the key parts that you mentioned there is, when you’re screening candidates, having a good set of questions or evaluations so that you can tell whether that person can adapt to the domain-specific nature of this problem and work with your team.

[0:20:32.0] MG: Totally, and I am personally a big advocate of giving the specialized generalist an opportunity in these types of roles, particularly at the early stage of a company, because the more ability the candidates have to horizontally integrate different types of knowledge from across the company or across the technology of the sector, the better. You know, I think when a company gets to several hundred employees, at some point there is an inevitable need to specialize.

So the candidate profile has to shift slightly, and at some point you probably want to hire more specialists than generalists. I’m always eager, and at the same time slightly curious, to see when that point will come for us, because I think it will mean quite a change in mindset about what character you would be looking for at that point in time.

[0:21:14.4] HC: Is there any advice you could offer to other leaders of AI-powered startups?

[0:21:18.7] MG: I think it speaks to the point you raised earlier about new developments in computer vision and the potential use cases we’re excited about as a result. I’m generally very careful, or cautious, about jumping on sledgehammer approaches to most problems, and we may have seen a lot of them as a company that produces all sorts of different data in different corners of the business and tries to find some kind of analytics or data-science-driven approach to get more insight into that data.

And having been brought up as a scientist and having gone through a Ph.D., I am always reminding myself of the first principles of how to solve certain types of problems. So, advice: I’m not sure whether that’s advice or a word of caution, but I think, particularly in the world we’re in right now, building solutions that are looking for problems is always a minefield, and we might actually be entering a new era of this problem in healthcare that we’re not even aware of yet.

I think the number of companies working on natural language processing and large language models in healthcare is probably growing quicker than ever before, I assume by the week. So for everyone in companies that are a bit more established, like we are, I think it is about making sure that when people want to adopt these types of technologies, or want to consider any type of AI or ML approach to give their business model a bit of an edge over a competitor,

they go very deep into the problem and understand what the solution to it should be, can be, and, in a happy large language model world, would be. That is pretty important. It is too easy to get carried away by flashy solutions for problems in a space like healthcare, where we know that adoption is going to take a long time. So the adoption of those technologies will outlast the hype cycle and whatever the next generation of large language models in that particular cycle is going to be.

Whether that’s multimodality or other approaches is written in the stars for now, but it is something that is a few years down the road. That’s the hybrid between advice and a word of caution I would offer. It is certainly the advice I give myself, if that is any help, because I have found myself in that position too many times: getting carried away just happens so easily, particularly when we follow the various news outlets in the world that overwhelm us with new exciting ideas and functions of that technology.

And finding some kind of homeostasis there is pretty important, particularly as entrepreneurs, because we all wake up with the intent of building new things and breaking things, but I think that is not always the best way to approach certain problems in healthcare, and there are plenty of AI-powered companies in our space that have burned their fingers, and maybe their hands, on that in the past. So yeah, those are my words on this. I hope there is some value in them.

[0:24:11.9] HC: There definitely is some good value in there and finally, where do you see the impact of Cyted in three to five years?

[0:24:18.3] MG: Yes, so one of the things we have really started to embrace is understanding how the data we have now generated in the real world can be used to push our technology into new indications, moving to the next generation of the technology, which could be more stable. As for three to five years: we have seen so many exciting developments in the upper GI space over the last few years, also from a clinical guidelines perspective, that I am pretty confident that in Europe we will be the biggest player in the upper GI early cancer detection space, and we’ll be amongst the top three in the US.

This is a space that is still pretty niche, with a very small number of companies competing in it, but we already know quite a few people who are eager to enter it. So for us, it is all about finding the right balance between our medical device platform for minimally invasive cell collection and making sure that we strategically work on new biomarkers that take us beyond detecting Barrett’s Esophagus and also into the inflammatory diseases space.

One of the important things to note is that we are also moving into some of the cancers that are not much of a problem in the Western world but are in the Eastern world. Without doing a biology deep dive: there are two main forms of esophageal cancer, one of which is mostly a problem in the Eastern world and the other in the Western world. We’re also doing some work on the one prevalent in the Eastern world, which is responsible for much higher mortality across the globe.

But unfortunately, it is mostly prevalent in countries that have worse healthcare systems than most Western countries, and I think it’s also our responsibility to develop solutions for those countries that can be used in a cost-effective way. So, three to five years: in three years, we’re going to be in the US and become one of the big players in our space there, and in five years’ time, I should be able to say one of the world-leading players, and not just one of the Western world’s leading players.

[0:26:08.3] HC: Great, I look forward to following your progress. This has been great, Marcel. Your team at Cyted is doing some really interesting work on GI cancers. I expect that the insights you shared will be valuable to other AI companies. Where can people find out more about you online?

[0:26:21.6] MG: I am pretty open to just connecting with people, so people can find me on LinkedIn. People can find a lot more about the company by just Googling it and going to news articles or our home page. But yeah, if people want to have a chat about what we do and explore thoughts or opportunities, I’m on LinkedIn, and I’m not restrictive when it comes to accepting connections from people who want to message me.

[0:26:45.3] HC: Perfect. Thanks for joining me today.

[0:26:47.0] MG: Thank you. Thank you for having me.

[0:26:48.6] HC: All right everyone, thanks for listening. I’m Heather Couture and I hope you join me again next time for Impact AI.

[END OF INTERVIEW]

[0:26:59.9] HC: Thank you for listening to Impact AI. If you enjoyed this episode, please subscribe and share with a friend. And if you’d like to learn more about computer vision applications for people and planetary health, you can sign up for my newsletter at pixelscientia.com/newsletter.

[END]