What if AI could unlock the potential of healthcare’s vast, unstructured data? In this episode, Tim O'Connell, Co-Founder and CEO of Emtelligent, explains how his company is bridging the gap between messy medical data and usable insights with AI-powered solutions. Drawing from his background in both engineering and radiology, Tim discusses how he saw firsthand the inefficiencies caused by disorganized medical notes and reports, which led to the creation of Emtelligent. He breaks down how their AI models work to process and structure this data, making it usable for healthcare professionals, researchers, and beyond. Tim also dives into the technical challenges, from handling faxed medical records to ensuring high levels of precision and recall in model training. Beyond the technology, he emphasizes the importance of safety, ethical use, and how Emtelligent continues to adapt its AI to meet the evolving needs of the healthcare industry, helping to make patient care more efficient and accurate. Don’t miss out on this important conversation with Tim O’Connell from Emtelligent!


Key Points:
  • An overview of Tim’s background in engineering and radiology.
  • How Tim co-founded Emtelligent to solve pressing data issues in healthcare.
  • The importance of turning unstructured medical text into searchable, structured data.
  • How Emtelligent’s models extract metadata and structure from faxed patient records.
  • Why healthcare data is so challenging to work with, from shorthand to messy notes.
  • The role of precision and recall in assessing and improving model performance in healthcare.
  • Ensuring AI models continue to perform well after deployment with ongoing updates.
  • How Tim’s team maintains safety and ethical standards in AI healthcare solutions.
  • Creating technology that serves the end user, informed by firsthand experience.
  • The importance of clinical input to develop relevant and practical AI healthcare tools.
  • Where Tim sees AI's impact in healthcare evolving over the next three to five years.

Quotes:

“During that year [that I was] working in the hospital, I saw so many problems that we have in the healthcare environment and realized that quite a few of them had to do with the fact [that] we deal with so much unstructured data.” — Tim O’Connell

“Every time a human goes to see a caregiver, some kind of an unstructured text note is generated… We really can't use a lot of that data, unless it's another human who's reading that data.” — Tim O’Connell

“I’m still a practicing radiologist… It’s not just a matter of intelligent people coming up with good ideas and going, ‘Oh, well. [Let’s throw this] against the wall and see what sticks’. We're developing solutions that are applicable in today's healthcare environment.” — Tim O’Connell


Links:

Tim O’Connell on LinkedIn
Emtelligent


Resources for Computer Vision Teams:

LinkedIn – Connect with Heather.
Computer Vision Insights Newsletter – A biweekly newsletter to help bring the latest machine learning and computer vision research to applications in people and planetary health.
Computer Vision Strategy Session – Not sure how to advance your computer vision project? Get unstuck with a clear set of next steps. Schedule a 1-hour strategy session now to advance your project.


Transcript:

[INTRO]

[00:00:03] HC: Welcome to Impact AI, brought to you by Pixel Scientia Labs. I’m your host, Heather Couture. On this podcast, I interview innovators and entrepreneurs about building a mission-driven machine learning-powered company. If you like what you hear, please subscribe to my newsletter to be notified about new episodes. Plus, follow the latest research in computer vision for people and planetary health. You can sign up at pixelscientia.com/newsletter.

[INTERVIEW]

[00:00:34] HC: Today, I’m joined by guest, Tim O’Connell, Co-Founder and CEO of emtelligent, to talk about structuring healthcare data. Tim, welcome to the show.

[00:00:42] TO: Thank you, Heather. It’s a pleasure to be here.

[00:00:44] HC: Tim, could you share a bit about your background and how that led you to create emtelligent?

[00:00:48] TO: For sure. I used to work as a network engineer. I’ve got a master’s degree in engineering. After doing that for a few years, I wanted to do something else, so I ended up going into medicine and doing a residency program in radiology. During that year, you know, working in the hospital, I saw so many problems that we have in the healthcare environment and realized that quite a few of them had to do with the fact that we deal with so much unstructured data, right?

Every time a human goes to see a caregiver, some kind of an unstructured text note is generated like, “Tim came to see me today with a three-week history of right knee pain,” that sort of thing. We really can’t use a lot of that data, unless it’s another human who’s reading that data. It can’t sort of be searched and sorted and summarized by machines very well and reliably, at least at that time.

I wound up meeting a wonderful guy, our chief technology officer. After finishing my training, he and I started doing some work together. Then with my brother and another gentleman, we founded emtelligent to solve some of these problems that we see in healthcare, which are caused by our dual needs in healthcare. One is we have to generate these unstructured text notes when we see patients. The other is we need to improve the efficiency, quality, and safety of our healthcare system. emtelligent tries to solve the problem at the nexus of those two issues.

[00:02:21] HC: How do you tackle these? How does emtelligent solve those problems?

[00:02:25] TO: Sure. We make artificial intelligence software. It reads and understands those unstructured text notes. It does much more than that. It’s a long conversation to go into it. But by taking that unstructured data and structuring it and coding the data, understanding whether diseases are present or absent, what measurements were in the report, all this very detailed information, you can break these medical reports down into highly structured information and then use that with additional technologies like generative AI and this sort of thing for many, many use cases throughout the healthcare spectrum.

[00:03:02] HC: What type of machine learning models do you train to solve this? I don’t mean specific architectures and technical stuff. I mean you’ve talked a lot about the inputs, but what is the output? What do you train these models to achieve?

[00:03:14] TO: Sure. Essentially, what we do is we build custom data processing pipelines for our customers. In order to do that, we’ve had to build our own suite of technologies internally. One of our products deals with a problem we see in data interchange. There’s a ton of faxes flying around the healthcare system, and it’s pretty common for people to get a thousand-page fax of patient records.

One of our models or platforms will take that fax and break it up into subdocuments and extract metadata from those documents. We’ve applied for a patent for some new OCR features for issues we see in healthcare data specifically. That’s the data preparation step, if you will, right, to convert from unreadable bitmap faxes that are usually quite dirty and have handwriting on them and this sort of thing.

Once the data is prepped and cleaned, the next step, or the next product we have, is what’s called emtelliPro, and that’s our medical coding engine. That really does a lot of the extensive data structuring, right? You feed it a text document coming out of this OCR and PDF splitting first step, and it converts that into very highly structured data.

Then the next product or step often in pipelines is to use our large language model to then basically do the complex tasks that people have for that data. That could be like criteria matching, as in the case of prior authorization. It could be document summarization or patient history summarization. Or it could be dealing with very complex messy data on the input side. There’s just so many possibilities in use cases for what we do.
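The three-stage pipeline Tim describes (splitting and OCR'ing faxes, structuring and coding the text, then running downstream tasks over the structured output) can be sketched in a few lines of Python. Everything here is a hypothetical illustration; the function names, fields, and keyword matching are stand-ins, not emtelligent's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class StructuredDocument:
    """Hypothetical container for one subdocument split out of a fax."""
    text: str
    metadata: dict = field(default_factory=dict)   # e.g. patient, date, doc type
    entities: list = field(default_factory=list)   # coded findings, measurements

def split_and_ocr(fax_pages):
    """Stage 1 (sketch): break a multi-page fax into text subdocuments.
    A real system would run OCR and document-boundary detection."""
    return [StructuredDocument(text=page) for page in fax_pages]

def code_entities(doc):
    """Stage 2 (sketch): tag known terms as present or absent.
    Stands in for a full medical coding engine."""
    lowered = doc.text.lower()
    for term in ("knee pain", "shortness of breath"):
        if term in lowered:
            negated = "denies " + term in lowered
            doc.entities.append({"term": term, "present": not negated})
    return doc

def summarize(docs):
    """Stage 3 (sketch): one downstream task over the structured output."""
    findings = [e["term"] for d in docs for e in d.entities if e["present"]]
    return "Positive findings: " + (", ".join(findings) or "none")

pages = ["Tim denies shortness of breath.", "Three-week history of knee pain."]
docs = [code_entities(d) for d in split_and_ocr(pages)]
print(summarize(docs))  # Positive findings: knee pain
```

The point of the structure, as in the real pipeline, is that the downstream task never touches raw fax text; it only sees structured entities with presence status.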

[00:04:53] HC: How do you go about gathering data in order to train these types of models?

[00:04:56] TO: Well, there’s a lot of publicly available data out there. We have both a mix of public and private data. We have some data use agreements with some of our customers, and that enables us to be able to really train the models on a broad variety of healthcare data.

[00:05:11] HC: What are some of the challenges your team encounters in working with medical data and then training models based off of it?

[00:05:18] TO: It is all a challenge. One group of challenges is the locality of medical data. What I mean by that is that different institutions, like hospitals, will use different terms for different diseases. You’ll process some documents, and there’ll be these strange abbreviations that you’ve never seen before. So you need to add those in and retrain your models. That’s one challenge.

Another challenge is just the dirtiness of healthcare data. When caregivers are trained, they’re not trained to make tidy notes, so they use a lot of shorthand in the notes. Different people create lists in the middle of their notes in different ways. Some people will say, “Denies shortness of breath.” Other people will say, “Shortness of breath: denies,” right?

There’s all kinds of things like that that can confuse language models, in addition to just dealing with fax data, where there’ll be white-out on the page that leaves lines all over the fax, or the page will be crooked. Things like that can really cause problems with even the simplest task, like the OCR, the first task in the understanding chain. Yes, I think everything is a challenge in dealing with healthcare data.

[00:06:25] HC: There’s a variety of different ways you could assess how well a model is performing, the quality of its outputs. How do you approach that with your models for these healthcare purposes?

[00:06:35] TO: Oh, yes, absolutely. Evaluation in language understanding is a really well-known field. People do their PhDs in this. Two of the metrics used are precision and recall. Recall is sort of, if there was something in the text document or the source document that you were looking for, did you find it, right? Then precision is, were you right in your understanding of it?

One way to visualize precision and recall is sort of like either end of a balance or a scale, right? You can tune models for very high recall, but you’re going to be wrong more often. Or you can tune them for very high precision, and you’ll be right more of the time, but you won’t have found as many terms. Whenever we’re developing models, number one is taking the data, running it through the models, and grading it on precision and recall. Number two is really making sure that we’re answering the right questions.
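As a minimal sketch of the scoring Tim describes, assuming extraction output can be compared as sets of terms (real evaluations score annotated spans and codes, not bare strings):

```python
def precision_recall(predicted, gold):
    """Precision: of what the model found, how much was right.
    Recall: of what was there to find, how much the model found."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Illustrative gold annotations vs. a model's extracted terms:
gold = {"shortness of breath", "right knee pain", "hypertension"}
predicted = {"shortness of breath", "right knee pain", "fever"}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.67
```

Tuning a model to extract more aggressively grows `predicted`, which tends to raise recall and lower precision; tuning it conservatively does the opposite, which is the balance-scale picture above.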

Particularly now, as we’re moving into using generative AI and large language models, we see all these people touting, “Oh, the large language model can pass this medical school examination,” right? That’s just because it’s been trained on the books that are used to generate the questions for those examinations. It doesn’t mean they’re equivalent to doctors. As any physician will tell you, it’s only after you finish med school and you write those licensing exams that the real learning begins.

It’s continually a challenge to come up with questions and benchmarks, things like that, to really assess the true performance of these systems, particularly now that generative AI has entered the field. We use a combination of things like precision and recall. We’ve curated our own benchmark data set, and we’re continuing to build out a new benchmark data set specifically for large language model usage. There’s no development we do that isn’t tied to an evaluation step.

[00:08:29] HC: Structuring that benchmark data set and gathering data for it, how do you think about other characteristics like generalizability and things that really tell you how the model will perform in a real-world use case?

[00:08:42] TO: Yes. That’s another thing that we do. Part of our engagement process with customers is always to do a proof of concept and often a pilot. The proof of concept will use de-identified data, either the customer’s data or our data, a small amount of data. We’re really just doing that to ask, is there a possibility here, right? Is there a hint of truth that this is going to work?

Then the pilot usually uses de-identified data from our customers, much larger numbers of documents. There’s usually extensive evaluation done during that step. For every use case, we’re ensuring that an extensive evaluation is done, so both we and the customer can understand the characteristics of the system, understand its safety characteristics, and make sure that if it goes into production, we know what is reliable and what is unreliable, all that sort of stuff. That’s really how you build a system.

[00:09:32] HC: One of the challenges with AI models is that once you validate them and deploy them, you’re not exactly done. Things can still change after they’re deployed. COVID is a great example where things changed, and your model may no longer work. What do you do to ensure that models continue to work and continue to perform well over time?

[00:09:52] TO: Great question. It’s a continuous process. For example, in order to do medical coding, you need to have medical coding databases, right? We use SNOMED, ICD-10, RxNorm, things like this. We’re continually updating those ontologies. Those ontologies are usually released multiple times per year by their parent organizations, and so we build those ontologies into our product.

COVID is a great example. COVID didn’t exist before 2019. It wasn’t a term, so SNOMED had to come up with new concepts and concept IDs for things related to COVID, COVID vaccination, and this sort of thing. That’s all baked into our development process: we continually update these coding ontologies multiple times per year, as we do software releases, and then make sure that our customers are rolling these out.
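Concretely, pinning coded output to an ontology release might look like this minimal sketch. The release table and lookup function are illustrative, not emtelligent's implementation, though 840539006 is SNOMED CT's actual concept ID for COVID-19, added in its 2020 releases:

```python
# Hypothetical versioned ontology table: refreshing the release is what
# makes newly created concepts codable.
SNOMED_RELEASES = {
    "2019-07": {"headache": "25064002"},
    "2020-07": {"headache": "25064002", "covid-19": "840539006"},
}

def code_term(term, release):
    """Look up a concept ID in a given ontology release (None if absent)."""
    return SNOMED_RELEASES[release].get(term.lower())

print(code_term("COVID-19", "2019-07"))  # None: not yet in the ontology
print(code_term("COVID-19", "2020-07"))  # 840539006
```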

The other way that we work to ensure performance over time is, for systems that we deploy, we work with our customers to develop benchmarks for those systems to see if drift is occurring, either in their data or in new models and this sort of thing. Our development process also includes thousands and thousands of regression tests, so that for every new model we build, we know that just because it scores one percent higher in precision or recall, it hasn’t forgotten about something important. It’s really an end-to-end approach to ensuring that the models stay current and stay safe to use.
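A regression suite of the kind described here can be sketched as follows. The cases, the harness, and the toy extractor are illustrative stand-ins for a real annotated test set and model:

```python
# Sketch of output-regression testing: before promoting a new model,
# re-run a fixed suite of annotated cases and require that no previously
# correct extraction is lost.
REGRESSION_CASES = [
    ("Denies shortness of breath.", {("shortness of breath", "absent")}),
    ("3-week history of right knee pain.", {("right knee pain", "present")}),
]

def run_regression(extract_fn):
    """Return the cases where a candidate model lost a known-good finding."""
    failures = []
    for text, expected in REGRESSION_CASES:
        got = extract_fn(text)
        if not expected <= got:          # every expected finding must survive
            failures.append((text, expected - got))
    return failures

def toy_extractor(text):
    """Stand-in extractor so the sketch runs end to end."""
    lowered = text.lower()
    found = set()
    for term in ("shortness of breath", "right knee pain"):
        if term in lowered:
            status = "absent" if "denies " + term in lowered else "present"
            found.add((term, status))
    return found

print(run_regression(toy_extractor))  # [] means no regressions
```

The same harness doubles as a drift check: re-running it on a deployed model against freshly annotated samples of the customer's data flags cases where the data, rather than the model, has changed.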

[00:11:22] HC: We’ve talked a fair amount about validating that models make accurate predictions. But what about other characteristics like safety and ethical use? Are there extra checks and balances that need to be in place for that?

[00:11:33] TO: Sure. I mean, I think we build that in as part of the customer engagement process, right? We don’t generally work with customers who aren’t interested in telling us about how they want to use our software. When it comes to safety and ethical use, a lot of it is that the safety has to be baked in, and these system characteristics understood, as you’re building these models and building these platforms. With regards to ethical use, we make sure we understand the customer’s use case.

Now, if some customer wants to take your software once they have it and start doing unethical things with it, there’s not particularly much you can do to stop them if they don’t tell you. But we’ve never encountered that situation before. Because in most cases there’s customization required to deploy our software, we really have to understand the customer’s business case and use cases before letting things out the door. It isn’t just a, “Oh, sign up here,” and you can start denying insurance claims to people.

[00:12:22] HC: How do you go about understanding the customer use case, not just one customer but multiple customers, to be sure that the technology you’re developing will fit in with the workflow that’s needed to assist the end user?

[00:12:34] TO: Yes. Again, I think part of that comes from our experience. We’ve got a great number of data scientists and physicians and other people with healthcare backgrounds on our staff. Part of ensuring that our technology fits in with the healthcare workflow is based on our own experience, right? I’m still a practicing radiologist. I work clinically one to two days a week. It’s not just a matter of emtelligent people coming up with good ideas and going, “Oh, well, let’s throw this against the wall and see what sticks,” right? We’re developing solutions that are applicable in today’s healthcare environment.

As part of our customer engagement process, really the first and often the longest conversation is, what do you want this technology to do in your workflow, and what is your own internal business case for this? There’s a lot of things that AI and things like natural language processing can do which are interesting party tricks or what we call science fair experiments. But they don’t necessarily make sense from a business perspective or a patient care improvement or system efficiency perspective.

We’re using our experience in engaging with customers to make sure that things always make sense. Otherwise, you just end up burning a lot of time, right? There’s a lot of people out there who are like, “Oh, wow. Wouldn’t it be neat if we could?” We try to stay away from those engagements and focus more on, “We have a business problem or a quality problem or an efficiency problem. We’d like to work with you to improve it.”

[00:14:00] HC: Then internally, I imagine it’s a collaboration between the clinical expertise that you have combined with the machine learning knowledge and everything in between in order to be sure that the models you develop are actually useful for the thing that you’re trying to solve.

[00:14:16] TO: Oh, 100%. That’s always been our approach, right? Anoop, who is our CTO, and I have always worked together from what we call a right brain, left brain approach, right? I’ve looked at a number of other solutions out there in the marketplace, including many for healthcare, and I can immediately tell this was developed by a machine learning person with very little interest in, or very little input from, a caregiver, right? They don’t understand the workflow. They haven’t thought about the safety problems this is going to cause, things like that. It’s very easy to pick that up. You’re like, “No physician would have looked at the output from this model and said, ‘Yes, I want to implement that.’”

We’ve always at emtelligent looked at development as a process that involves two sides. Very few physicians have PhDs in computer science and machine learning to the extent that they know everything they need to know about developing models. Very few computer scientists with PhDs in machine learning and artificial intelligence also have medical degrees. So you really need input from both sides to ensure you’re building a quality product.

[00:15:27] HC: Is there any advice you could offer to other leaders of AI-powered startups?

[00:15:31] TO: I think the key thing is, as you graduate from medical school and have the Hippocratic Oath ceremony, it’s “first, do no harm,” and I think you need to keep that in mind. Firstly, ask, “What are we doing here? Is it going to be safe? Is it going to be ethical? With our customers, is there a risk of someone with good intentions doing bad things with this?”

The other thing you have to really focus on is, again, that clinical input. If you don’t have clinical staff providing input to your product at every stage, then you’re really going to have a hard time building products that will be useful and will be accepted in the healthcare environment.

[00:16:10] HC: Finally, where do you see the impact of emtelligent in three to five years?

[00:16:14] TO: I mean, that is a very hard question because it feels like things are moving faster now than they were even three years ago, particularly with large language models on the scene. The impact I’m hoping for in three to five years is very broad deployment throughout various verticals within the healthcare industry and helping patients, right? Helping patients achieve better access to higher-quality healthcare at a better price, healthcare that’s safer, and closing our gaps in healthcare so that fewer and fewer patients slip through the cracks or are lost to follow-up. Fewer and fewer incidents where the “holes in the cheese line up,” and bad things happen to patients. That’s where I see the impact of our software in three to five years.

[00:16:59] HC: This has been great, Tim. I appreciate your insights today. I think this will be valuable to many listeners. Where can people find out more about you online?

[00:17:06] TO: Thanks, Heather. They can find out more about us online at our website, emtelligent.com. We also have a trial where people can just quickly sign up and test out emtelliPro, one of our software components. Yes, they can just reach out to us or email us at [email protected], and we’d be more than happy to chat with them.

[00:17:21] HC: Perfect. Thanks for joining me today.

[00:17:23] TO: Thanks, Heather. It was my pleasure.

[00:17:25] HC: All right, everyone. Thanks for listening. I’m Heather Couture, and I hope you join me again next time for Impact AI.

[OUTRO]

[00:17:35] HC: Thank you for listening to Impact AI. If you enjoyed this episode, please subscribe and share with a friend. If you’d like to learn more about computer vision applications for people and planetary health, you can sign up for my newsletter at pixelscientia.com/newsletter.

[END]