What if we could unlock the hidden potential of unstructured health data to improve patient outcomes? In this episode, I sit down with David Sontag, co-founder and CEO of Layer Health, to discuss the transformative role of AI in healthcare. David, an MIT professor (on leave) and leading machine learning researcher, delves into how Layer Health addresses one of healthcare’s most persistent challenges: extracting actionable insights from unstructured medical data. In our conversation, David explains how Layer Health’s AI platform automates complex chart review tasks, tackles data generalization issues across diverse healthcare systems, and overcomes challenges like bias and dataset shifts. We explore Layer Health’s groundbreaking use of large language models (LLMs), the importance of scalable AI solutions, and the integration of AI into clinical workflows. Join us to discover how Layer Health is reducing administrative burdens, improving data accessibility, and shaping the future of AI-powered healthcare with David Sontag.


Key Points:
  • Hear about David's career journey from MIT professor to CEO of Layer Health.
  • How Layer Health transforms chart reviews and enhances healthcare workflows.
  • The role of large language models in solving the company's scalability problems.
  • Learn about Layer Health's approach to benchmarking performance for institutions.
  • Explore how the company navigates dataset shifts and ensures robust model performance.
  • Discover Layer Health's strategies to identify and mitigate bias in clinical AI models.
  • Find out about the challenges of implementing reasoning across diverse medical records.
  • Why building trust through data transparency, auditing, and compliance are essential.
  • David’s advice for AI startup leaders on balancing research with practical implementation.
  • Layer Health's long-term vision for reshaping healthcare and improving patient outcomes.

Quotes:

“Our vision for Layer Health is to transform healthcare with artificial intelligence, really building upon all of the work that we've been doing over the past decade in the AI and health field and academic space.” — David Sontag

“What we realized very quickly is that where [Layer Health] would have the biggest impact was bringing the right information to the physician's fingertips at the right point in time.” — David Sontag

“We're using large language models to drive the abstraction of those clinical variables that we need for these either retrospective or prospective use cases.” — David Sontag

“Where I think we're going to see the biggest source of bias is likely going to be not along the traditional demographic-related quantities, but rather on more clinical quantities.” — David Sontag

“A lot of the friction that we currently see in healthcare, [Layer Health] is going to really take a big bite out of [it].” — David Sontag


Links:

David Sontag
David Sontag on LinkedIn
Layer Health


Resources for Computer Vision Teams:

LinkedIn – Connect with Heather.
Computer Vision Insights Newsletter – A biweekly newsletter to help bring the latest machine learning and computer vision research to applications in people and planetary health.
Computer Vision Strategy Session – Not sure how to advance your computer vision project? Get unstuck with a clear set of next steps. Schedule a one-hour strategy session now to advance your project.


Transcript:

[INTRODUCTION]

[0:00:03] HC: Welcome to Impact AI. Brought to you by Pixel Scientia Labs. I’m your host, Heather Couture. On this podcast, I interview innovators and entrepreneurs about building a mission-driven, machine-learning-powered company. If you like what you hear, please subscribe to my newsletter to be notified about new episodes. Plus, follow the latest research in computer vision for people and planetary health. You can sign up at pixelscientia.com/newsletter.

[EPISODE]

[0:00:34] HC: Today, I’m joined by guest, David Sontag, co-founder and CEO of Layer Health, to talk about unlocking health data. David, welcome to the show.

[0:00:42] DS: Hi, thanks for having me.

[0:00:43] HC: David, could you share a bit about your background and how that led you to create Layer Health?

[0:00:47] DS: Sure. So, I’m a machine learning researcher, and before founding Layer Health, I was a professor at MIT in the computer science department. I also work in the healthcare field as part of the Institute for Medical Engineering and Science at MIT.

I founded Layer Health about two years ago, together with several of my former students from MIT and one of our closest clinical collaborators from Beth Israel Deaconess Medical Center. And our vision for Layer Health is to transform healthcare with artificial intelligence, really building upon all of the work that we’ve been doing over the past decade in the AI and health field and academic space.

[0:01:25] HC: So, tell me a bit more about what Layer Health does and why it’s important for healthcare.

[0:01:29] DS: So, Layer Health’s first set of products is an AI platform for chart review. Chart review is everything in healthcare where you have physicians, nurses, and medical coders reading the patient’s medical records to abstract specific information about the patient. If we take a step back and look at some of the work that led us to found the company, what we found over the years is that this chart review problem was the slowest part of everything we wanted to build in healthcare.

So, I can give you two examples. The first example is from our work on precision medicine and disease progression modeling. There, we take longitudinal data from patients’ electronic medical records, and we want to ask questions like, well, which treatments work best for which patients?

In order to do that, you have to be able to address some of the confounding that occurs in today’s medical practice. Doctors give treatments because, obviously, they think those treatments are well indicated for that patient. If you want to use historical data, you have to try to correct for that confounding in your analysis. But all of the information you would need in order to do that type of work, for example, data about the patient’s past medical history, what treatments were actually given, and the outcomes the patients actually had, almost always has to be abstracted from the patient’s medical record. It’s usually not available as structured data in any way.

So, before one could even use a machine learning model to do that type of causal inference, one would usually first have research assistants review the patients’ medical records to abstract those treatments, those outcomes, and those confounding factors, and that would usually mean months of work before one could even begin the analysis.

If you go to the other side of my interests, not just looking retrospectively at which treatments work best for which patients, but trying to bring machine learning and AI into healthcare systems to actually start to change clinical workflows and ultimately get to better outcomes for patients, there as well we hit the same blocking points.

For example, with one of my co-founders, Steven Horng, we developed machine learning algorithms in the emergency department context, where we tried to take all the data we have on patients walking into the ER and use it to help ER physicians with their differential diagnoses and with figuring out the right initial treatments for patients. Although we started out with questions that were very much around risk stratification, like which patients are likely to have sepsis, what we realized very quickly is that where we would have the biggest impact was bringing the right information to the physician’s fingertips at the right point in time.

For example, we wanted to be able to surface order sets that are appropriate for patients with cardiac etiologies. Or, if a patient has to be moved from one part of the hospital to another, you want to know if the patient has a fall risk and be able to surface that fall risk so that the team transporting the patient can take fall precautions. Each of those questions, whether it’s figuring out whether the patient has a cardiac etiology or whether they have a fall risk, again requires understanding something deep about the patient, which is typically never available as structured data.

Again, the hardest part of bringing in those types of interventions is a chart review problem: looking through the patient’s medical record, which is largely unstructured data, to abstract very specific information that can then be used downstream, such as by automatically surfacing the right set of suggested treatments, diagnostic workups, or decision support.

[0:05:16] HC: How do you use machine learning to tackle all this?

[0:05:18] DS: The biggest problem with using machine learning for chart review is that of generalizability. Five or ten years ago, when I started working in this area, we would build machine learning algorithms based on annotated data from each hospital that we’d work with. So, for example, in this most recent example I gave, we might go into the emergency department, look at the most recent 20,000 patients, label them for who has a cardiac-related issue or who might have a fall risk, use that to train a machine learning algorithm to do some of the natural language processing you would need here, and then evaluate it on that labeled data.

What we found traditionally is that although those algorithms would work pretty well for one healthcare institution, they generally wouldn’t generalize well to other healthcare institutions. In fact, they typically didn’t even generalize across different departments of the same hospital. That’s because of the real intricacies of medical language: both the data formatting and the particular ways in which things are stated end up being quite different for each department and each hospital.

So, the goal of machine learning in the last five years has been how to actually tackle that generalizability problem, which is what makes scaling beyond a single hospital to all hospitals in the United States really feasible. When we founded the company a bit over two years ago, my team at MIT, in particular another co-founder of ours, Monica Agrawal, and I had been doing a lot of work with large language models in the two years prior to founding the company. We found, with both commercial models and our own models that we were training at MIT, that we were able to solve this generalizability problem by using large language models in a way that we had never been able to do before.

That’s how we’re using machine learning. In this context, we’re using large language models to drive the abstraction of those clinical variables that we need for these either retrospective or prospective use cases.
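
To make that concrete, here is a minimal sketch of what LLM-driven abstraction of a clinical variable can look like. The prompt format, the `call_llm` stand-in, and the variable definition are hypothetical illustrations, not Layer Health’s actual implementation:

```python
import json

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a call to a large language model API.
    # A canned answer is returned here so the sketch runs end to end.
    return '{"value": true, "evidence": "Pt unsteady on feet, hx of prior falls."}'

def abstract_variable(note_text: str, variable: str, definition: str) -> dict:
    """Abstract one clinical variable from one note, asking the model to
    return both an answer and a verbatim evidence span for review."""
    prompt = (
        "You are reviewing a clinical note.\n"
        f"Variable: {variable}\n"
        f"Definition: {definition}\n"
        'Respond with JSON: {"value": ..., "evidence": "verbatim quote"}\n\n'
        f"Note:\n{note_text}"
    )
    return json.loads(call_llm(prompt))

result = abstract_variable(
    note_text="... emergency department note text ...",
    variable="fall_risk",
    definition="Does the documentation indicate the patient is at risk of falling?",
)
print(result["value"], "-", result["evidence"])
```

Returning an evidence span alongside the answer is what makes the output reviewable by a clinician rather than a black-box prediction.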

[0:07:27] HC: Why are large language models able to solve this in a way that you weren’t able to before?

[0:07:32] DS: Well, one of the really exciting aspects of large language models is that they encode a lot of prior information. These models are trained, of course, on all of the world’s public information, which includes things like Wikipedia and medical textbooks, and in some cases also on de-identified medical records. As a result of how these models were trained, they have a very good understanding of, number one, language, for example the English language, and number two, medicine. The ability to bring those two things together, understanding language and understanding the intricacies of how medical concepts might be described, allows one to generalize beyond the ways in which medical concepts would be described at any one institution.

[0:08:21] HC: How do you gather the data in order to train these models? Do you still need to do any sort of annotation like you used to? Or is that all handled by the LLMs?

[0:08:30] DS: So, in its first two years, Layer Health prioritized building a platform for scalability. That required two things. The first was the ability to benchmark how models would perform for any new chart review task. The traditional way of doing that benchmarking, of course, is to go label a lot of data, as we alluded to earlier. But what we’ve been developing is a whole platform, including a front end, a user interface, that makes it really fast to validate a large language model’s output. So, one can go into a new hospital, a new chart review problem, use our platform, and really quickly understand when the model is working well and when it isn’t, because the review is driven by our large language model outputs rather than starting from scratch.

Users of our internal platform for building out new models, and these could be subject matter experts who understand something about medicine, like nurses, can then use the platform to give very targeted feedback about which aspects of the model are performing well and which need to be improved.

For example, that feedback might be in terms of the evidence the model surfaces for a particular patient and a particular question we want to ask about them. The second aspect we’ve been working on is scalability. Once you have a model that is doing something reasonable, which you can ascertain on a random sample of patients, you then want to make that model cost-effective enough that it can run at scale on millions of patients at a time.
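
As a rough illustration of how confirm/deny feedback from reviewers could be rolled up into per-variable performance estimates (the data shapes here are assumptions, not Layer Health’s schema):

```python
from collections import defaultdict

# Each record: a reviewer's verdict on one model output,
# as (variable_name, accepted_by_reviewer).
feedback = [
    ("cancer_stage", True), ("cancer_stage", True), ("cancer_stage", False),
    ("fall_risk", True), ("fall_risk", True),
]

counts = defaultdict(lambda: [0, 0])  # variable -> [accepted, total]
for variable, accepted in feedback:
    counts[variable][1] += 1
    counts[variable][0] += int(accepted)

for variable, (accepted, total) in counts.items():
    print(f"{variable}: {accepted}/{total} accepted ({accepted / total:.0%})")
```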

There again, we use our machine learning algorithms, but in this context they are very much focused on the data distribution. For each of our customers or partners, we go through this process of really decreasing the cost and actually also improving the performance on that customer’s data. Each of our customers would usually bring historical data into our platform prior to going live, and we would refine our models on that hospital’s data, improving both their performance and their cost in that context.

[0:10:51] HC: In each of those scenarios, are you doing a separate validation for that particular use case in that particular setting?

[0:10:56] DS: Yes and no. What we’ve already found across our first set of initial customers is that there are some chart review questions that occur over and over again. We see this particularly in the oncology space, where you are almost always going to be asked questions about the cancer stage, about the date of diagnosis, and so on. So, what we’ve been doing at Layer is also developing a library of chart review tasks for which we have already done a historical validation on a diverse set of patient data from many different health institutions. We can already say something about what performance one could expect to see in any new setting.

That, you could think about as one set of validations. Then, for each new setting, for example each new health system that we launch with, we would do a smaller validation with their data, if necessary, on the data elements that we already know do well. But for the long tail of new quantities that each new use case requires, we would be able to do a specific validation on that use case’s needs for that health institution.

[0:12:10] HC: How do you ensure that your models continue to perform well over time? Things might change at a particular hospital. Things with patients might change. COVID is the classic example these days, but are there ways you can make sure your models continue to perform despite these changes?

[0:12:25] DS: Yes. That’s part of what gets me really excited about building this now as a commercial product: we can actually tackle the really important problems of dataset shift, monitoring, and robustness with the resourcing that you can only get at a venture-backed company. So, part of our product includes monitoring of models over time, and that has to address several different issues. One of them is dataset shift, as you were just alluding to. Things like COVID will come up, a disease that never existed prior to a few years ago. You also have, of course, changes like an EHR vendor switch, or new laboratory tests that come into the system or get dropped because payers stop paying for them.

This type of dataset shift occurs all the time. As part of launching each of our modules on top of our platform, we develop health system- and use case-specific benchmarks that allow us to monitor performance as time goes on. Those benchmarks are not static. For example, we are doing a major push into quality improvement, where we are helping health systems with abstraction for clinical registries such as the American Heart Association’s stroke registry or the American College of Surgeons’ NSQIP registry in surgery.

For each of those, our workflow is that we use our AI to do an initial chart abstraction. Then we have a whole user experience where nurses trained in each of those clinical areas can review the results from our models and confirm or deny the predictions we make. Because we provide the evidence and make it really easy to see and understand it in the context of the patient’s chart, that whole process is really fast for them. But it also gives us the ability to continuously receive feedback on how our models are performing.

For example, if we see a spike in our predictions not being accepted, or very different evidence being required to answer each question, that would be an indication of a dataset shift. Similarly, for the customers that are willing to do this, we can randomize some of the questions that have to be answered. For a small random sample, instead of having our AI provide the predictions, we ask the human experts to do the abstraction without the AI’s assistance. In that way, we get unbiased feedback that allows us to validate that the models are continuing to perform well over time.
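
One simple way such monitoring could work is to compare the recent reviewer acceptance rate against its historical baseline and flag large drops. This is a minimal sketch with illustrative thresholds, not anything Layer Health has described:

```python
def detect_shift(acceptance: list[float], window: int = 50, drop: float = 0.10) -> bool:
    """Flag a possible dataset shift when the recent acceptance rate falls
    well below the historical baseline. Thresholds are illustrative."""
    if len(acceptance) < 2 * window:
        return False  # not enough history yet
    baseline = sum(acceptance[:-window]) / (len(acceptance) - window)
    recent = sum(acceptance[-window:]) / window
    return baseline - recent > drop

# 1.0 = reviewer accepted the model's answer, 0.0 = rejected.
history = [1.0] * 200 + [0.0, 1.0] * 30  # acceptance dips in the last 60 reviews
print(detect_shift(history))  # True: recent acceptance dropped by roughly half
```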

[0:15:06] HC: Thinking some more about bias in the healthcare context, this often comes up related to race or sex or age, characteristics like that. How might bias come up in the clinical data that you’re working with, and what are some things your team is doing to mitigate it?

[0:15:21] DS: Maybe we can use that clinical registry example to answer that question. Let’s dig into the second one, NSQIP. This is a surgical registry that several hundred hospitals in the United States participate in. For a random sample of patients every eight days, the health system will abstract pre-surgical risk factors, that is, reasons why the patient might have had to get the surgery and factors that might affect its success or failure; details about the actual surgery that was performed; and then post-surgical complications.

So, think about the output being a big Excel spreadsheet: one row per patient, one column for each question that has to be answered, from the pre-surgical to the post-surgical questions, about 300 columns in total.
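
For illustration, the output he describes could be represented like this. The column names, such as `asa_class` and `ssi_30d`, are hypothetical examples of pre- and post-surgical fields, not the actual NSQIP schema:

```python
import pandas as pd

# One row per sampled patient, one column per registry question, from
# pre-surgical risk factors through post-surgical complications.
rows = [
    {"patient_id": "p1", "asa_class": 3, "procedure": "colectomy", "ssi_30d": False},
    {"patient_id": "p2", "asa_class": 2, "procedure": "hernia repair", "ssi_30d": True},
]
registry = pd.DataFrame(rows).set_index("patient_id")
print(registry)  # the real spreadsheet has roughly 300 such columns
```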

One of the key things we look at before we launch with each new health system is how the model actually performs on historical data. Almost all health systems today are doing this manually. They have had humans, in this case typically highly trained nurses, abstract this data for that random sample of patients for the previous five or more years. We’ll take those historical data dumps, run our models on them, and do a deeper dive to understand where the models are performing well and where they’re not.

You can slice and dice it in many ways. Of course, if information on, for example, age or race is available, we can slice those performance metrics along those axes as well. Where I think we’re going to see the biggest source of bias is likely not along the traditional demographic-related quantities, but rather along more clinical quantities. For example, a health system might have five or six different hospitals doing abstraction for NSQIP that are going to be using Layer Health’s platform, and the data from each one of those hospitals is going to look a little bit different.

Especially if each hospital had a different abstraction team before, the hierarchies of evidence that are used and the error modes are going to be a bit different for each of those hospitals, even within the same health system.

Moreover, the populations will end up being quite different. Some of the hospitals in the health system might be more rural or community hospitals, others might be more academic, so there may indeed be big differences in the demographics of the patients who go to each of those sites. All of that is going to manifest as differences in the performance of our algorithms at each of the different clinical sites. You could even break it down one level deeper to look at, for example, the different types of surgeries, whether it’s a surgery related to some musculoskeletal problem or a cancer-related surgery, and so on.

So, what we do in that situation is a deeper dive to understand the error modes along those more clinical axes. If and when we see differences in performance, we then dig deeper to understand why, and that is usually done in partnership with our customers. During that deeper dive, differences in, for example, demographics start to become front and center, which allows us to diagnose the underlying bias, if any is present, and to address it proactively before such models are launched prospectively in production.
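
A minimal sketch of the kind of slicing he describes, using hypothetical validation results (one row per patient-question pair, scored against the historical human abstraction):

```python
import pandas as pd

results = pd.DataFrame({
    "hospital":     ["A", "A", "B", "B", "C", "C"],
    "surgery_type": ["msk", "cancer", "msk", "cancer", "msk", "cancer"],
    "correct":      [1, 1, 1, 0, 1, 1],
})

# Accuracy sliced along clinical axes, not demographics alone.
print(results.groupby("hospital")["correct"].mean())
print(results.groupby(["hospital", "surgery_type"])["correct"].mean())
```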

[0:19:00] HC: We’ve talked about a few different challenges that you encounter and the solutions you’ve developed to address them: generalization, dataset shift, and bias. Are there other challenges that you regularly encounter working with patient data? What are the most important ones?

[0:19:18] DS: Well, up until now we’ve been talking really about machine learning challenges. I’ll add one or two more to that list, but then I want to bump up a level beyond just machine learning to what it takes to really bring this type of AI solution to the healthcare market. So, to add one more machine learning challenge: how does one do reasoning across a patient’s full medical record? Most of the academic papers published in the AI and health field and in the natural language processing research communities are focused on relatively simple inferences that you would make from a single patient note or a single document.

But in healthcare, the questions we ask are much more subtle. Even a deceptively simple question such as, “When was the patient’s cancer diagnosed?” typically requires reasoning across many different notes for a patient, and also across structured data. For example, the patient might first have had a laboratory test result that was abnormal, then they might have gone to get imaging, and maybe then a biopsy. At some point during that medical journey, you could say that the patient’s cancer was discovered and definitively diagnosed.

But that precise diagnosis time ends up being very subtle, something which has to be defined very clearly. Do I want, for example, a biopsy-confirmed diagnosis? That can require connecting the dots between several different places, like when the biopsy was performed versus when the biopsy result came back with the particular indication that confirmed the cancer. Building models that do that chart review across both structured and unstructured data, longitudinally across time, is very, very challenging, and it’s not something that has been well studied by the machine learning community.
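
To see why the definition matters, here is a toy sketch. The event timeline and the operational definition (the date the confirming biopsy result returned, rather than the date the biopsy was performed) are illustrative assumptions:

```python
from datetime import date

# Hypothetical timeline merged from structured and unstructured sources.
events = [
    {"date": date(2023, 1, 5),  "kind": "lab",              "detail": "abnormal result"},
    {"date": date(2023, 1, 20), "kind": "imaging",          "detail": "suspicious mass"},
    {"date": date(2023, 2, 2),  "kind": "biopsy_performed", "detail": "core biopsy"},
    {"date": date(2023, 2, 9),  "kind": "biopsy_result",    "detail": "carcinoma confirmed"},
]

def diagnosis_date(events: list[dict]) -> date | None:
    """One possible operational definition: the date the biopsy result
    confirming the cancer came back, not when the biopsy was performed."""
    confirmed = [e["date"] for e in events if e["kind"] == "biopsy_result"]
    return min(confirmed) if confirmed else None

print(diagnosis_date(events))  # 2023-02-09 under this definition
```

Swap in a different definition, say the first abnormal imaging, and the answer moves by weeks, which is exactly the subtlety described above.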

So, that’s a very active part of what we’ve been doing at Layer Health. We have a phenomenal AI research and development team that has developed some of the state-of-the-art models for this so far. None of them are published; we’re very much focused on bringing the best product to market right now. But there’s a lot of exciting innovation happening, and it’s going to be a big focus for us over the next couple of years. We’re in the process of doing a big fundraise, which will allow us to continue to develop that aspect of the machine learning models I was alluding to and to stay at the front and center of this field.

Would you like me to tell you a bit about the sort of broader landscape of launching a machine learning solution in healthcare?

[0:21:57] HC: Yes, let’s do it.

[0:21:57] DS: So, so much of what we’re doing is not only about the machine learning algorithms, but also about how you get them into the hands of health systems in the right way. Part of that is infrastructure. We are building on top of standards like FHIR to make it really easy for health systems to launch with Layer Health, by allowing us to pull both historical and then prospective patient data via this standard, which makes health data interoperable across both health systems and electronic medical record vendors.
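
As an illustration of the kind of standards-based pull he mentions, here is a minimal FHIR query sketch. The base URL points at a public test server, and the resource choice and parameters are assumptions about what such an integration might fetch, not Layer Health’s actual pipeline:

```python
import requests

BASE = "https://hapi.fhir.org/baseR4"  # public HAPI FHIR test server

def fetch_patient_documents(patient_id: str) -> list[dict]:
    """Pull DocumentReference resources (e.g., clinical notes) for a patient
    using standard FHIR search parameters."""
    resp = requests.get(
        f"{BASE}/DocumentReference",
        params={"patient": patient_id, "_count": 20},
        headers={"Accept": "application/fhir+json"},
        timeout=30,
    )
    resp.raise_for_status()
    bundle = resp.json()
    return [entry["resource"] for entry in bundle.get("entry", [])]

# Usage (requires network access):
# notes = fetch_patient_documents("example-patient-id")
```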

But associated with that are questions of, for example, how do you make sure that you are doing proper auditing, so that not only can you provide the audit logs required by HIPAA to make it very clear why particular patient data was pulled and for what authorized use, but you also have full transparency over where that data goes, how it’s being used, and what algorithms were used on it? Something that some of our initial health system partners have expressed a lot of interest in was, “Well, how do we know when your AI is changing, and how are you going to inform us of that?”

So, every one of Layer Health’s models has very careful logging associated with it, so we can always say that this was the exact model that produced this answer, with lots of automatic failover for robustness as well. We have processes in place to tell our customers when we make major model changes and what the implications will be for the results we provide them in the near future. I’ll stop there so as not to dwell too long on that.
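
A minimal sketch of what one such audit-log entry might contain; the field names and hashing choice are illustrative assumptions, not Layer Health’s actual schema:

```python
import hashlib
import json
from datetime import datetime, timezone

def audit_record(model_id: str, model_version: str,
                 input_text: str, output: dict) -> dict:
    """Build one audit-log entry tying an answer to the exact model that
    produced it."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_id": model_id,
        "model_version": model_version,
        # Hash rather than store the raw note, so the log itself holds no PHI.
        "input_sha256": hashlib.sha256(input_text.encode()).hexdigest(),
        "output": output,
    }

entry = audit_record("chart-review", "2024.06.1",
                     "... note text ...", {"fall_risk": True})
print(json.dumps(entry, indent=2))
```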

[0:23:43] HC: Is there any advice you could offer other leaders of AI-powered startups?

[0:23:46] DS: I think every AI startup ends up being a little bit different. This is a really exciting time that we’re in. I’m an AI researcher, and so part of what I’m doing now as CEO is controlling the urge I have as an AI researcher to keep pushing forward the AI research and development, and really making sure that we’re tackling the right problems: problems that are not only right for our customers from a business impact perspective, but also right for AI, because not everything in healthcare requires an AI solution or would even benefit from one. Some things are more process-related changes.

So, it’s really about recognizing that AI is not the solution to everything, and that AI usually has to be accompanied by many human factors and careful process improvement. That, I think, is what makes a strong AI startup.

[0:24:40] HC: Finally, where do you see the impact of Layer Health in three to five years?

[0:24:45] DS: I think healthcare as we know it is going to completely change over the next five-plus years. A lot of the friction that we currently see in healthcare, Layer is going to really take a big bite out of. We see that friction first in some of the administrative functions of healthcare, which is our immediate focus, and in the quality improvement space, where we’re going to help substantially reduce the burden of clinical registry reporting. But the same algorithms that we’re using for those more administrative purposes are going to get closer and closer to actual clinical care and really help change outcomes as well, not just retrospectively but prospectively.

So, over the next couple of years, we’ll start rolling out algorithms that can help with site-of-care optimization, so that patients get care at the best place for them. We’ll start rolling out those same algorithms to help standardize clinical pathways in every area from oncology to cardiology where there are very well-established guidelines, such as the NCCN Guidelines in cancer or the American Heart Association’s Get With The Guidelines for stroke, heart failure, and so on. We can not just measure when the guidelines have been followed, but start to bring algorithms to the point of care to help guide clinicians toward following these best practices, which we think is ultimately going to really improve the quality of healthcare across the US.

[0:26:20] HC: This has been great, David. I appreciate your insights today. I think this will be valuable to many listeners. Where can people find out more about you online?

[0:26:27] DS: Please go to layerhealth.com.

[0:26:29] HC: Perfect. Thanks for joining me today.

[0:26:31] DS: Thank you for having me. And for anyone who’s listening that might be interested, we are expanding and we’re hiring in every possible position you can imagine.

[0:26:39] HC: All right, everyone. Thanks for listening. I’m Heather Couture, and I hope you join me again next time for Impact AI.

[OUTRO]

[0:26:49] HC: Thank you for listening to Impact AI. If you enjoyed this episode, please subscribe and share with a friend. And if you’d like to learn more about computer vision applications for people and planetary health, you can sign up for my newsletter at pixelscientia.com/newsletter.

[END]