In an emergency setting, making a quick diagnosis under pressure is often a matter of life or death. This is especially true when it comes to diagnosing infectious diseases. Unfortunately, diagnosing infections in an emergency department is rife with challenges. Current tests either take too long, deliver unreliable results, or both. That’s where Inflammatix comes in. The company is using machine learning to develop a point-of-care instrument that diagnoses the type and severity of infection in emergency care quickly and effectively. Its first main product is currently in the late stages of development and can deliver a test report in about half an hour using whole blood as a sample source.

Joining me today to shed light on this incredible initiative is Ljubomir Buturovic, Vice President of Machine Learning at Inflammatix. We hear from Ljubomir about the role that machine learning played in this technology, key challenges they’ve encountered while training models on gene expression data, how they selected the 29 clinically relevant genes based on published scientific papers, plus a whole lot more. Tune in today to learn more about the groundbreaking work being done at Inflammatix and what you can expect from them in the future!


Key Points:
  • A warm welcome to today’s guest Ljubomir Buturovic.
  • Ljubomir’s background in machine learning and what led him to Inflammatix.
  • An overview of the important work being done at Inflammatix in healthcare.
  • Details about their main product for diagnosis in emergency care.
  • The role of machine learning in their technology to measure gene expression.
  • How they selected the 29 clinically relevant genes based on published scientific papers.
  • Key challenges they encountered while training models on gene expression data.
  • Ground truth labels: the strategies they used to identify infections and validate their models.
  • How they made sure that their models would work for multiple assay platforms.
  • Using grouped analysis to ensure their models would serve a diverse patient population.
  • Their approach to developing technology that would fit in with the clinical workflow and provide the right assistance to doctors and patients.
  • The benefits that Inflammatix has seen from publishing their work.
  • Ljubomir’s advice to other leaders of AI-powered startups working in healthcare.
  • Where you can expect to see Inflammatix in five years.

Quotes:

“We developed an instrument which measures this gene expression for 29 clinically relevant genes for infections.” — Ljubomir Buturovic

“It takes a long time to achieve adoption. This is basically applying AI in medicine. When you are applying AI in medicine, the whole process of development and adoption works on medicine timescales, not on AI timescales.” — Ljubomir Buturovic

“One of the key challenges in applying machine learning in clinical test design is the availability of samples for training and validation. This is in sharp contrast to other applications, like maybe movie recommendations, or shopping, where you have a lot of input data, because it's relatively easy to collect.” — Ljubomir Buturovic


Links:

Inflammatix
Inflammatix’s Machine Learning Blog
Ljubomir Buturovic on LinkedIn
Ljubomir Buturovic on X


Resources for Computer Vision Teams:

LinkedIn – Connect with Heather.
Computer Vision Insights Newsletter – A biweekly newsletter to help bring the latest machine learning and computer vision research to applications in people and planetary health.
Computer Vision Strategy Session – Not sure how to advance your computer vision project? Get unstuck with a clear set of next steps. Schedule a 1-hour strategy session now to advance your project.
Foundation Model Assessment – Foundation models are popping up everywhere – do you need one for your proprietary image dataset? Get a clear perspective on whether you can benefit from a domain-specific foundation model.


Transcript:

[INTRODUCTION]

[0:00:03] HC: Welcome to Impact AI, brought to you by Pixel Scientia Labs. I’m your host, Heather Couture. On this podcast, I interview innovators and entrepreneurs about building a mission-driven machine learning-powered company. If you like what you hear, please subscribe to my newsletter to be notified about new episodes. Plus, follow the latest research in computer vision for people and planetary health. You can sign up at pixelscientia.com/newsletter.

[INTERVIEW]

[0:00:34] HC: Today, I’m joined by guest Ljubomir Buturovic, Vice President of Machine Learning at Inflammatix, to talk about responding to infections. Ljubomir, welcome to the show.

[0:00:44] LB: Thank you. I’m glad to be here.

[0:00:47] HC: Ljubomir, could you share a bit about your background and how that led you to Inflammatix?

[0:00:51] LB: Sure. It’s a bit convoluted, but let me start with this: I was trained as an electrical engineer and worked in my home country. Eventually, I acquired expertise in machine learning and bioinformatics while working as a postdoctoral researcher at a university. Then, around the mid-to-late 90s, DNA microarray technology emerged, and I realized, along with many other people, that it could be used for diagnosing disease in a clinical context by measuring the expression of many genes at once. In about 2004, I started working on clinical diagnostics using machine learning at a company called Pathwork, and I’ve been in that field ever since. It’s 19 years now.

[0:01:56] HC: What does Inflammatix do, and why is this important for healthcare?

[0:02:00] LB: We develop clinical diagnostic tests for diagnosing infectious diseases in the emergency department, that is, in hospital emergency care. The context is that right now, it’s difficult to diagnose infections in an acute setting, which is the emergency care department. The tests that exist are either unreliable, take too long, or both. The Inflammatix idea and concept is to develop a point-of-care instrument that diagnoses the type of infection and the severity of infection in emergency care, and it delivers a test report in about half an hour using whole blood as a sample source. That’s our first and main product.

It is currently in late stages of development, which means it is not yet cleared for use. We are on the verge of running clinical studies to obtain FDA clearance and the right to launch the test. Beyond that, we envision using our instrument to diagnose other diseases, either in emergency care or in other settings. That’s the gist of our status at this point.

[0:03:37] HC: The main goal is to identify infections sooner in a hospital environment so that doctors can respond more quickly and save those patients, I would imagine?

[0:03:48] LB: Yeah. That’s the core goal of our lead product.

[0:03:53] HC: What role does machine learning play in this technology?

[0:03:56] LB: The concept of our test is to measure what’s called gene expression in white blood cells. For those who are not familiar, gene expression is basically a measurement of the abundance of gene transcripts in human cells, or other cells. You can also do it for animals and plants, but for us, obviously, the focus is human white blood cells. We developed an instrument which measures this gene expression for 29 clinically relevant genes for infections.

Once you have that measurement, you get a vector of features, which are just numbers corresponding to the relative abundance of the genes. This is used as a characterization, or profile, of the patient. The goal is to determine whether the patient has a bacterial infection, a viral infection, or no infection at all. In this context, it becomes basically a classical machine learning problem. You have input genes as features, and you have output classes: bacterial, viral, or non-infectious. The goal of the machine learning is then to convert the input gene expression measurements into class predictions, which are presented to the end user in a test report that is easy to interpret and guides diagnosis and treatment. That’s how we go from measuring blood gene expression to machine learning.
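To make the "features in, classes out" framing concrete, here is a minimal sketch of a multinomial (softmax) linear classifier that maps a 29-value expression vector to probabilities for the three classes. The model family, the function names, and the toy weights are all assumptions for illustration; the interview does not say which classifier Inflammatix uses.

```python
import math
from typing import Sequence

CLASSES = ("bacterial", "viral", "non-infectious")
N_GENES = 29  # panel size mentioned in the interview


def softmax(scores: Sequence[float]) -> list[float]:
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(expression: Sequence[float],
                  weights: Sequence[Sequence[float]],
                  biases: Sequence[float]) -> dict[str, float]:
    """Map one patient's 29-gene expression vector to per-class probabilities.

    weights[k][j] is a (hypothetical, learned) weight of gene j for class k.
    """
    if len(expression) != N_GENES:
        raise ValueError(f"expected {N_GENES} genes, got {len(expression)}")
    scores = [
        sum(w * x for w, x in zip(weights[k], expression)) + biases[k]
        for k in range(len(CLASSES))
    ]
    return dict(zip(CLASSES, softmax(scores)))


if __name__ == "__main__":
    # Toy parameters only; a real model would learn these from labeled patients.
    weights = [[0.1] * N_GENES, [-0.1] * N_GENES, [0.0] * N_GENES]
    biases = [0.0, 0.0, 0.0]
    profile = [1.0] * N_GENES  # made-up expression values
    probs = predict_proba(profile, weights, biases)
    print(max(probs, key=probs.get), probs)
```

The probabilities, rather than a bare class label, are what a test report could translate into interpretable result bands.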

[0:05:42] HC: You mentioned that you use the expression of 29 genes. Where do these specific genes come from? How did you select them?

[0:05:50] LB: They were originally proposed by our co-founders, Tim Sweeney, Purvesh Khatri, and Jonathan Romanowski, in a series of papers published in peer-reviewed journals. That was the basis, first, for founding Inflammatix and, second, for developing machine learning algorithms to implement their idea in a clinical setting. Over the years, as we were developing our instrument, we observed that some of those original genes were difficult to measure in the short time frame we have in emergency care, which is half an hour, plus or minus a couple of minutes.

We replaced them with other genes that were found to be more amenable to fast measurement on our instrument. We also evaluated whether removing some of those genes was beneficial, but it was not. In the end, we concluded that this is the right signature for us. I should also mention that the instrument limits how many genes we can place on the cartridge that goes into the instrument. That was another constraint we had.

[0:07:18] HC: Yeah, part of this is, in machine learning terms, a feature selection problem: narrowing down which genes to use. But you already had prior information from research showing which genes were relevant.

[0:07:28] LB: Yeah, very much so. We had prior knowledge, we had constraints imposed by the instrument capacity, and we had other constraints imposed by the chemistry, or the process. When you put all this together, we arrived, partly heuristically and partly computationally, at a final signature, which we now use on the instrument.
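The selection process described above combines gene substitution, a cartridge capacity limit, and a check of whether dropping genes helps. A minimal sketch of those two steps follows. The function names, the substitution map, and the scoring callable are hypothetical; in practice `score` would be something like a cross-validated performance estimate, which is not specified in the interview.

```python
from typing import Callable, Sequence


def apply_assay_constraints(candidates: Sequence[str],
                            substitutes: dict[str, str],
                            capacity: int) -> list[str]:
    """Swap genes that are too slow to measure for substitutes, then check capacity.

    `substitutes` maps a hard-to-measure gene to a faster-to-measure replacement.
    """
    panel: list[str] = []
    for gene in candidates:
        gene = substitutes.get(gene, gene)  # keep the gene if no swap is needed
        if gene not in panel:
            panel.append(gene)
    if len(panel) > capacity:
        raise ValueError(f"{len(panel)} genes exceed cartridge capacity of {capacity}")
    return panel


def leave_one_out_ablation(panel: Sequence[str],
                           score: Callable[[Sequence[str]], float]) -> dict[str, float]:
    """For each gene, report how much the score drops when that gene is removed.

    A positive drop means the gene helps; a drop <= 0 suggests it could be removed.
    """
    full = score(panel)
    return {g: full - score([x for x in panel if x != g]) for g in panel}


if __name__ == "__main__":
    panel = apply_assay_constraints(["G1", "G2", "G3"], {"G2": "G2b"}, capacity=3)
    informative = {"G1", "G3"}  # toy stand-in for a real performance estimate
    print(panel, leave_one_out_ablation(panel, lambda p: float(len(informative & set(p)))))
```

If every gene shows a positive drop, as the interview reports for the final signature, removing genes is not beneficial.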

[0:07:55] HC: What kinds of challenges do you encounter in working with and training models on gene expression data?

[0:08:02] LB: One of the key challenges in applying machine learning in clinical test design is the availability of samples for training and validation. This is in sharp contrast to other applications, like maybe movie recommendations, or shopping, where you have a lot of input data, because it’s relatively easy to collect. But in clinical applications, you have to acquire biological samples from patients. You have to get their approval for using their personal genome measurements in research, and also, depending on the scenario, you have to wait for some time to obtain the disease outcome for the patients.

Depending on the use case, in cancer, it can take years. In our case, it is a lot faster, because infectious diseases usually resolve much faster. When you combine all this, you get relatively small sample sizes for training and validation. They can range from hundreds to maybe the low-to-mid thousands, but rarely above that. As a result, we rigorously apply the principles of cross-validation, which basically lets us make the most of a limited number of samples for both training and estimating accuracy. That’s one limitation, one challenge.
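As a reminder of how k-fold cross-validation stretches a small cohort, here is a minimal, dependency-free sketch: every sample is used for training in k-1 rounds and for testing in exactly one. The `fit` and `evaluate` callables are placeholders for whatever model and metric are in use; the toy majority-class model in the usage lines is an assumption for illustration only.

```python
import random
from typing import Callable, Sequence


def k_fold_indices(n: int, k: int, seed: int = 0) -> list[list[int]]:
    """Shuffle sample indices and split them into k roughly equal folds."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X: Sequence, y: Sequence, k: int,
                   fit: Callable, evaluate: Callable, seed: int = 0) -> list[float]:
    """Train on k-1 folds, score on the held-out fold, and repeat k times."""
    scores = []
    for test in k_fold_indices(len(X), k, seed):
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(evaluate(model, [X[i] for i in test], [y[i] for i in test]))
    return scores


if __name__ == "__main__":
    # Toy model: always predict the most common training label.
    fit = lambda X, y: max(set(y), key=y.count)
    evaluate = lambda model, X, y: sum(t == model for t in y) / len(y)
    labels = ["bacterial"] * 6 + ["viral"] * 4
    print(cross_validate([[0.0]] * 10, labels, 5, fit, evaluate))
```

The mean and spread of the per-fold scores give a performance estimate without setting aside a large, separate test set.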

The other challenge is that when we started the company, the instrument we were working on did not exist; we had only just started developing it. For a long time, years, we did not actually have samples measured, or processed, on our target instrument. We had to use surrogate platforms to develop our machine learning classifiers. Then, once the instrument became a reality, we had to find a way to translate those models onto the new platform.

Another challenge is defining a metric for measuring the accuracy of our product and of the classifiers. In some applications of machine learning, accuracy is frequently used as a metric, but in medical use, accuracy is usually not very useful, because there are different types of errors. For example, in, let’s say, cancer screening, false positive test results have a larger weight, or impact, than false negatives. In treatment response, it’s the other way around.

In detecting infections, it’s even more complicated to define, because there is no consensus in the clinical community on what should be used as a gauge of test performance. In the end, you have to tune the models against a large set of performance metrics, ballpark around a dozen, which complicates model development, because a lot of the machine learning literature is basically designed to optimize just one metric, usually accuracy.

I think these are the key challenges. We worked with physicians in our company and outside the company to tune the performance to be meaningful for the target population of patients.
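The point about accuracy hiding different error types can be shown with a small sketch that computes several standard binary metrics, including an error cost that weights false positives and false negatives differently, and then checks a model against a set of minimum and maximum requirements. The metric selection, cost weights, and thresholds are assumptions; the interview does not list the roughly dozen metrics actually used.

```python
import math
from typing import Mapping, Optional, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   cost_fp: float = 1.0, cost_fn: float = 1.0) -> dict[str, float]:
    """Compute common diagnostic metrics from 0/1 labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)

    def ratio(num: float, den: float) -> float:
        return num / den if den else math.nan  # undefined when the denominator is 0

    n = len(y_true)
    return {
        "accuracy": ratio(tp + tn, n),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        # Average cost per sample when the two error types are weighted differently.
        "weighted_cost": ratio(cost_fp * fp + cost_fn * fn, n),
    }


def meets_requirements(metrics: Mapping[str, float],
                       minimums: Mapping[str, float],
                       maximums: Optional[Mapping[str, float]] = None) -> bool:
    """True only if every floor and every ceiling is satisfied (NaN fails both)."""
    ok_min = all(metrics[k] >= v for k, v in minimums.items())
    ok_max = all(metrics[k] <= v for k, v in (maximums or {}).items())
    return ok_min and ok_max


if __name__ == "__main__":
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    model_a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]  # one false negative, one false positive
    model_b = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]  # no false negatives, two false positives
    for name, pred in (("A", model_a), ("B", model_b)):
        print(name, binary_metrics(truth, pred, cost_fn=5.0))
```

Both toy models have the same accuracy, but when a missed infection is assumed to cost five times as much as a false alarm, their weighted costs differ threefold, which is exactly the information a single accuracy number discards.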

[0:12:08] HC: Where do you get the ground truth labels for training and for validating your models? Is there an alternative way to identify an infection? Perhaps, a method that involves waiting longer just to get those labels?

[0:12:19] LB: There are methods to do it. Obviously, we had to use some ground truth. But none of them is really 100% reliable. First, you can do microbial testing of blood. It takes a couple of days, so it’s not really usable in an emergency setting. One problem is that it takes too long. The other problem is that a significant fraction of infections are not present in blood, so that’s not 100% reliable either. There is fever, but of course, there are many other diseases which also induce fever. There is clinical follow-up and there is the clinician’s judgment. What we do is basically engage clinical specialists who review the medical charts for the patients. They have all the information: all the tests that were performed, all the clinical symptoms, and the results of the treatment.

Based on this, they make a judgment about what the most likely diagnosis for the patient was. If they are in sync and reach a consensus decision, for example among three or five physicians, then we declare that the ground truth and use it for training. This also helps explain why it is pretty complicated to acquire sufficient samples, because for each patient you have to go through this consensus-building analysis by clinical physicians.
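The adjudication step can be sketched as a simple agreement rule applied to each patient’s panel of physician votes, with non-consensus patients left out of the training set. The agreement threshold and function names are assumptions; the interview only says that panels of, for example, three or five physicians must be in sync.

```python
from collections import Counter
from typing import Mapping, Optional, Sequence


def adjudicate(votes: Sequence[str], min_agreement: float = 1.0) -> Optional[str]:
    """Return the panel's consensus label, or None if agreement is too low.

    min_agreement=1.0 requires a unanimous panel. It must exceed 0.5 so that
    two different labels can never both qualify.
    """
    if not 0.5 < min_agreement <= 1.0:
        raise ValueError("min_agreement must be in (0.5, 1.0]")
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None


def build_training_labels(panel_votes: Mapping[str, Sequence[str]],
                          min_agreement: float = 1.0) -> dict[str, str]:
    """Keep only patients whose adjudication panel reached consensus."""
    labels = {}
    for patient_id, votes in panel_votes.items():
        label = adjudicate(votes, min_agreement)
        if label is not None:
            labels[patient_id] = label
    return labels


if __name__ == "__main__":
    votes = {
        "patient-1": ["viral", "viral", "viral"],
        "patient-2": ["bacterial", "viral", "bacterial"],
    }
    print(build_training_labels(votes))       # unanimous rule keeps patient-1 only
    print(build_training_labels(votes, 0.6))  # a 2-of-3 rule would keep both
```

The stricter the agreement rule, the cleaner the labels but the fewer usable samples, which compounds the sample-size challenge described earlier.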

[0:14:03] HC: That process is needed in order to have accurate ground truth for training these models, I suspect?

[0:14:09] LB: It’s crucial, because otherwise you are training on really unreliable ground truth labels. There are some methods for training with ground truth that is not 100% reliable, but I would say there is no agreement in the machine learning community on how best to do this, so we had to improvise to a degree.

[0:14:38] HC: How do you ensure that your models work for multiple assay platforms and for a diverse patient population?

[0:14:46] LB: There are two parts to that. One is the platform issue and one is the heterogeneity of patients. For the platform issue, we developed in-house workflows which perform machine learning tuning, for example, hyperparameter search, with an additional constraint added to that search: platform transportability. In other words, we eliminate from the search models that may perform well on one platform but do not perform as well on other platforms. That way, we narrow down the hyperparameter tuning to only those models which are likely to be more or less consistently accurate across platforms.
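A hyperparameter search with a transportability constraint might look like the following sketch: each candidate configuration is scored on every platform, configurations whose scores diverge by more than a tolerance are discarded, and the survivors are ranked by their worst platform score. The gap tolerance, the worst-case ranking rule, and all names here are assumptions, not a description of the in-house workflow.

```python
from typing import Callable, Hashable, Iterable, Optional, Sequence, Tuple


def transportable_search(configs: Iterable[Hashable],
                         score_on: Callable[[Hashable, str], float],
                         platforms: Sequence[str],
                         max_gap: float) -> Tuple[Optional[Hashable], float]:
    """Pick the best configuration among those that are consistent across platforms."""
    best, best_score = None, float("-inf")
    for cfg in configs:
        scores = [score_on(cfg, p) for p in platforms]
        if max(scores) - min(scores) > max_gap:
            continue  # accurate on one platform but not another: drop it
        worst = min(scores)  # rank survivors by their weakest platform
        if worst > best_score:
            best, best_score = cfg, worst
    return best, best_score


if __name__ == "__main__":
    # Made-up validation scores for three configurations on two assay platforms.
    table = {
        ("cfg-1", "surrogate"): 0.95, ("cfg-1", "target"): 0.70,
        ("cfg-2", "surrogate"): 0.85, ("cfg-2", "target"): 0.83,
        ("cfg-3", "surrogate"): 0.80, ("cfg-3", "target"): 0.80,
    }
    print(transportable_search(["cfg-1", "cfg-2", "cfg-3"],
                               lambda cfg, p: table[(cfg, p)],
                               ["surrogate", "target"], max_gap=0.05))
```

Ranking by the worst platform rather than the average is one design choice among several; it favors models that never fall far behind on any single platform.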

For the second part, the heterogeneity of the patient population, it is indeed a major issue in infectious disease. For background, I previously worked in cancer diagnostics before moving over to infectious diseases at Inflammatix, and I observed that heterogeneity in the infectious disease world is higher than in cancer, which was somewhat surprising, but it is factually true.

To address this, we rely on methods called grouped analysis. There is a variant of cross-validation called grouped cross-validation, where the cross-validation folds are constructed so that each fold contains a distinct set of groups, which could be, for example, hospitals or clinical studies. The training subset in each cross-validation round therefore contains no data from the hospitals in its test subset.

That way, we partition the training and test subsets into disjoint groups. If the model passes this type of cross-validation, then we have increased confidence that, once trained, it will work across different populations of patients. The short answer is that we use grouped machine learning methods, which account for this heterogeneity, extensively throughout our machine learning workflows.
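Grouped cross-validation can be sketched without any libraries: assign whole groups (for example, hospitals) to folds so that no group ever appears on both sides of a train/test split. The greedy size-balancing below is an assumption chosen for simplicity; scikit-learn’s `GroupKFold` offers a ready-made implementation of the same idea.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def group_k_fold(groups: Sequence[Hashable], k: int) -> list[tuple[list[int], list[int]]]:
    """Return k (train, test) index splits in which no group spans train and test."""
    members: dict[Hashable, list[int]] = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    if k > len(members):
        raise ValueError("need at least k distinct groups")

    # Greedy balancing: place the largest remaining group into the smallest fold.
    fold_groups: list[list[Hashable]] = [[] for _ in range(k)]
    fold_sizes = [0] * k
    for g in sorted(members, key=lambda g: -len(members[g])):
        j = fold_sizes.index(min(fold_sizes))
        fold_groups[j].append(g)
        fold_sizes[j] += len(members[g])

    splits = []
    for j in range(k):
        test = sorted(i for g in fold_groups[j] for i in members[g])
        test_set = set(test)
        train = [i for i in range(len(groups)) if i not in test_set]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    hospitals = ["H1", "H1", "H2", "H2", "H2", "H3", "H4"]  # one label per patient
    for train, test in group_k_fold(hospitals, 2):
        print("test hospitals:", sorted({hospitals[i] for i in test}))
```

Because each test fold contains only unseen hospitals, the resulting scores estimate how a model might behave at a new site rather than at sites it was trained on.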

[0:17:22] HC: How do you ensure that the technology your team develops will fit in with the clinical workflow and provide the right assistance to doctors and patients?

[0:17:31] LB: That’s the work spearheaded by our clinical affairs, marketing, and engineering departments, because, first, you have to make the test clinically useful and clear to users in this particular context, emergency care. Second, it has to be small enough to fit in the available space in the emergency department. Third, it has to provide a test report on the instrument screen that is easily interpretable by the care providers in the emergency department, and it also has to fit into their workflow. It cannot be a major obstruction.

That’s why we impose the 30-minute limit on the test report. This was all verified and checked with key opinion leaders in the infectious disease space, and that was the basis for designing the instrument and its machine learning component.

[0:18:46] HC: Really, it’s about understanding the environment that the system will be deployed in and then the needs of the end users, and in this case, the emergency care workers, and probably getting feedback from multiple viewpoints.

[0:18:58] LB: Yeah, definitely. That’s how it works. I should point out that our CEO, Tim Sweeney, is a trained physician who worked in emergency care, so we have easily accessible first-hand feedback, which adds to the input from the clinical community.

[0:19:19] HC: I saw that Inflammatix has been a contributor to a number of research articles. What business benefits have you seen from publishing your work?

[0:19:26] LB: In our business, and in any other application of machine learning in clinical care, publications are absolutely vital, because that is how we demonstrate the value proposition of our tests to our stakeholders, physicians and others. Exposing our science to the scientific community builds trust. It also improves the visibility of our test among the stakeholders who follow the relevant publications. It’s a crucial component for the adoption of any clinical test.

[0:20:14] HC: Is there any advice you could offer to other leaders of AI-powered startups?

[0:20:19] LB: That is a very high-level, generic question; it depends heavily on the application domain. I can offer advice in the domain that I work in, which is AI-powered tests and products for the clinical setting, clinical care. The biggest advice I would offer is that it takes a long time to achieve adoption. This is basically applying AI in medicine. When you are applying AI in medicine, the whole process of development and adoption works on medicine timescales, not on AI timescales.

With AI, as we are witnessing almost every day, there are new innovations and quite frequently breakthroughs. But we know that medicine doesn’t operate on that type of timescale. Anyone who wishes to apply their AI expertise in the medical domain has to accept that progress in medicine is much slower than what they may be used to, and for good reason. One reason is that the problem being solved, human biology, is extremely complicated. Another is safety, because we don’t want to put into clinical use something that isn’t inherently safe for patients.

A third factor is regulation, because, again for good reason, a lot of those products are covered by FDA rules in the United States and by other agencies in other countries, which adds another level of complexity and, frankly, development time. All these factors contribute. This is not to discourage anybody; it is just to set expectations realistically.

[0:22:22] HC: Finally, where do you see the impact of Inflammatix in three to five years?

[0:22:27] LB: We are targeting infectious disease and sepsis, which have a huge impact in clinical care. Our plan and hope is that we achieve adoption of our tests and, that way, improve clinical care and health for many millions of people, because those are the numbers of patients who seek care in emergency departments for these types of symptoms.

Beyond that, we envision developing other tests, maybe outside of the emergency department and for other indications, not necessarily infectious diseases. With the platform we developed, we think we can address some key areas of unmet clinical need, and that’s our hope for this time frame.

[0:23:28] HC: This has been great, Ljubomir. I appreciate your insights today. I think this will be valuable to many listeners. Where can people find out more about you online?

[0:23:36] LB: Inflammatix.com is our company website, which has pretty extensive information about the tests we are developing. There is also a special machine learning section, which has our blogs, which we issue periodically. If you’re interested in AI in medicine, you can also follow me on LinkedIn. Search for my name.

[0:24:04] HC: I’ll also put a link to those in the show notes, including your LinkedIn. Thanks for joining me today.

[0:24:08] LB: It was a pleasure. Thank you very much for hosting me.

[0:24:11] HC: All right, everyone. Thanks for listening. I’m Heather Couture and I hope you join me again next time for Impact AI.

[0:24:17] LB: Thank you. Bye.

[END OF INTERVIEW]

[0:24:22] HC: Thank you for listening to Impact AI. If you enjoyed this episode, please subscribe and share with a friend. If you’d like to learn more about computer vision applications for people and planetary health, you can sign up for my newsletter at pixelscientia.com/newsletter.

[END]