In this episode, I talk with Joe Peterson, co-founder and CTO of SimBioSys, about biophysical modeling of cancer. SimBioSys is trying to revolutionize precision cancer care through individualized treatment planning, accelerated drug development, clinical trial optimization, and comprehensive biomarker development. Joe and I talked about the challenges of working with heterogeneous forms of data and the ways bias can manifest when training models on medical data.


“We use AI or ML at effectively every point in the process, both in our clinical medical devices and in our internal R&D.”

“Have you ever seen the way weather scientists simulate a hurricane? We do a very similar thing within the body. Or if you’ve ever seen mechanical engineers simulate the combustion of a gas in a gas turbine, we do a similar type of thing within these patient models.”

“If you’re able to distill the processes that go on biologically, chemically and physically to their essence, you can create building blocks that can be mixed and matched.”

“Our thought was, let’s not ask the models to do too much. Let’s ask them to do one thing that we need them to do very, very well. This allows us to have more collected data or more directed data collection, as well as more clearly defined goals in terms of business value and delivering business value to each of the models.”

“All these different types of data are much more heterogeneous. They come from many different scales. They come from many different sources. They’re encoded in many different ways, and so there’s a huge effort, on the research and development side, just to extract what’s meaningful in those different types of data sets so that we can begin to define those biophysical building blocks that ultimately make it into the clinical application.”
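The extraction work described above can be pictured as a thin adapter layer: each source gets its own converter into one common record. This is only an illustrative sketch, not SimBioSys's actual pipeline, and the field names and source formats here are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Feature:
    """One extracted measurement in a common schema."""
    patient_id: str
    name: str
    value: float
    source: str  # which pipeline the measurement came from

def from_lab_row(row: dict) -> Feature:
    """Adapter for a tabular lab export (column names are hypothetical)."""
    return Feature(row["mrn"], row["test"], float(row["result"]), "lab")

def from_imaging_meta(meta: dict) -> Feature:
    """Adapter for an imaging-derived measurement (keys are hypothetical)."""
    return Feature(meta["PatientID"], meta["measurement"],
                   float(meta["value_mm"]), "imaging")
```

Downstream model code then consumes `Feature` records without caring which of the many scales or sources each one came from.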

“It’s just really about capturing the variability and trying to drive out as much variability up front as you possibly can.”

“We also develop models that are generally capturing any sort of drift in the data over time.”
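One common way to monitor the kind of drift mentioned above is the population stability index (PSI), which compares the distribution of a feature at training time against newly arriving data. A minimal pure-Python sketch (the thresholds are the usual rule of thumb, not anything specific to SimBioSys):

```python
import math

def psi(reference, current, bins=10):
    """Population stability index between two 1-D samples.

    Rule of thumb: PSI < 0.1 is stable, 0.1-0.25 is moderate drift,
    and > 0.25 signals major drift.
    """
    lo, hi = min(reference), max(reference)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    edges[-1] = float("inf")  # catch values above the reference range

    def bin_fractions(sample):
        counts = [0] * bins
        for x in sample:
            for i in range(bins):
                if edges[i] <= x < edges[i + 1]:
                    counts[i] += 1
                    break
            else:
                counts[0] += 1  # below the reference range
        n = len(sample)
        # small epsilon avoids log(0) for empty bins
        return [max(c / n, 1e-6) for c in counts]

    ref, cur = bin_fractions(reference), bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))
```

Running this periodically per feature, and alerting when the index crosses a threshold, gives a simple drift monitor over time.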

“You want to understand, outside of just a research setting, how well your models are going to work out there in the wild: how often you’re going to return a null result or an inconclusive result to a physician. Being able to track that over time is really important from a quality control standpoint.”
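Tracking the inconclusive-result rate over time can be done with a simple control-chart style monitor: watch the rate over a rolling window and alarm when it exceeds the baseline established during validation. This is a hypothetical sketch of the idea, not SimBioSys's actual QC system; the window size and thresholds are assumptions.

```python
from collections import deque

class InconclusiveRateMonitor:
    """Flags when the rolling rate of inconclusive results drifts
    above the baseline rate measured during validation."""

    def __init__(self, window=200, baseline=0.05, tolerance=3.0):
        self.window = deque(maxlen=window)
        self.baseline = baseline      # expected inconclusive rate
        self.tolerance = tolerance    # alarm threshold, in standard errors

    def record(self, inconclusive: bool) -> bool:
        """Record one result; return True if the rate is out of control."""
        self.window.append(1 if inconclusive else 0)
        n = len(self.window)
        if n < self.window.maxlen:
            return False  # not enough data yet
        rate = sum(self.window) / n
        # binomial standard error under the baseline rate
        se = (self.baseline * (1 - self.baseline) / n) ** 0.5
        return rate > self.baseline + self.tolerance * se
```

In practice each deployed model would carry its own monitor, so a rise in inconclusive results surfaces as a QC alert rather than going unnoticed.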

“It’s all the quality control machine learning models and deep learning models that make up the bulk of those internally.”

“Our responsibility as practitioners of AI is to not only identify and understand that bias, that historical bias, but also try to account for it as best we can.”

“What we need to assess when developing drugs or algorithms or devices is how they were trained and how they were tested, and to really stratify those patient populations as best we can to understand, at the very least, how they’re behaving.”

“We’ve spent a lot of time trying to account for that variability as best we can. That said, we don’t have a perfect data set and we’re constantly thinking about ways to improve it.”

“I think what it comes down to is being open and transparent, and really looking at the data that you have at the end of the day. If doctors are going to trust medical devices, and if they’re going to trust AI, they need to have information about them.”

“By looking into and stratifying the patient populations in that way, we can better understand where we need to spend resources in a targeted way: collecting more data to better understand performance in those places, or improving our algorithms.”
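The stratified analysis described above amounts to breaking a performance metric out by patient subgroup, so that a weak stratum shows up instead of being averaged away. A minimal sketch, with hypothetical record shapes and accuracy standing in for whatever clinical metric is actually used:

```python
from collections import defaultdict

def stratified_accuracy(records):
    """Per-subgroup accuracy.

    records: iterable of (subgroup, prediction, label) tuples.
    Returns a dict mapping each subgroup to its accuracy.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, pred, label in records:
        totals[group] += 1
        hits[group] += int(pred == label)
    return {g: hits[g] / totals[g] for g in totals}
```

Subgroups with low accuracy (or too few records to trust the estimate) are the places to direct further data collection or algorithm work.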

“Adopt good machine learning practices early, just like good clinical practices or good manufacturing practices: standards that are now being drafted and adopted.”

“Find the right partners to drive the questions that you’re addressing and, ultimately, the clinical actions that you’re trying to inform.”

“Building models that each do a single task excellently well is a better approach than trying to build one model that does four or five tasks really well.”