Fight Cancer with AI

Getting pathology startups to market faster by building generalizable computer vision models

ML Strategy Options

Do your machine learning models perform well on your training data but fail on images from a different source?

  • Are you frustrated by the never-ending cycle of experiments?
  • Does your team struggle to keep up with the rapidly advancing field?
  • Are you unsure whether you’re focusing on the best model types for your application?
  • Do you have data challenges like noisy labels or small training sets?

Build robust and generalizable models

  • Follow best practices for your unique challenges with pathology images
  • Receive practical advice to get you results quickly with less wasted time on unsuccessful approaches
  • Stay up-to-date with the latest and greatest tools and techniques
  • Develop unbiased models that provide the most valuable information to clinicians and patients

Meet Heather D. Couture, PhD
Consultant & Researcher

While working with a variety of startups on computer vision projects, I noticed a pattern: models often don’t generalize when you expect them to. Unfortunately, this often comes as a surprise late in the project. I quickly learned to:

  • Identify sources of variation or domain shifts – pathologists and other domain experts are essential for this
  • Validate early and often to reveal failure modes – essential for making improvements
  • Tackle these challenges with data- and model-centric techniques

In deciding which approaches to try, I learned to view each one through the lens of generalizability. Now I help startups do the same. Given the unique characteristics of a pathology dataset, I help them identify the best path to models that handle the expected variations, enabling them to take their products to market sooner.

Featured in:

Scientific American
The Pathologist
Digital Pathology Association
IEEE Spectrum

Organizations I've Worked With

Digital Smiths

Success Stories

The Power of Machine Learning

Automate labor-intensive or mundane processes
Increase efficiency and repeatability

  • Count mitoses
  • Segment glands
  • Characterize nuclei shape

Improve precision and productivity

  • Find tumor in whole slide image
  • Predict one imaging modality or stain from another

Innovate to learn concepts beyond human capabilities
Discover new insights and drive impact

  • Distinguish classes too complex for human experts
  • Infer molecular biomarkers from H&E
  • Predict patient outcome
Build generalizable models

  1. Decipher the distribution shifts in your data
  2. Properly validate your models
  3. Apply data- and model-centric techniques to make your models more robust
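
One way to make step 2 concrete is to hold out every sample from one data source at a time, so that each evaluation measures performance on a source the model never saw in training. The sketch below is only an illustration under assumptions: the function name `leave_one_source_out`, the sample IDs, and the lab names are all hypothetical, and "source" could equally mean scanner, hospital, or staining protocol.

```python
from collections import defaultdict


def leave_one_source_out(samples):
    """Yield (held_out_source, train_ids, test_ids) for each source.

    `samples` is a list of (sample_id, source) pairs. All samples from
    the held-out source go to the test set; everything else is training.
    """
    by_source = defaultdict(list)
    for sample_id, source in samples:
        by_source[source].append(sample_id)

    for held_out in sorted(by_source):
        test_ids = list(by_source[held_out])
        train_ids = [
            sample_id
            for source in sorted(by_source)
            if source != held_out
            for sample_id in by_source[source]
        ]
        yield held_out, train_ids, test_ids


# Hypothetical slides from three labs (assumed names, for illustration only).
samples = [
    ("slide_a", "lab1"),
    ("slide_b", "lab1"),
    ("slide_c", "lab2"),
    ("slide_d", "lab3"),
]

for held_out, train_ids, test_ids in leave_one_source_out(samples):
    print(held_out, "train:", train_ids, "test:", test_ids)
```

A large gap between the held-out-source score and an ordinary random-split score is one sign of a distribution shift worth investigating.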

Machine learning projects for pathology have unique challenges

Large & Diverse Images

Whole slide histopathology images are often more than 60,000 pixels across. They contain multiple tissue types and both tumor and non-tumor tissue. Tissue appearance is heterogeneous, both from patient to patient and sometimes within a single tumor.
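
Images of this size are usually processed as a grid of smaller tiles rather than all at once. The sketch below shows the arithmetic for such a grid. The 512-pixel tile size, the 40,000-pixel height, and the function name `tile_origins` are assumptions chosen for illustration, not values from any particular pipeline.

```python
def tile_origins(width, height, tile=512, stride=512):
    """Return (x, y) top-left corners of all tiles that fit fully inside the image.

    A stride smaller than the tile size gives overlapping tiles.
    Partial tiles at the right and bottom edges are skipped.
    """
    xs = range(0, width - tile + 1, stride)
    ys = range(0, height - tile + 1, stride)
    return [(x, y) for y in ys for x in xs]


# A hypothetical slide 60,000 pixels wide and 40,000 pixels tall (assumed size).
origins = tile_origins(60_000, 40_000)
print(f"{len(origins)} tiles of 512 x 512 pixels")
```

In practice many of these tiles are background and are filtered out with a tissue mask before any model sees them.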

Weak Labels

For some applications like mitosis detection and tissue segmentation, pathologists can provide detailed annotations. But for patient-level prediction tasks like molecular biomarkers, treatment response, or patient outcomes, the algorithm must learn on its own which regions of the image are important, often through multiple instance learning.
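
To give a flavor of multiple instance learning, the sketch below pools per-tile features into a single slide-level feature using attention-style weights. It is a simplified illustration under assumptions: in a real model the attention vector is learned jointly with the classifier, whereas here `attention_vector` is a fixed, made-up value, and the tile features are toy numbers.

```python
import math


def softmax(scores):
    """Convert raw scores to weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(tile_features, attention_vector):
    """Pool a bag of tile features into one slide-level feature vector.

    Each tile is scored by a dot product with `attention_vector`; the
    softmax of those scores weights the average of the tile features.
    """
    scores = [
        sum(f * a for f, a in zip(features, attention_vector))
        for features in tile_features
    ]
    weights = softmax(scores)
    dim = len(tile_features[0])
    pooled = [
        sum(w * features[d] for w, features in zip(weights, tile_features))
        for d in range(dim)
    ]
    return pooled, weights


# Toy bag of three tiles with two features each (assumed values).
tile_features = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
attention_vector = [1.0, 1.0]  # hypothetical; learned during training in practice

pooled, weights = attention_pool(tile_features, attention_vector)
print("tile weights:", [round(w, 3) for w in weights])
```

The weights double as a rough map of which tiles drove the slide-level prediction, which makes the result easier to review with a pathologist.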

Limited Labeled Data

Some applications of computer vision make use of millions of labeled images, whereas data sets of around 1,000 patients are much more common for medical applications. Transfer learning and self-supervised methods are often critical to success in this low-sample-size regime.

Additional Modalities of Data

The most powerful models that can improve patient care and outcomes often use multiple modalities of data: histopathology, clinical, genomic, proteomic, etc. Specialized models can make use of these structured and unstructured sources of data.

Language Barrier Between Disciplines

Understanding the intricacies of a particular application and possible clinical use cases requires the expertise of pathologists and other domain experts. Project success depends on communicating in both directions: about the disease and about the machine learning solution.

The Challenges of Real-World Pathology Data

The process for machine learning is empirical and iterative: hypothesize a model, test it, and improve it.

While many talented machine learning engineers can create and train a model, they may be inexperienced with the challenges posed by real-world data. The complexities of massive images and noisy or missing labels are a whole different ball game from clean benchmark data sets.

Machine learning engineers may also lack the experience to identify unique aspects of data that, with a customized model, can improve predictions.

I spent my Ph.D. developing solutions to study breast cancer and learned to create new machine learning methodologies motivated by particular aspects of the research data.

The same may be beneficial for your project, but you won’t know it without looking from the right perspective.