Distribution Shift Diagnostic

Are you confident that your computer vision models will generalize to the variations and batch effects in your inference images?

Have you thoroughly validated your models? What variations should you even be testing for?

Sometimes the distribution change causing degraded performance is obvious, like a different imaging device or geographic location. But it can also be hidden or caused by multiple factors.

Gathering a large, diverse training set may be the ideal solution but is often not feasible – especially for medical images.

What do you do?

Perhaps you have a few techniques in your toolkit like color augmentation or normalization to handle the obvious changes in image color.

But have you ever taken a step back to fully understand the batch effects in your data?

Or what types of distribution shifts you might encounter in the future?

Or searched for spurious correlations that may be biasing your results?

Generalizable models are important for multiple reasons:

  • So that they will make correct predictions on previously unseen images

  • To ensure they work with images from a variety of scanners, labs, patient populations, geographic regions, vegetation types, etc.

  • To get regulatory approval for your product

  • So that they are unbiased and provide the most valuable information

Get a clear strategy for validating your models and improving their generalizability

This diagnostic helps organizations using computer vision handle challenging datasets by:

  • Identifying sources of variation, batch effects, spurious correlations, and distribution shifts

  • Properly validating models to reveal how they might fail

  • Applying data-, image-, and model-centric techniques to make models more generalizable

We start by understanding the variations between your training and test sets and the batch effects within your training data: what physical processes or differences in data distributions may have caused them, and how they affect your model performance. Then I’ll devise a plan for improving model generalizability through adjustments to your training data and your algorithms.

Here’s how it works:

I’ll interview some of your leadership, domain expert, and machine learning team members to discover your overall goals, the variations present in your training and inference data, and your current approach to validation and generalizability. With this knowledge, I’ll deliver a report with recommendations for the following three areas:

Part 1: Sources of variation

  • Sample preparation variability
  • Image acquisition batch effects
  • Patient subgroups

Remote sensing:
  • Variations across geographic regions
  • Seasonal and lighting changes
  • Weather conditions

Part 2: Analytical validation of models

  • Cross-validation
  • Out-of-sample predictions
  • Metrics
  • Stratification
  • Error analysis

Part 3: Improving generalizability

  • Approaches for each type of challenge identified
  • Data-centric
  • Image-centric
  • Model-centric

A follow-up call two weeks after report delivery is also included to ensure that you have all the information you need to move forward.

Results you can expect from this diagnostic:

  • Save time by targeting the variations and batch effects in your data instead of wasting time on unsuccessful approaches

  • Discover challenges early in product development so that they don’t become surprises later

  • Get your product to market faster

Ready to get started?

Are you ready to get your project on the path to success with a clear strategy for handling distribution shifts? Apply for a Distribution Shift Diagnostic by clicking the button below. The price is $10,000, which could save your team many iterations of experimentation – and hundreds of thousands of dollars.

Don’t just take my word for it…

Heather has accelerated the development of our machine learning projects by challenging the status quo and introducing the team to new approaches. She helped us get out of the echo chamber and brought an outside perspective with very relevant alternative approaches that would have taken a lot longer to do on our own.

-- Joe Sturonas, Ancera, VP of Systems Development

Heather has deep understanding of digital pathology and machine learning and their application to whole slide images. Her in-depth review of the current state of the art research has enabled Gestalt to rapidly focus our machine learning efforts on approaches that are yielding value for our pathologists and their patients.

-- Brian Napora, Gestalt Diagnostics, VP of Sales & Product Management

Heather provided practical feedback and guidance, which allowed us to quickly take steps in the right direction and improve the performance of our models.

-- Darragh Maguire, Deciphex, Artificial Intelligence Engineer

Still have questions?

Availability is limited

I only offer one Distribution Shift Diagnostic per month. Scheduling is first come, first served. The sooner you apply, the sooner you will have a clear path to more generalizable models.

Apply for a Distribution Shift Diagnostic Now