Artificial intelligence finds new insights into molecular tumor properties using a different view – images of cells and tissue.

The same technology that powers Siri and face recognition on your iPhone has also found success in medicine. By automatically analyzing microscopic images of breast tumor biopsies, AI may one day help guide cancer treatments.

This particular type of AI is called deep learning, and over the last few years it has become part of our everyday lives. Its applications continue to expand into areas like language translation and self-driving cars, enabled by massive repositories of data. While deep learning is best known for recognizing people, cars, and other everyday objects in photographs, it has more recently been adapted to study cancer. Our team of computer scientists and cancer researchers at the University of North Carolina at Chapel Hill used it to predict properties of breast cancer, including subtype, from microscopic images of tumor tissue.

The Power and Shortcomings of Deep Learning

Deep learning is a method of learning a new representation for images or other data by recognizing patterns. The model it learns, called a neural network, consists of multiple layers of features in which higher-level concepts are built upon lower-level ones. Moving up the hierarchy, the features increase in both scale and complexity. Similar to human visual processing, the lowest layers detect small structures such as edges, intermediate layers capture increasingly complex properties like texture and shape, and the top layers represent whole objects, such as people.
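To make the lowest level of that hierarchy concrete, here is a minimal NumPy sketch of a single convolutional feature: a 3x3 vertical-edge filter slid over a toy image, followed by a ReLU nonlinearity. The filter is a hand-picked Sobel kernel standing in for a learned one, and the function and variable names are our own illustration; a real network learns many such filters and stacks many layers on top of them.

```python
import numpy as np


def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding and return the response map.

    Like most deep learning libraries, this computes cross-correlation
    (the kernel is not flipped)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# Hand-picked Sobel kernel for vertical edges. In a trained network the
# first-layer kernels are learned from data instead of chosen by hand.
vertical_edge = np.array([[-1.0, 0.0, 1.0],
                          [-2.0, 0.0, 2.0],
                          [-1.0, 0.0, 1.0]])

# Toy 6x6 grayscale image: dark left half, bright right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

feature_map = np.maximum(convolve2d_valid(image, vertical_edge), 0.0)  # ReLU
print(feature_map)
```

Every row of the 4x4 output is `[0, 4, 4, 0]`: the filter responds only where its window straddles the dark-to-bright boundary and stays silent over flat regions. Higher layers would combine many such maps into detectors for texture, shape, and eventually whole objects.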

Learning these patterns allows the computer to make predictions. After training on a large data set in which each example is labeled with its content, the model can predict those labels for new data it was not trained on. For example, given images of people along with the location of each face, the model learns to find faces in new photos.

The key factor in successfully training a neural network is a large amount of labeled data. Many state-of-the-art models are trained on millions, and sometimes hundreds of millions, of labeled images. The most commonly used public data set is ImageNet, whose standard benchmark version has 1,000 classes of objects and scenes, with images gathered from the web, including photo-sharing sites like Flickr. In the medical field, however, patient samples are scarce and expert annotations of those samples are expensive. Training a large model on a small data set often results in overfitting: the model performs well on the data it was trained on but gives poor results on newly presented data.
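Overfitting is easiest to see with a model far simpler than a neural network. The sketch below swaps in polynomial curve fitting as a stand-in (our choice, not anything from the study): with 10 noisy training points, a degree-9 polynomial has as many coefficients as there are points, so it can pass through every one of them, while a degree-3 polynomial cannot. The curve, noise level, and sample sizes are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def noisy_samples(n):
    """Draw n points from a smooth curve plus random measurement noise."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)
    return x, y


def mse(coeffs, x, y):
    """Mean squared error of the polynomial with these coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


x_train, y_train = noisy_samples(10)   # the "small data set"
x_test, y_test = noisy_samples(200)    # new data the model never saw

results = {}
for degree in (3, 9):  # a modest model vs. one with 10 free parameters
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(f"degree {degree}: train MSE {results[degree][0]:.4f}, "
          f"test MSE {results[degree][1]:.4f}")
```

The degree-9 fit drives its training error to essentially zero because it memorizes the noise along with the signal, and that memorization is exactly what fails to carry over to the fresh test points.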

Image credit: Pixabay

The ImageNet benchmark data set consists of photographs from 1,000 categories of objects and scenes.

Transfer Learning Brings Deep Learning to New Domains

Fortunately, there is a shortcut for applying such large and powerful models to small data sets: transfer learning. A network that has been trained on millions of photographs of objects and scenes can be adapted to many other applications, including microscopic images of tissue. The pretrained network computes a representation of each new image, and a new, typically much simpler model is trained on those representations to make a prediction for each image.
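The feature-extraction form of transfer learning can be sketched in a few lines of NumPy. This is a toy under loud assumptions: `pretrained_features` is a hypothetical stand-in for a frozen, ImageNet-pretrained network (here just fixed random weights plus a ReLU), the "images" are synthetic 8x8 arrays, and the new model is a logistic regression trained by plain gradient descent. Only the small model's weights are learned; the feature extractor never changes.

```python
import numpy as np

rng = np.random.default_rng(42)

# HYPOTHETICAL stand-in for an ImageNet-pretrained network. Its weights are
# frozen: they are never updated while training on the new images. A real
# pipeline would load pretrained weights instead of drawing random ones.
FROZEN_W = rng.normal(0.0, 1.0 / 8.0, size=(64, 32))


def pretrained_features(images):
    """Map flattened 8x8 images, shape (n, 64), to 32 frozen features."""
    return np.maximum(images @ FROZEN_W, 0.0)


def make_images(n, label):
    """Synthetic 'tissue images': class 1 pixels are brighter on average."""
    return rng.normal(loc=float(label), scale=1.0, size=(n, 64))


def train_logistic_regression(X, y, lr=0.1, steps=2000):
    """Train only the new, small model on top of the frozen features."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(class 1)
        grad = p - y                             # log-loss gradient w.r.t. logits
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b


X_train = np.vstack([make_images(100, 0), make_images(100, 1)])
y_train = np.repeat([0.0, 1.0], 100)
X_test = np.vstack([make_images(50, 0), make_images(50, 1)])  # held out
y_test = np.repeat([0.0, 1.0], 50)

w, b = train_logistic_regression(pretrained_features(X_train), y_train)
test_pred = (pretrained_features(X_test) @ w + b > 0).astype(float)
test_accuracy = float(np.mean(test_pred == y_test))
print(f"held-out accuracy: {test_accuracy:.2f}")
```

The design point is the split of responsibilities: the expensive representation is reused as-is, and only 33 parameters (32 weights and a bias) are fit to the small labeled set, which keeps the risk of overfitting low. In practice the frozen network would be a real pretrained model loaded from a deep learning library, and the new model is often a linear classifier such as logistic regression or a support vector machine.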

Deep transfer learning works because many elements of images are shared across domains. The low levels of the network capture small structures like edges, but they are not powerful enough to distinguish complex image classes. The upper layers are highly specific to the images on which they were trained, capturing things like faces and bicycle tires, so they transfer poorly to very different image sets, such as medical images. The middle layers, however, are both powerful and general enough to carry over well to new applications.
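Using a middle layer amounts to running an image only partway through the pretrained network and keeping the activations there. The sketch below shows that truncation with a hypothetical five-layer stack of frozen NumPy weights (random here, standing in for pretrained ones); `features_at_layer` and the layer sizes are illustrative assumptions, not the architecture used in the study, and real image networks use convolutional rather than fully connected layers.

```python
import numpy as np

rng = np.random.default_rng(0)

# HYPOTHETICAL frozen network: five fully connected layers whose widths shrink
# toward the top. Random weights stand in for pretrained ones, so these
# activations illustrate the mechanics only, not what a trained layer detects.
widths = [64, 48, 32, 24, 16, 10]
layers = [rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
          for n_in, n_out in zip(widths[:-1], widths[1:])]


def features_at_layer(x, layers, depth):
    """Run x through the first `depth` layers (ReLU after each), then stop."""
    h = x
    for W in layers[:depth]:
        h = np.maximum(h @ W, 0.0)
    return h


x = rng.normal(size=(3, 64))  # three flattened toy images

low = features_at_layer(x, layers, 1)                    # lowest layer
middle = features_at_layer(x, layers, len(layers) // 2)  # a middle layer
top = features_at_layer(x, layers, len(layers))          # full network
print(low.shape, middle.shape, top.shape)  # (3, 48) (3, 32) (3, 10)
```

With a real pretrained network the idea is the same: keep the layers up to a chosen depth and discard the rest. Which depth transfers best to a new domain is usually found by trying several and comparing results on held-out data.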

Breast Cancer Subtypes Predicted by Deep Learning

As a computer scientist, I work to bring these advances in deep learning to breast cancer research. Our team studies cancer subtypes: the smaller groups into which a type of cancer is divided based on characteristics of the tumor cells. I computed neural network features on microscopic images of breast tumors and trained models to predict different properties, including aggressiveness and molecular subtype. My models successfully predicted these properties on an independent test set, one not used during training, and may one day help guide treatment decisions.

Our team worked with a data set of microscopic images of breast tumor tissue. Each tissue sample was stained with a pair of dyes, hematoxylin and eosin (H&E), which turn different tissue structures blue or pink. A pathologist reviews such images to detect cancer and to assign a measure of its aggressiveness, called grade. Other samples from each tumor are processed in different ways to determine molecular properties that can reveal weaknesses of individual tumors and so help select appropriate therapies.

Image credit: Shutterstock

Microscopic image of a breast tumor, stained with hematoxylin and eosin to turn nuclei blue and cytoplasm pink.

My models predicted grade and two molecular properties: estrogen receptor (ER) status and intrinsic subtype. We then compared the deep learning predictions with the values assessed by other techniques. Neither molecular property was previously known to be predictable from H&E images.
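The article does not say which agreement statistics were used for this comparison. Two common choices for categorical labels are plain accuracy and Cohen's kappa, which discounts the agreement two label sources would reach by chance alone. A minimal sketch in standard-library Python, with the `agreement` helper and the toy labels as our own hypothetical illustration:

```python
from collections import Counter


def agreement(predicted, reference):
    """Return (accuracy, Cohen's kappa) for two equal-length label lists."""
    n = len(reference)
    observed = sum(p == r for p, r in zip(predicted, reference)) / n
    pred_counts = Counter(predicted)
    ref_counts = Counter(reference)
    # Agreement expected by chance if predictions were independent of the
    # reference but kept the same label frequencies.
    expected = sum(pred_counts[c] * ref_counts[c] for c in ref_counts) / n ** 2
    # Undefined when expected == 1 (both sources use a single identical label).
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Invented toy labels for illustration only; these are not study data.
reference = ["pos", "pos", "pos", "neg", "neg", "pos", "neg", "pos"]
predicted = ["pos", "pos", "neg", "neg", "neg", "pos", "pos", "pos"]

accuracy, kappa = agreement(predicted, reference)
print(f"accuracy {accuracy:.2f}, kappa {kappa:.2f}")  # accuracy 0.75, kappa 0.47
```

Here 75% raw agreement shrinks to a kappa of about 0.47 because, with five of eight labels positive in both lists, roughly 53% agreement would be expected by chance. That correction matters most when one class dominates, as it often does with clinical labels.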

While our grade prediction replicates a pathologist's assessment, the two molecular predictions provide new insights and potential cost savings for laboratories with limited resources. Standard methods for assessing molecular subtypes are costly but critical in determining the best course of treatment for a patient. Our image-based methods may one day provide an alternative.

The pink and blue H&E images of breast tumors are wildly different from photos of dogs, people, and cars, but the same methods still apply because similar shapes and textures are present. And those vacation photographs you posted on Flickr that became part of the ImageNet data set were essential in training the models we use to study breast cancer. Transfer learning makes deep learning possible for many new tasks, from cancer to climate change, and continues to improve your day-to-day interactions with technology along the way.

Would you like help implementing deep learning-based histology biomarkers?

I’ve worked with a variety of teams to develop advanced machine learning algorithms to extract new insights from pathology images. Schedule a free Machine Learning Strategy Session to get started.


H. D. Couture, L. Williams, J. Geradts, S. Nyante, E. Butler, J. Marron, C. Perou, M. Troester, and M. Niethammer, “Image analysis with deep learning to predict breast cancer grade, ER status, histologic subtype, and intrinsic subtype,” npj Breast Cancer, 2018.