
Hi,

Welcome to the first edition of Pathology ML Insights!

Medical imaging can be a challenging area in which to apply deep learning because labeled data is limited. Most datasets are small; often 1,000 patients is considered a large sample!

High-resolution microscopy images such as whole slide images are also massive, making them time-consuming and costly to annotate in detail. Often only patient-level labels are available for tasks such as cancer detection or classifying cancer subtypes.

How do you train a model to classify such large images when only weak labels are available?

I tackled this challenge on H&E images a few years ago during my PhD, and the techniques have continued to evolve. Three recent papers offer innovative advances.

Weakly supervised learning typically starts by learning a representation for individual image patches in these large images, since a full image is too large to train on at once. The magic then comes in how these patch encodings are aggregated to make predictions for the whole image.
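
Concretely, "too large to train on at once" means the slide is first cut into a grid of patches. Here is a minimal sketch of that tiling step, assuming the openslide-python package, a 256-pixel tile size, and no background filtering; the library choice and tile size are my own illustrative assumptions, and real pipelines usually discard empty tiles first.

```python
import openslide

def extract_patches(slide_path, tile_size=256, level=0):
    """Cut a whole slide image into non-overlapping fixed-size patches."""
    slide = openslide.OpenSlide(slide_path)
    width, height = slide.level_dimensions[level]
    downsample = slide.level_downsamples[level]
    patches = []
    for y in range(0, height - tile_size + 1, tile_size):
        for x in range(0, width - tile_size + 1, tile_size):
            # read_region expects the top-left corner in level-0 coordinates
            location = (int(x * downsample), int(y * downsample))
            tile = slide.read_region(location, level, (tile_size, tile_size)).convert("RGB")
            patches.append(((x, y), tile))
    return patches
```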

The most common solution today is an attention mechanism that computes a weighted average of the patch encodings, with the weights learned from the patches themselves. This permutation-invariant pooling produces a single feature vector for the entire image, from which a standard neural network can then predict the image-level class.
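
To make this concrete, here is a minimal PyTorch sketch of attention pooling in the spirit of these methods. The feature dimension, hidden size, and two-layer attention network are illustrative choices of mine, not details taken from any one paper.

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Attention-based MIL pooling: a learned weighted average of patch encodings."""

    def __init__(self, feat_dim=2048, hidden_dim=256, n_classes=2):
        super().__init__()
        # Scores each patch encoding; a softmax over patches gives the pooling weights.
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, patch_features):                     # (n_patches, feat_dim)
        scores = self.attention(patch_features)            # (n_patches, 1)
        weights = torch.softmax(scores, dim=0)             # sums to 1 over patches
        slide_feature = (weights * patch_features).sum(0)  # (feat_dim,)
        logits = self.classifier(slide_feature)            # slide-level prediction
        return logits, weights.squeeze(-1)                 # weights reusable for heatmaps
```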

[1] and [2] both use this attention mechanism, while [3] proposes an alternative called certainty pooling, which assesses how confident the model is in the label of each patch and uses that measure as the pooling weight.
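
As a heavily hedged sketch of the certainty-pooling idea: estimate per-patch certainty from the variance of predictions across stochastic forward passes (Monte Carlo dropout) and normalize those certainties into pooling weights. The `patch_classifier`, the number of passes, and the variance-to-certainty mapping here are my illustrative assumptions; see [3] for the exact formulation.

```python
import torch

@torch.no_grad()
def certainty_weights(patch_features, patch_classifier, n_samples=20):
    # Run several stochastic forward passes with dropout active; patches whose
    # predictions vary little across passes are treated as more certain.
    patch_classifier.train()  # keeps dropout layers stochastic
    preds = torch.stack([
        torch.softmax(patch_classifier(patch_features), dim=-1)
        for _ in range(n_samples)
    ])                                        # (n_samples, n_patches, n_classes)
    variance = preds.var(dim=0).mean(dim=-1)  # (n_patches,)
    certainty = 1.0 / (1.0 + variance)        # low variance -> high certainty
    return certainty / certainty.sum()        # normalized pooling weights
```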

The great thing about these methods is that they also offer a path to interpretability: the attention weight of each image patch can be rendered as a heatmap over the slide.
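
For example, the weights returned by the pooling module above can be painted back onto the patch grid. This sketch assumes the same coordinate convention as the tiling example earlier; the colormap and figure handling are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

def attention_heatmap(coords, weights, slide_size, tile_size=256):
    """Render per-patch attention weights as a low-resolution heatmap.

    coords: (x, y) top-left corner of each patch at the tiling level
    weights: one attention weight per patch
    slide_size: (width, height) of the slide at that level
    """
    width, height = slide_size
    grid = np.zeros((height // tile_size, width // tile_size))
    for (x, y), w in zip(coords, weights):
        grid[y // tile_size, x // tile_size] = w
    plt.imshow(grid, cmap="inferno")
    plt.colorbar(label="attention weight")
    plt.axis("off")
    plt.show()
```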

This weakly supervised learning approach is applicable not only for classifying tumor types, but also for predicting outcomes or treatment response. The two key components are the image patch representation and aggregating patch features into a predictive model for the whole image.

The examples below are for H&E histology, but the same techniques apply to other stains and imaging modalities. In these papers, each patch is encoded using a CNN pre-trained on ImageNet or with self-supervised learning (a topic for a future newsletter).

[1] Data Efficient and Weakly Supervised Computational Pathology on Whole Slide Images - synopsis
[2] Self-Supervision Closes the Gap Between Weak and Strong Supervision in Histology - synopsis
[3] Certainty Pooling for Multiple Instance Learning - synopsis
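
As a concrete illustration of the ImageNet-pretrained encoding step, here is a minimal sketch using a ResNet-50 from torchvision with its classification head removed, giving a 2,048-dimensional vector per patch. The backbone and preprocessing are my own choices for illustration and may differ from what each paper actually uses.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# ImageNet-pretrained ResNet-50 as a frozen patch encoder.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()  # drop the classification head
backbone.eval()

preprocess = T.Compose([
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def encode_patches(pil_patches):
    """Encode a list of RGB PIL patches into a (n_patches, 2048) feature matrix."""
    batch = torch.stack([preprocess(p) for p in pil_patches])
    return backbone(batch)
```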

While more detailed annotations within each image should be used whenever they are available, weakly supervised learning can provide a powerful alternative. When an appropriate model is selected, it can even succeed with only a few hundred labeled whole slide images.

 
Please forward this email if you think anyone you work with would be interested. Go here to sign up.
 
I hope you found this first edition of Pathology ML Insights informative. I look forward to sharing interesting content with you!

Heather
Guiding R&D teams to fight cancer and climate change more successfully with AI
Pixel Scientia Labs, LLC, 9650 Strickland Rd., Suite 103-242, Raleigh, NC 27615, United States
