Deep learning on whole slide images - without patches!

Hi,

Whole slide histopathology images can be as large as 100,000 pixels across. Such massive images are both time-consuming and costly to annotate in detail.

For some tasks, pathologists can annotate individual features within the image, such as tissue types or structures like mitotic figures. For other tasks, only higher-level annotations are possible.

Patient-level labels can come from clinical data, such as whether a pre-invasive lesion became invasive, how long the patient lived after diagnosis, or whether they responded to a particular treatment. A label can also come from a different type of analysis performed on the tumor, such as molecular testing to identify mutations or genomic subtype. Immunohistochemical staining can likewise produce a label for the entire tumor, for example its receptor status.

Most deep learning solutions for whole slide images divide them into patches and process each patch independently, because the whole image won't fit in GPU memory. But some recent innovations have found ways to train a CNN on the whole image end to end.
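To put the GPU-memory problem in numbers, here is a back-of-the-envelope sketch. The 64-channel, float32 first feature map is an assumption (a typical CNN width), not a figure from any particular paper, and training also stores activations for every later layer, so the real requirement is larger still.

```python
def tensor_bytes(height, width, channels, bytes_per_value):
    """Memory needed to hold one dense H x W x C tensor."""
    return height * width * channels * bytes_per_value

GB = 1e9  # decimal gigabytes

# Raw 8-bit RGB slide, 100,000 pixels on each side.
raw_slide = tensor_bytes(100_000, 100_000, 3, 1)
# One float32 feature map with 64 channels at full resolution
# (the channel count is an assumed, typical first-layer width).
first_layer = tensor_bytes(100_000, 100_000, 64, 4)
# The same feature map for a 20k x 20k image.
first_layer_20k = tensor_bytes(20_000, 20_000, 64, 4)

print(f"raw slide:         {raw_slide / GB:,.0f} GB")        # 30 GB
print(f"first feature map: {first_layer / GB:,.0f} GB")      # 2,560 GB
print(f"same map at 20k:   {first_layer_20k / GB:,.1f} GB")  # 102.4 GB
```

Even the raw slide is comparable to or larger than a single GPU's memory, and one full-resolution feature map needs terabytes, which is why patch-based pipelines became the default.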

A CUDA feature called unified memory gives the GPU direct access to host memory. As with virtual memory, pages are migrated to the GPU on demand, when they are accessed. Using this technique, Chen et al. were able to process images as large as 20k x 20k pixels [1]. Anything larger became prohibitively slow.

[1] An annotation-free whole-slide training approach to pathological classification of lung cancer types using deep learning

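An alternative is a method called streaming, which exploits the locality of most CNN operations: each output pixel of a convolution depends only on a small neighborhood of its input. Streaming combines precise tiling and gradient checkpointing to reduce memory requirements. To stream the forward pass of a CNN, you first calculate the feature map of a chosen layer in the middle of the network tile by tile, using input tiles that overlap by enough pixels that the stitched result is identical to processing the whole image at once. Because of downsampling, this feature map is much smaller than the original image, so it fits on the GPU. The reconstructed intermediate feature map is then fed to the remainder of the network. The backward pass works the same way in reverse: the gradient arriving at the intermediate feature map is split into tiles, and the activations of the lower layers are recomputed tile by tile to backpropagate through them.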
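The tiling idea can be checked with a small NumPy sketch. It is not Pinckaers et al.'s implementation: it assumes a single-channel lower network of unpadded ("valid") 3x3 convolutions with ReLU and no downsampling, and every function name is hypothetical. Output tiles are disjoint, but their input crops overlap by the receptive field, so the stitched map matches the whole-image result. The backward pass splits the incoming gradient into the same tiles, recomputes each crop's activations (gradient checkpointing) and sums the kernel gradients.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv3x3(x, k):
    """3x3 'valid' cross-correlation (what CNN layers compute) followed by ReLU."""
    windows = sliding_window_view(x, (3, 3))  # shape (H-2, W-2, 3, 3)
    return np.maximum(np.einsum("ijkl,kl->ij", windows, k), 0.0)

def conv3x3_backward(x, k, grad_out):
    """Gradients of conv3x3(x, k) with respect to x and k."""
    windows = sliding_window_view(x, (3, 3))
    grad_pre = grad_out * (np.einsum("ijkl,kl->ij", windows, k) > 0)  # ReLU gate
    grad_k = np.einsum("ijkl,ij->kl", windows, grad_pre)
    # Input pixel (i+a, j+b) received k[a, b] * grad_pre[i, j]: a "full"
    # correlation of the zero-padded gradient with the flipped kernel.
    padded = np.pad(grad_pre, 2)
    grad_x = np.einsum("ijkl,kl->ij", sliding_window_view(padded, (3, 3)), k[::-1, ::-1])
    return grad_x, grad_k

def lower_net(x, kernels):
    """The streamed (lower) part of the network: a stack of 3x3 convs."""
    for k in kernels:
        x = conv3x3(x, k)
    return x

def lower_net_kernel_grads(x, kernels, grad_out):
    """Recompute activations for one crop (gradient checkpointing), then backprop."""
    acts = [x]
    for k in kernels:
        acts.append(conv3x3(acts[-1], k))
    grads = [None] * len(kernels)
    for i in reversed(range(len(kernels))):
        grad_out, grads[i] = conv3x3_backward(acts[i], kernels[i], grad_out)
    return grads

def tiles(out_h, out_w, tile):
    """Yield disjoint output tiles (r0, r1, c0, c1) covering the output map."""
    for r in range(0, out_h, tile):
        for c in range(0, out_w, tile):
            yield r, min(r + tile, out_h), c, min(c + tile, out_w)

def streamed_forward(image, kernels, tile=64):
    """Build the intermediate feature map tile by tile.

    Each valid 3x3 conv trims one pixel per side, so output pixel (i, j) depends
    on input pixels [i, i + 2*halo] x [j, j + 2*halo] with halo = len(kernels).
    """
    halo = len(kernels)
    out_h, out_w = image.shape[0] - 2 * halo, image.shape[1] - 2 * halo
    out = np.empty((out_h, out_w))
    for r0, r1, c0, c1 in tiles(out_h, out_w, tile):
        crop = image[r0:r1 + 2 * halo, c0:c1 + 2 * halo]  # overlapping input crop
        out[r0:r1, c0:c1] = lower_net(crop, kernels)
    return out

def streamed_kernel_grads(image, kernels, grad_map, tile=64):
    """Backward pass: split dL/d(feature map) into tiles, sum per-crop kernel grads."""
    halo = len(kernels)
    total = [np.zeros_like(k) for k in kernels]
    for r0, r1, c0, c1 in tiles(*grad_map.shape, tile):
        crop = image[r0:r1 + 2 * halo, c0:c1 + 2 * halo]
        crop_grads = lower_net_kernel_grads(crop, kernels, grad_map[r0:r1, c0:c1])
        total = [t + g for t, g in zip(total, crop_grads)]
    return total

rng = np.random.default_rng(0)
image = rng.random((300, 300))
kernels = [rng.standard_normal((3, 3)) for _ in range(4)]

full = lower_net(image, kernels)
print(np.allclose(full, streamed_forward(image, kernels)))  # True

# Stand-in for the gradient the upper network would send back.
grad_map = rng.standard_normal(full.shape)
reference = lower_net_kernel_grads(image, kernels, grad_map)
streamed = streamed_kernel_grads(image, kernels, grad_map)
print(all(np.allclose(a, b) for a, b in zip(reference, streamed)))  # True
```

In a real network, strided and pooling layers shrink the map, which is what makes the intermediate feature map small enough to keep on the GPU; tile boundaries then also have to line up with those strides.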

Pinckaers et al. first demonstrated streaming with a ResNet on 16k x 16k images [2]. A limitation of streaming is that it cannot handle operations that need the whole feature map at once, such as batch normalization, in the streamed (lower) part of the network. As a workaround, they froze the mean and variance of the batch normalization layers.
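To see why batch normalization breaks streaming, here is a small NumPy sketch. It assumes a single-channel map and a batch of one, and the frozen mean and variance are hypothetical values. Normalizing each tile with its own statistics gives a different result from normalizing the whole map, while frozen statistics turn batch normalization into a per-pixel affine operation that tiles cleanly.

```python
import numpy as np

def batch_norm_batch_stats(x, eps=1e-5):
    """Training-mode batch norm: statistics come from the whole map passed in."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def batch_norm_frozen(x, mean, var, eps=1e-5):
    """Frozen batch norm: fixed statistics, so each output pixel depends
    only on the matching input pixel."""
    return (x - mean) / np.sqrt(var + eps)

def tiled(fn, x, tile):
    """Apply fn independently to non-overlapping tiles and stitch the result."""
    out = np.empty_like(x)
    for r in range(0, x.shape[0], tile):
        for c in range(0, x.shape[1], tile):
            out[r:r + tile, c:c + tile] = fn(x[r:r + tile, c:c + tile])
    return out

rng = np.random.default_rng(0)
# A feature map whose statistics drift across the slide.
fmap = rng.random((256, 256)) + np.linspace(0, 5, 256)[None, :]

whole = batch_norm_batch_stats(fmap)
per_tile = tiled(batch_norm_batch_stats, fmap, tile=64)
print(np.allclose(whole, per_tile))  # False: each tile used its own statistics

mean, var = 2.5, 4.0  # hypothetical frozen running statistics
whole_frozen = batch_norm_frozen(fmap, mean, var)
per_tile_frozen = tiled(lambda t: batch_norm_frozen(t, mean, var), fmap, tile=64)
print(np.allclose(whole_frozen, per_tile_frozen))  # True
```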

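A third approach, spatial partitioning, splits each image across several devices. Before a convolution, neighboring devices exchange the thin border regions ("halos") that the convolution needs from each other's part of the image [3].

[3] High Resolution Medical Image Analysis with Spatial Partitioning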
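Halo exchange can be simulated on a single machine with a NumPy sketch. This is not the implementation from [3]: the "devices" are just list entries holding horizontal strips of every feature map, the network is an assumed single-channel stack of zero-padded 3x3 convolutions with ReLU, and the function names are hypothetical. Before each convolution, every strip receives one boundary row from each neighboring strip, which is all a 3x3 kernel needs, and the stitched strips match the single-device result.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv3x3_same(x, k):
    """Reference: zero-padded 3x3 conv + ReLU on the whole map (one device)."""
    windows = sliding_window_view(np.pad(x, 1), (3, 3))
    return np.maximum(np.einsum("ijkl,kl->ij", windows, k), 0.0)

def conv3x3_strip(strip, top, bottom, k):
    """The same conv on one horizontal strip, given its received halo rows."""
    extended = np.vstack([top, strip, bottom])       # 1 + h + 1 rows
    extended = np.pad(extended, ((0, 0), (1, 1)))    # columns are not split: zero-pad
    windows = sliding_window_view(extended, (3, 3))  # shape (h, w, 3, 3)
    return np.maximum(np.einsum("ijkl,kl->ij", windows, k), 0.0)

def halo_exchange_forward(image, kernels, n_devices=4):
    """Simulate n_devices, each holding one strip of every feature map."""
    strips = np.array_split(image, n_devices, axis=0)
    zero_row = np.zeros((1, image.shape[1]))
    for k in kernels:
        # Halo exchange: each strip receives one row from each neighbor.
        # Strips at the image border receive zeros (the 'same' padding).
        tops = [zero_row] + [s[-1:] for s in strips[:-1]]
        bottoms = [s[:1] for s in strips[1:]] + [zero_row]
        strips = [conv3x3_strip(s, t, b, k) for s, t, b in zip(strips, tops, bottoms)]
    return np.vstack(strips)

rng = np.random.default_rng(0)
image = rng.random((300, 300))
kernels = [rng.standard_normal((3, 3)) for _ in range(4)]

reference = image
for k in kernels:
    reference = conv3x3_same(reference, k)
print(np.allclose(reference, halo_exchange_forward(image, kernels)))  # True
```

Unlike streaming, each device keeps its strip of every layer in memory at the same time, so the memory is spread across devices rather than traded for recomputation.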

From unified memory to streaming to halo exchange, each of these approaches enables end-to-end training on much larger images, but current limits are still around 20k x 20k pixels or less. We are not yet able to process larger images without downsizing them and losing details that may be important for prediction.

But if tissue architecture is likely to be more informative than small-scale image features for a particular task, so that downsizing costs little, these approaches are definitely worth trying.

Hope that you’re finding Pathology ML Insights informative. Look out for another edition in two weeks.

Heather

P.S. Want to learn more about weak supervision for whole slide images?

Check out this article that I wrote for Towards Data Science:
From Patches to Slides: How to Train Deep Learning Models on Gigapixel Images With Weak Supervision