Latest advances in predicting molecular biomarkers from H&E

Hi,

If you’ve been following me on LinkedIn for a while now, you’ve likely noticed that one of my favorite topics is using deep learning to predict molecular biomarkers from H&E. This was the focus of my PhD research, and it’s been exciting to follow the rapid advances since.

To assess tissue properties like receptor status, genomic subtype, mutational status, or other clinically relevant phenotypes, pathologists often rely on different types of molecular analysis, such as immunohistochemistry or RNA sequencing. These analyses are time-consuming and costly, so they are not routinely performed. However, they can provide key information for selecting an appropriate treatment.

Deep learning has demonstrated repeated success in predicting some of these complex and abstract biomarkers from H&E alone, even on datasets with fewer than 1000 patients. Larger training sets will likely enable improved prediction performance.

We might not need to wait for larger datasets, though. Krause et al. just published a method to improve classification accuracy by augmenting their dataset with synthetically generated images [1].

They trained a Conditional Generative Adversarial Network (CGAN) on their training set to create new sample images with and without microsatellite instability. Augmenting their training set with these synthetic images improved model accuracy.
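
To make the augmentation step concrete, here is a minimal Python sketch of the general idea. It is not the authors' code. The function names, the tile identifiers, and the placeholder generator are all hypothetical; in practice the generator would be a trained conditional GAN that returns an image for the requested label.

```python
import random
from typing import Callable, List, Tuple

# (image identifier, label); label 1 = MSI, label 0 = no MSI
Sample = Tuple[str, int]


def augment_with_synthetic(
    real: List[Sample],
    generate: Callable[[int, random.Random], str],
    n_per_class: int,
    seed: int = 0,
) -> List[Sample]:
    """Return the real samples plus n_per_class synthetic samples per label."""
    rng = random.Random(seed)
    synthetic = [
        (generate(label, rng), label)
        for label in (0, 1)
        for _ in range(n_per_class)
    ]
    combined = list(real) + synthetic
    rng.shuffle(combined)  # mix real and synthetic samples before training
    return combined


def placeholder_generator(label: int, rng: random.Random) -> str:
    # Hypothetical stand-in: a trained CGAN would return an image
    # conditioned on the label, not an identifier string.
    return f"synthetic_label{label}_{rng.randrange(10**6)}"


# Toy identifiers for illustration only.
real_train = [("tile_001", 1), ("tile_002", 0), ("tile_003", 0)]
augmented = augment_with_synthetic(real_train, placeholder_generator, n_per_class=2)
```

Only the training set is augmented in this sketch; validation and test sets should contain real images only, so that reported accuracy reflects performance on real tissue.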

GANs have advanced rapidly over the last few years. (You’ve probably heard about “deepfakes.” Many of these are created with GANs.) Great to see a practical use for histology!

[1] Deep learning detects genetic alterations in cancer histology generated by adversarial networks - synopsis

There are methods to improve biomarker classifier performance, but what variables affect model accuracy?

I studied this a bit for estrogen receptor status in breast cancer. We calculated model performance separately for low- and high-grade tumors, demonstrating that our models were more accurate for low-grade tumors.

Naik et al. have taken this idea a step further. They calculated model accuracy for a number of different divisions of histological and clinical variables [2]. Some affected model performance significantly more than others.

This kind of subgroup analysis is critical to understanding how well a model generalizes and where improvements could be made in the future.
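
As a rough illustration, a stratified evaluation can be as simple as computing the same metric separately within each subgroup. The Python sketch below uses plain accuracy and made-up toy labels; the function name and group names are hypothetical, and neither paper's actual metric or pipeline is reproduced here.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def accuracy_by_subgroup(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    groups: Sequence[Hashable],
) -> Dict[Hashable, float]:
    """Compute accuracy separately for each value in `groups`."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


# Toy values for illustration only; these are not results from any study.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]
groups = ["group_a"] * 4 + ["group_b"] * 4

per_group = accuracy_by_subgroup(y_true, y_pred, groups)
```

The grouping variable could be tumor grade or any other histological or clinical variable. A large gap between subgroups suggests where a model may not generalize well.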

[2] Deep learning-enabled breast cancer hormonal receptor status determination from base-level H&E stains - synopsis

Beyond improvements in model performance, I’m also particularly intrigued by new insights into what tissue properties are enabling these predictions.

Diao et al. combined deep learning with human-interpretable features [3]. They classified tissue and cell types with deep learning, then extracted features from each. Unlike class heatmaps, these features can be aggregated across many images and mapped to biological concepts.

For example, their plots show which features and tissue types are associated with the PD-1 biomarker.

[3] Dense, high-resolution mapping of cells and tissues from pathology images for the interpretable prediction of molecular phenotypes in cancer - synopsis

Deep learning can capture much more complex features than even the best-trained human experts. It has opened up new opportunities for quantifying clinically relevant histological properties.

Want to learn more about the latest advancements in deep learning for molecular biomarkers?

Check out this article that I wrote for the Digital Pathology Association:
Deep learning-based histology biomarkers: Recent advances and challenges for clinical use

You’ll learn about the types of molecular properties that have been successfully predicted from H&E alone and what challenges we still face in turning this technology into a clinical application.

Hope that you’re finding Pathology ML Insights informative. Look out for another edition in two weeks.

Heather