
Webinar: Distribution Shift


Disentangling Distribution Shift


Excited to share insights from my recent webinar on Disentangling Distribution Shift!

In this talk, I explored:
- Why distribution shift is a critical challenge for CV models
- Four key types of distribution shift affecting real-world applications
- Four techniques for identifying distribution shift
- Different approaches to detecting and mitigating shift, from simple to complex

Some key takeaways:
- Distribution shift can significantly impact model performance
- Continuous monitoring and adaptation are crucial for maintaining model reliability
- Techniques like data augmentation and domain adaptation can improve model robustness

Curious to learn more? Check out the slides for a visual summary!

Research: Foundation Models & Distribution Shift


Evaluating Computational Pathology Foundation Models for Prostate Cancer Grading under Distribution Shifts


Variations in stain colors across different labs and scanners commonly challenge machine learning models trained on H&E whole slide images.

Foundation models are reported to handle distribution shifts like this better, but do they actually solve the problem?

Fredrik K. Gustafsson and Mattias Rantalainen assess two pathology foundation models, UNI and CONCH, on prostate cancer grading.

These models, along with a baseline ImageNet-pretrained network, were used as feature extractors, each paired with three different multiple instance learning approaches.
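To make that setup concrete, here is a minimal sketch of attention-based multiple instance learning over frozen patch features. The dimensions, number of classes, and module layout are illustrative assumptions, not the authors' exact configuration.

    import torch
    import torch.nn as nn

    class AttentionMIL(nn.Module):
        """Aggregate patch embeddings from a frozen extractor into one slide-level prediction."""
        def __init__(self, feat_dim=1024, hidden_dim=256, n_classes=6):
            super().__init__()
            # Attention network scores how much each patch contributes to the slide embedding
            self.attention = nn.Sequential(
                nn.Linear(feat_dim, hidden_dim),
                nn.Tanh(),
                nn.Linear(hidden_dim, 1),
            )
            self.classifier = nn.Linear(feat_dim, n_classes)

        def forward(self, patch_feats):                      # (n_patches, feat_dim)
            weights = torch.softmax(self.attention(patch_feats), dim=0)
            slide_feat = (weights * patch_feats).sum(dim=0)  # attention-weighted average
            return self.classifier(slide_feat)

    # patch_feats would come from a frozen foundation model such as UNI or CONCH
    patch_feats = torch.randn(500, 1024)   # e.g. 500 patches from one slide
    logits = AttentionMIL()(patch_feats)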

Both foundation models significantly outperformed the ImageNet-pretrained baseline, but they still suffered a notable drop in performance when applied to slides from a different hospital.

"The fact that UNI and CONCH have been trained on very large and varied datasets does not guarantee that downstream prediction models always will be robust to commonly encountered distribution shifts."

The authors further emphasized that the quality of the data used to train downstream models matters substantially for robustness.

Research: Foundation Models


A Vision-Language Foundation Model for Leaf Disease Identification


General-purpose vision-language foundation models can tackle a very diverse set of tasks, but they are generally not well suited to fine-grained image classification.

Khang Nguyen Quoc et al. developed a vision-language model specifically for identifying leaf diseases.

Using 186k image-caption pairs covering 97 concepts, they trained a foundation model with contrastive learning.

The captions were generated to include the class name of each image and a description of that class. An ablation study demonstrated that this extra context significantly improved performance.
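For readers unfamiliar with this style of training, here is a minimal sketch of a CLIP-style contrastive objective over image and caption embeddings. The batch size, embedding dimension, temperature, and example caption are assumptions for illustration, not the authors' exact implementation.

    import torch
    import torch.nn.functional as F

    def contrastive_loss(image_emb, text_emb, temperature=0.07):
        # Normalize so the dot product is cosine similarity
        image_emb = F.normalize(image_emb, dim=-1)
        text_emb = F.normalize(text_emb, dim=-1)
        logits = image_emb @ text_emb.T / temperature   # (batch, batch) similarity matrix
        targets = torch.arange(len(image_emb))          # matching pairs lie on the diagonal
        # Symmetric cross-entropy: image-to-text and text-to-image
        return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

    # Each caption pairs a class name with a description of that class, e.g.
    # "tomato early blight: dark concentric rings on lower leaves" (hypothetical example)
    image_emb = torch.randn(32, 512)   # batch of image embeddings
    text_emb = torch.randn(32, 512)    # embeddings of the generated captions
    loss = contrastive_loss(image_emb, text_emb)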

When validated on 10 out-of-distribution datasets, their model outperformed others on zero-shot, few-shot, image-text retrieval, and image classification tasks.

Insights: Distribution Shift


When Simple Normalization Isn't Enough: Tackling Distribution Shift in ML Models


During my recent webinar on machine learning challenges, an interesting question came up about distribution shift:

Q: If we use the mean and standard deviation of the source sets and apply it to the target set, will it tackle the distribution shift?

This question gets at the heart of a common approach many practitioners try first - simple statistical normalization. Let me share some expanded thoughts on when this works and when we need more sophisticated approaches.

𝐓𝐡𝐞 𝐒𝐢𝐦𝐩𝐥𝐞 𝐀𝐩𝐩𝐫𝐨𝐚𝐜𝐡: 𝐒𝐭𝐚𝐭𝐢𝐬𝐭𝐢𝐜𝐚𝐥 𝐍𝐨𝐫𝐦𝐚𝐥𝐢𝐳𝐚𝐭𝐢𝐨𝐧
For basic scenarios, applying normalization (subtracting mean, dividing by standard deviation) can help when:
- Working with tabular data
- Dealing primarily with intensity shifts in images
- Addressing simple scale differences between datasets
This standardization technique is intuitive and computationally efficient, making it an appealing first solution.
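As a concrete illustration of the question above, here is a minimal sketch of standardizing a target set with statistics estimated on the source set; the arrays and the shift are synthetic stand-ins.

    import numpy as np

    # Synthetic source (training) data and a target set with a simple offset/scale shift
    source = np.random.normal(loc=5.0, scale=2.0, size=(1000, 8))
    target = np.random.normal(loc=7.5, scale=3.0, size=(200, 8))

    # Per-feature statistics computed on the source set only
    mu = source.mean(axis=0)
    sigma = source.std(axis=0) + 1e-8   # avoid division by zero

    source_norm = (source - mu) / sigma
    target_norm = (target - mu) / sigma  # the same transform is reused at inference time

This corrects offsets and scale differences, but it cannot undo the kinds of structural changes described next.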

𝐖𝐡𝐲 𝐑𝐞𝐚𝐥-𝐖𝐨𝐫𝐥𝐝 𝐒𝐡𝐢𝐟𝐭𝐬 𝐀𝐫𝐞 𝐔𝐬𝐮𝐚𝐥𝐥𝐲 𝐌𝐨𝐫𝐞 𝐂𝐨𝐦𝐩𝐥𝐞𝐱
However, most practical distribution shifts involve:
- Complex, multidimensional transformations
- Non-linear relationships between features
- Structural differences beyond simple intensity changes
Simply normalizing raw inputs often fails to capture these nuanced differences between source and target domains.

𝐀 𝐌𝐨𝐫𝐞 𝐄𝐟𝐟𝐞𝐜𝐭𝐢𝐯𝐞 𝐀𝐥𝐭𝐞𝐫𝐧𝐚𝐭𝐢𝐯𝐞: 𝐓𝐞𝐬𝐭-𝐓𝐢𝐦𝐞 𝐀𝐝𝐚𝐩𝐭𝐚𝐭𝐢𝐨𝐧
One promising approach I've seen applied successfully is test-time adaptation of internal network activations:
1. Process your target data through the network
2. Calculate the mean and standard deviation of activations at various network layers
3. Normalize these activations during inference
This approach adjusts representations at multiple levels of abstraction, rather than just normalizing inputs. It acknowledges that distribution shifts manifest differently at different levels of feature representation.
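Here is a minimal sketch of one way to implement those three steps, assuming a PyTorch model whose normalization layers are BatchNorm; the function name and target_loader are placeholders.

    import torch
    import torch.nn as nn

    def adapt_norm_stats(model, target_loader, device="cpu"):
        model.to(device).train()   # train mode so BatchNorm layers update their running statistics
        for m in model.modules():
            if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
                m.reset_running_stats()   # forget source-domain statistics
                m.momentum = None         # use a cumulative average over the target batches
        with torch.no_grad():             # collect statistics only; no weight updates
            for x, _ in target_loader:
                model(x.to(device))
        model.eval()                      # inference now normalizes with target-domain statistics
        return model

Models that use other normalization layers (e.g. LayerNorm) would need a different mechanism, so treat this strictly as one illustrative variant.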

𝐓𝐡𝐞 𝐊𝐞𝐲 𝐈𝐧𝐬𝐢𝐠𝐡𝐭
The most important takeaway is that while simple normalization can be a useful starting point, distribution shifts in real-world ML applications typically require more sophisticated approaches that address shifts at multiple levels of representation.

Enjoy this newsletter? Here are more things you might find helpful:


Team Workshop: Harnessing the Power of Foundation Models for Pathology - Ready to unlock new possibilities for your pathology AI product development? Join me for an exclusive 90-minute workshop designed to catapult your team’s model development.


Schedule now
