Research: Foundation Models & Batch Effects

Current Pathology Foundation Models are unrobust to Medical Center Differences

Do pathology foundation models focus more on medical center differences or the biology of the tissue? For the models to be robust and reliable, we certainly hope the latter.

Medical centers influence how foundation models interpret pathology images, as these models learn to recognize differences between centers alongside biological information.

While some argue this is expected or could be corrected post-hoc, Edwin D. de Jong et al. believe these features are not truly separable, because center-specific patterns likely intertwine with medically relevant data.

This raises concerns about whether apparent diagnostic accuracy is actually based on center-specific confounders rather than true biological patterns, potentially limiting generalization to new medical centers.

Edwin D. de Jong et al. introduced the robustness index and used it to evaluate, across 10 different pathology foundation models, the degree to which biological features dominate confounding features.

"Foundation models were seen to differ significantly in robustness according to this metric. Uni2-h and Virchow2 were found to be most robust, and Virchow2 was the only model so far with a robustness index above one, meaning biological information (cancer type) dominates confounding information (medical center)."

Those building and using foundation models need to be aware of this potential bias.
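
To make the idea of a robustness index concrete, here is a minimal sketch of a neighbor-based version. It is a simplified illustration, not the authors' exact definition: it assumes the index is the ratio of nearest neighbors that share the biological class but come from another center to nearest neighbors that share the center but differ in biological class. The function names, the choice of Euclidean distance, the value of k, and the toy data are all assumptions made for this example.

```python
import numpy as np

def robustness_index(embeddings, bio_labels, center_labels, k=15):
    """Ratio of 'same biology, other center' neighbors to
    'other biology, same center' neighbors over all k-nearest-neighbor pairs.
    Simplified sketch; neighbors matching on both or neither label are ignored."""
    X = np.asarray(embeddings, dtype=float)
    bio = np.asarray(bio_labels)
    center = np.asarray(center_labels)

    # Brute-force pairwise squared Euclidean distances (fine for small demos).
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor

    neighbors = np.argsort(dist, axis=1)[:, :k]
    same_bio = bio[neighbors] == bio[:, None]
    same_center = center[neighbors] == center[:, None]

    so = np.sum(same_bio & ~same_center)   # biology agrees, center differs
    os_ = np.sum(~same_bio & same_center)  # center agrees, biology differs
    return float(so / os_) if os_ > 0 else float("inf")

def make_embeddings(bio_scale, center_scale, n_per_group=10, seed=0):
    """Hypothetical toy data: axis 0 encodes biology, axis 1 encodes center."""
    rng = np.random.default_rng(seed)
    bio = np.repeat([0, 0, 1, 1], n_per_group)
    center = np.repeat([0, 1, 0, 1], n_per_group)
    signal = np.column_stack([bio_scale * bio, center_scale * center])
    return signal + rng.normal(0.0, 0.5, size=signal.shape), bio, center

bio_dominant = robustness_index(*make_embeddings(bio_scale=5.0, center_scale=0.5))
center_dominant = robustness_index(*make_embeddings(bio_scale=0.5, center_scale=5.0))
print(f"biology-dominant embedding: {bio_dominant:.2f}")
print(f"center-dominant embedding:  {center_dominant:.2f}")
```

In this toy setup, an index above one means the embedding space groups samples by biology more than by center, and an index below one means the opposite.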

Webinar: Computer Vision Challenges

Three Critical Mistakes Derailing Your Computer Vision Projects

Excited to share insights from my recent webinar entitled “Three Critical Mistakes Derailing Your Computer Vision Projects.”

In this talk, I explored:
- A common process flaw that introduces bias you might not have considered
- The overlooked step that leaves you flying blind on performance
- A subtle error in data handling that can invalidate your entire model

Some key takeaways:
- A consistent annotation process is essential to ensure model reliability
- Starting with a baseline model enables you to measure progress and identify data issues sooner
- Proper data splitting to prevent data leakage ensures that your models will generalize to unseen data
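
As an illustration of the data-splitting takeaway, here is a minimal sketch of a group-aware split. It assumes the leakage risk comes from related samples, such as several images from the same patient, ending up on both sides of the split. The function name `group_split` and the example patient IDs are hypothetical.

```python
import random

def group_split(groups, test_fraction=0.2, seed=0):
    """Split sample indices so that every group lands entirely in train or test."""
    unique = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique)

    # Hold out whole groups, never individual samples.
    n_test = max(1, round(len(unique) * test_fraction))
    test_groups = set(unique[:n_test])

    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx

# Hypothetical example: one patient ID per image.
patient_ids = ["p1", "p1", "p2", "p2", "p2", "p3", "p4", "p4", "p5", "p5"]
train_idx, test_idx = group_split(patient_ids, test_fraction=0.2)

train_patients = {patient_ids[i] for i in train_idx}
test_patients = {patient_ids[i] for i in test_idx}
print("train patients:", sorted(train_patients))
print("test patients: ", sorted(test_patients))
```

If you already use scikit-learn, `GroupShuffleSplit` and `GroupKFold` implement the same idea.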

Curious to learn more? Check out the slides for a visual summary!

Blog: Pathology AI

What happened to pathology AI companies?

There has been a lot going on in the world of AI. But each application area is different.

To be successful, you really need to understand how AI will fit in with the existing workflow and how it'll solve current bottlenecks or challenges.

Abhishaike Mahajan recently wrote about the difficulties facing pathology AI companies.

There are definitely obstacles to clinical applications; regulatory pathways and reimbursement are some of the largest ones.

This is why many companies focus on supporting drug development instead.

Abhishaike alluded to another use case at the end of the article: virtual staining. This can translate between different stain types or create pseudo-H&E from label-free images.

Further, there are a number of companies trying to create alternative approaches to H&E histology that don't destroy tissue and can work in a point-of-care setting.

Some of them are working on diagnostic models from a variety of imaging modalities, many of them multispectral or 3D.

There may be a future where tissue samples don't need to be sectioned or stained -- and AI is supporting this path too.

Insights: Distribution Shift

Tackling Concept Shift in Agricultural ML Models

During my recent webinar on distribution shift, I explored a fascinating question relevant to anyone working with geographically diverse datasets:

Q: What is a good method to tackle concept shifts? For example, in crop yield forecasting, different regions have different management practices established by local farmers. How do you account for factors that cannot be numerically quantified when building models?

This question highlights a fundamental challenge in deploying ML across diverse environments - when the relationship between inputs and outputs changes.

Why One-Size-Fits-All Models Often Fail
When dealing with concept shifts across agricultural regions:
- Identical weather conditions may produce different yields due to local farming practices
- Similar soil properties may be managed differently based on regional norms
- The causal relationship between measurable features and outcomes varies by location
This explains why universal models often underperform when deployed across regions with different agricultural practices.
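
The effect described above can be reproduced with a small simulation. This is a toy sketch under stated assumptions: two hypothetical regions share the same input distribution (a single "rainfall" feature) but have different input-to-yield slopes, standing in for unmeasured management practices. All names and numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_region(slope, n=200):
    """Same input distribution everywhere; only the input-output relationship differs."""
    rainfall = rng.uniform(0.0, 1.0, size=n)
    crop_yield = slope * rainfall + rng.normal(0.0, 0.05, size=n)
    return rainfall, crop_yield

def fit_line(x, y):
    slope, intercept = np.polyfit(x, y, deg=1)  # highest-degree coefficient first
    return slope, intercept

def mse(params, x, y):
    slope, intercept = params
    return float(np.mean((slope * x + intercept - y) ** 2))

# Training data from two regions with different (hypothetical) practices.
xa, ya = simulate_region(slope=2.0)
xb, yb = simulate_region(slope=0.5)

pooled_model = fit_line(np.concatenate([xa, xb]), np.concatenate([ya, yb]))
model_a = fit_line(xa, ya)
model_b = fit_line(xb, yb)

# Fresh evaluation data from each region.
xa_test, ya_test = simulate_region(slope=2.0)
xb_test, yb_test = simulate_region(slope=0.5)

pooled_mse_a = mse(pooled_model, xa_test, ya_test)
pooled_mse_b = mse(pooled_model, xb_test, yb_test)
regional_mse_a = mse(model_a, xa_test, ya_test)
regional_mse_b = mse(model_b, xb_test, yb_test)

print(f"region A: pooled MSE {pooled_mse_a:.4f}, regional MSE {regional_mse_a:.4f}")
print(f"region B: pooled MSE {pooled_mse_b:.4f}, regional MSE {regional_mse_b:.4f}")
```

The pooled model settles on a compromise slope that fits neither region well, even though the inputs themselves look identical across regions.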

Effective Strategies for Regional Adaptation
Based on my experience, here are approaches that show promise:
1. Hierarchical Fine-Tuning
Start with a base model trained on all available data, then fine-tune separate versions for each region. This preserves general patterns while adapting to local relationships.
2. Ensemble Methods with Regional Weighting
Train individual models for different regions, then apply them as a weighted ensemble when predicting for new regions. This allows your system to emphasize the most relevant regional patterns.
3. Foundation Models as Starting Points
For image-based agriculture (satellite, drone, or ground-based), leveraging pre-trained foundation models and then fine-tuning for specific regions can capture nuanced visual patterns that correlate with local practices.
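
The first two strategies can be sketched with simple linear models. This is a minimal illustration, not a production recipe: the "base model" is a least-squares fit on pooled data, "fine-tuning" is a few gradient steps from the base weights on one region's data, and the ensemble weights are set by inverse error on a small labeled sample from the new region. That weighting rule, the function names, and the simulated regions are all assumptions made for this example.

```python
import numpy as np

def add_bias(X):
    return np.column_stack([X, np.ones(len(X))])

def fit_least_squares(X, y):
    """Base model: closed-form least squares on whatever data is passed in."""
    w, *_ = np.linalg.lstsq(add_bias(X), y, rcond=None)
    return w

def predict(w, X):
    return add_bias(X) @ w

def mse(w, X, y):
    return float(np.mean((predict(w, X) - y) ** 2))

def fine_tune(w_base, X, y, lr=0.5, steps=200):
    """Start from the base weights and take gradient steps on one region's data."""
    Xb = add_bias(X)
    w = w_base.copy()
    for _ in range(steps):
        grad = 2.0 * Xb.T @ (Xb @ w - y) / len(y)
        w -= lr * grad
    return w

def inverse_error_weights(models, X, y):
    """Assumed weighting rule: regions whose models fit the new data better count more."""
    inv = np.array([1.0 / mse(w, X, y) for w in models])
    return inv / inv.sum()

def ensemble_predict(models, weights, X):
    preds = np.stack([predict(w, X) for w in models])
    return weights @ preds

rng = np.random.default_rng(0)

def make_region(slope, n=200):
    X = rng.uniform(0.0, 1.0, size=(n, 1))
    return X, slope * X[:, 0] + rng.normal(0.0, 0.05, size=n)

Xa, ya = make_region(2.0)   # hypothetical region A
Xb, yb = make_region(0.5)   # hypothetical region B

# 1. Hierarchical fine-tuning: pooled base model, then one adapted copy per region.
w_base = fit_least_squares(np.vstack([Xa, Xb]), np.concatenate([ya, yb]))
w_a = fine_tune(w_base, Xa, ya)
w_b = fine_tune(w_base, Xb, yb)
base_mse_a, tuned_mse_a = mse(w_base, Xa, ya), mse(w_a, Xa, ya)

# 2. Weighted ensemble for a new region C, using a small labeled sample from C.
Xc_small, yc_small = make_region(1.7, n=20)
Xc, yc = make_region(1.7)
weights = inverse_error_weights([w_a, w_b], Xc_small, yc_small)
ensemble_mse_c = float(np.mean((ensemble_predict([w_a, w_b], weights, Xc) - yc) ** 2))
worst_single_mse_c = max(mse(w_a, Xc, yc), mse(w_b, Xc, yc))

print(f"region A: base MSE {base_mse_a:.4f}, fine-tuned MSE {tuned_mse_a:.4f}")
print(f"region C: ensemble weights {np.round(weights, 2)}, ensemble MSE {ensemble_mse_c:.4f}")
```

The same pattern carries over to neural networks: replace the least-squares fit with pre-training and the gradient loop with a short fine-tuning run per region.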

The Key Insight
The most important takeaway: localization matters. The concept shift between regions typically requires customized models rather than a single universal solution.

What concept shifts have you encountered in your own domain? How did you address the challenge of transferring models across environments with different underlying relationships?

