Last Month's Webinar


I'm excited to share insights from my recent webinar on Demystifying Foundation Models for Pathology!

In this talk, I explored:
• Why foundation models are game-changers for computational pathology
• 4 key scenarios where these models excel
• Different approaches to leveraging foundation models, from simple to complex
• The evolution from tile-based to whole-slide and multimodal foundation models

Some key takeaways:
• Foundation models can significantly improve results for weakly supervised tasks and scenarios with limited labeled data
• They're a top strategy for handling distribution shifts in pathology images
• The field is rapidly evolving, with new models and applications emerging frequently

Curious to learn more? Check out the slides for a visual summary!

Podcast: Impact AI


Foundation Model Series: Democratizing Time Series Data Analysis with Max Mergenthaler Canseco from Nixtla


What if the hidden patterns of time series data could be unlocked to predict the future with remarkable accuracy? In this episode of Impact AI, I sit down with Max Mergenthaler Canseco to discuss democratizing time series data analysis through the development of foundation models. Max is the CEO and co-founder of Nixtla, a company specializing in time series research and deployment, aiming to democratize access to advanced predictive insights across various industries.

In our conversation, we explore the significance of time series data in real-world applications, the evolution of time series forecasting, and the shift away from traditional econometric models to the development of TimeGPT. Learn about the challenges faced in building foundation models for time series and a time series model’s practical applications across industries. Discover the future of time series models, the integration of multimodal data, scaling challenges, and the potential for greater adoption in both small businesses and large enterprises. Max also shares Nixtla’s vision for becoming the go-to solution for time series analysis and offers advice to leaders of AI-powered startups.

Research: Generalizability


Generalizability of Machine Learning Models: Quantitative Evaluation of Three Methodological Pitfalls


Lack of generalizability is still a challenge that plagues many machine learning models.

Farhad Maleki et al. explored 3 common pitfalls:
1) Violation of independence assumption
2) Model evaluation with an inappropriate performance indicator or baseline for comparison
3) Batch effect

The independence assumption is violated when oversampling, data augmentation, or feature selection is applied before splitting the data into training, validation, and test sets, or when several data points from the same patient are distributed across the three splits.
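To make the first pitfall concrete, here is a minimal sketch in plain Python (toy data, not from the paper) of why oversampling before splitting leaks information: duplicated minority samples can land in both the training and test sets, inflating test performance for any model that memorizes. The fix is to split first and oversample only the training portion.

```python
# Toy demonstration of data leakage from oversampling before splitting.
# All data here is synthetic and illustrative.
import random

random.seed(0)

# Toy dataset: 20 majority-class (0) and 5 minority-class (1) samples,
# each identified by a unique random feature value.
data = [(random.random(), 0) for _ in range(20)] + \
       [(random.random(), 1) for _ in range(5)]

def oversample(samples):
    """Duplicate minority-class samples until classes are roughly balanced."""
    minority = [s for s in samples if s[1] == 1]
    majority = [s for s in samples if s[1] == 0]
    reps = len(majority) // max(len(minority), 1)
    return majority + minority * reps

def split(samples, frac=0.8):
    """Shuffle and split into train/test portions."""
    shuffled = samples[:]
    random.shuffle(shuffled)
    cut = int(len(shuffled) * frac)
    return shuffled[:cut], shuffled[cut:]

def count_leaked(train, test):
    """Count test samples whose feature value also appears in train."""
    train_features = {x for x, _ in train}
    return sum(1 for x, _ in test if x in train_features)

# WRONG: oversample, then split -> exact duplicates cross the split.
train, test = split(oversample(data))
leaked_wrong = count_leaked(train, test)  # usually > 0

# RIGHT: split first, then oversample only the training portion.
train, test = split(data)
train = oversample(train)
leaked_right = count_leaked(train, test)  # always 0

print(leaked_wrong, leaked_right)
```

The same logic applies to augmented copies of an image or to multiple tiles from one patient: anything derived from the same underlying sample must stay on one side of the split.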

They set up experiments to evaluate each of these pitfalls on medical imaging datasets.

Catching these problems cannot be done using internal model evaluations alone. External test sets are essential, as is understanding and avoiding these pitfalls from the start.

Insights: Foundation Model Training


Training Foundation Models for Digital Pathology: The Art of Data Curation


A question from my recent webinar: If you want to train a foundation model from scratch with downstream tasks in mind (say, identifying high-grade tumor, low-grade tumor, or invasion) and the first step is to extract tile patches from your WSIs, is it important to provide a balanced number of tiles for each class?

When developing a self-supervised foundation model, the question of data balance arises. Here's why it's more nuanced than you might think:

Self-supervised Learning: Diversity is Key

1. Data Variety Trumps Class Balance
- For SSL, exposing the model to a wide range of image characteristics is crucial.
- This includes variations in tissue types, staining, scanners, and biological diversity.

2. Quantity with Quality
- Large-scale datasets are essential for robust SSL models.
- The goal is to capture the full spectrum of visual patterns in histopathology.

Key Considerations for Foundation Model Training

- Scanner Diversity: Include images from various scanning devices to improve generalization.
- Staining Variations: Expose the model to different H&E staining protocols and qualities.
- Biological Variability: Represent different patient demographics.
- Tissue Representation: Ensure a wide range of normal and pathological tissue types are included.
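In practice, these considerations often translate into stratifying tile sampling on acquisition metadata rather than on class labels. A minimal, hypothetical sketch (the metadata field names and counts are illustrative, not from the webinar):

```python
# Hedged sketch: curate an SSL pretraining set by stratifying tiles on
# acquisition metadata (scanner, stain batch) rather than class labels.
# Field names ("scanner", "stain_batch") are hypothetical illustrations.
import random
from collections import defaultdict

def stratified_sample(tiles, keys, per_stratum, seed=0):
    """Sample up to per_stratum tiles from each metadata stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for tile in tiles:
        strata[tuple(tile[k] for k in keys)].append(tile)
    sample = []
    for group in strata.values():
        rng.shuffle(group)
        sample.extend(group[:per_stratum])
    return sample

# Toy tile metadata: 3 scanners x 2 stain batches, uneven counts.
tiles = [{"id": i, "scanner": f"S{i % 3}", "stain_batch": f"B{i % 2}"}
         for i in range(100)]
curated = stratified_sample(tiles, keys=("scanner", "stain_batch"),
                            per_stratum=5)
print(len(curated))  # 6 strata x 5 tiles = 30
```

Capping each stratum keeps one over-represented scanner or staining batch from dominating the pretraining set, which is the diversity goal described above.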

The Bottom Line

While class balance is crucial for supervised fine-tuning, SSL foundation model training benefits more from diverse, large-scale datasets that represent the full complexity of whole slide images. Plus, your foundation model will likely be used for more than one task!

💡 Pro Tip: Document your data curation process meticulously. Understanding what went into your foundation model is key for downstream applications and potential biases.

Enjoy this newsletter? Here are more things you might find helpful:

1 Hour Strategy Session -- What if you could talk to an expert quickly? Are you facing a specific machine learning challenge? Do you have a pressing question? Schedule a 1 Hour Strategy Session now. Ask me anything about whatever challenges you’re facing. I’ll give you no-nonsense advice that you can put into action immediately.

Schedule now

My postal address: Pixel Scientia Labs, LLC, PO Box 98412, Raleigh, NC 27624, United States
