Why Most AI Pilots Stall — and What We’re Still Missing
The MIT State of AI in Business report grabbed attention with a bold claim: 95 percent of corporate AI pilots fail. It’s an alarming number — and one that captures a kernel of truth many leaders recognize. But the story behind that statistic deserves a closer look.
Most of the report focuses on generative AI — marketing, automation, and customer engagement. Important, yes. But it leaves a gap in understanding how AI performs in domains where the stakes are higher: health, climate, and infrastructure.
Take a more grounded example. An agriculture company built a model to detect early signs of crop stress from drone imagery. It worked beautifully in one region — but failed when applied elsewhere. Different soils, lighting, and crop types exposed what the validation metrics had hidden. Gathering data from more geographies could help, but scaling that diversity is far harder than it sounds.
That’s what “failure” often looks like in the real world — not a broken model, but one that never learned what really mattered.
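The pattern is easy to reproduce in miniature. The sketch below is purely illustrative: the data, features, and model are made up and are not the company's actual pipeline. It uses scikit-learn to compare a standard random cross-validation split with one that holds out entire regions. When a model leans on a region-specific cue, the random split reports near-perfect accuracy while the region-held-out split falls back toward chance, which is exactly the gap that only shows up after deployment.

```python
# Illustrative sketch (hypothetical data and feature names): comparing a random
# cross-validation split with a region-grouped split. A model that memorizes a
# region-specific cue can look strong under the random split and collapse when
# every fold holds out a region it has never seen.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, GroupKFold, cross_val_score

rng = np.random.default_rng(0)
n_regions, per_region = 5, 200

# Simulated drone-imagery features: one column is a region-bound artifact
# (think soil color or lighting) that predicts the label only within a region.
regions = np.repeat(np.arange(n_regions), per_region)
y = rng.integers(0, 2, size=n_regions * per_region)              # crop-stress label
signal = y + rng.normal(0, 2.0, size=y.shape)                    # weak true signal
artifact = y + regions * 10 + rng.normal(0, 0.1, size=y.shape)   # strong, region-bound shortcut
X = np.column_stack([signal, artifact])

model = RandomForestClassifier(n_estimators=100, random_state=0)

random_cv = cross_val_score(model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))
grouped_cv = cross_val_score(model, X, y, cv=GroupKFold(n_splits=5), groups=regions)

print(f"Random-split accuracy:      {random_cv.mean():.2f}")   # looks excellent
print(f"Region-held-out accuracy:   {grouped_cv.mean():.2f}")  # closer to reality
```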
What the Report Missed
Here’s where the story gets interesting. The question isn’t whether the number is right; it’s what it leaves out. Most studies today measure whether AI projects succeed, not why they fail or what it takes to succeed. They emphasize P&L impact, user adoption, or time to value: valid metrics for GenAI, but incomplete for high-stakes, domain-specific AI.
Consider three examples:
- Detecting cancer from pathology slides across different scanners.
- Monitoring deforestation or air quality with satellite or sensor networks.
- Predicting battery performance and lifespan across manufacturers or chemistries.
These models don’t just need to be accurate; they must be trustworthy, transferable, and testable — across environments that never look the same twice.
That’s where today’s business surveys fall silent. They can tell us the success rate. They can’t tell us what success looks like.
The Questions That Still Need Answers
If you’ve led an AI project that stalled, these stories will sound familiar. A pathology model that performs beautifully on one hospital’s slides, then fails in another. A satellite model tuned to one season that struggles with cloud cover in the next. A predictive maintenance system that flags the wrong patterns when the equipment ages.
What’s missing isn’t hype or hardware — it’s oversight, collaboration, and clarity.
The State of Impactful AI Survey aims to uncover those patterns systematically. It explores questions like:
- When AI projects fall short, what are the real-world consequences: rework, cost overruns, regulatory setbacks, or loss of trust?
- Which factors for success — data quality, validation across sites, or workflow integration — are most often underestimated?
- How do organizations balance accuracy, robustness, accountability, and sustainability when building for the real world?
These are the insights today’s AI business reports can’t provide — and the ones this survey is designed to reveal.
Introducing the State of Impactful AI Survey
That’s why I’m launching the State of Impactful AI Survey: not to count failures, but to understand the mechanisms of success. This is your chance to help shape the first cross-industry benchmark for impactful AI.
This is a data-driven effort to map:
- How teams are building AI that holds up under real-world variability.
- What practices separate robust systems from fragile pilots.
- Where oversight, validation, and domain expertise make the biggest difference.
If you’ve seen a model succeed — or struggle — your experience can help the field move forward.
Participants will receive:
👉 Take the State of Impactful AI Survey
The survey is open now and takes less than 10 minutes to complete. Please share it with peers in your industry — the more diverse the perspectives, the clearer the patterns we’ll uncover.
Together, we can build a clearer picture of what makes AI impactful — and where it still falls short.
Because impactful AI isn’t about hype — it’s about what survives the real world.
It’s not about claiming 95 percent fail.
It’s about learning why — and doing better next time.
- Heather