Scalable AI: Bridging the Deployment Gap
An AI model that works in a controlled environment is fundamentally different from one operating in the real world. Corey Jaskolski at Synthetaic puts it bluntly: traditional AI models often take months to build and cost millions, yet industry data suggests an 83% failure rate for getting them into production. The gap between a promising demo and a reliable product isn't just a technical problem — it's a strategic one.
Scalability is what bridges that gap. But scaling AI isn't like scaling conventional software. Software fails loudly, with crashes and error logs. AI fails quietly, as the relationship between its training data and the world slowly shifts. Junaid Kalia at NeuroCare.AI found that their vision models began losing sensitivity and specificity after processing roughly 1.5 million images — degradation that's nearly impossible to detect in real time. Dave DeCaprio at ClosedLoop adds another layer of complexity: in healthcare, you often can't measure accuracy until a year after the fact, making proactive monitoring of feature drift essential rather than optional.
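What proactive monitoring can look like in practice: compare the distribution of each incoming feature against a training-time baseline, long before any ground-truth labels arrive. Below is a minimal sketch of the idea using a two-sample Kolmogorov-Smirnov test; the window sizes and alpha threshold are illustrative assumptions, not details of ClosedLoop's pipeline.

```python
import numpy as np
from scipy import stats

def drift_report(baseline: np.ndarray, live: np.ndarray, alpha: float = 0.01) -> dict:
    """Compare each feature's live distribution against the training baseline.

    Returns per-feature KS statistics and a drift flag. The alpha threshold
    is illustrative; a production system would tune it per feature.
    """
    report = {}
    for i in range(baseline.shape[1]):
        stat, p_value = stats.ks_2samp(baseline[:, i], live[:, i])
        report[f"feature_{i}"] = {"ks": stat, "p": p_value, "drifted": p_value < alpha}
    return report

# Simulated check: feature 1 shifts in production, feature 0 does not.
rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, size=(5000, 2))
live = np.column_stack([rng.normal(0.0, 1.0, 2000), rng.normal(0.4, 1.0, 2000)])
for name, r in drift_report(baseline, live).items():
    print(name, "DRIFT" if r["drifted"] else "ok", f"ks={r['ks']:.3f}")
```

Understanding why AI fails at scale starts with understanding what scaling actually demands.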
The Axes of Scale
Not every AI system needs to scale across all six dimensions — a clinical decision support tool deployed within a single hospital system faces different scaling pressures than a global satellite monitoring platform. The relevant axes depend on the problem. What matters is identifying which dimensions are load-bearing for your specific context before you hit the wall. For most production systems, at least three or four apply — and underestimating even one is enough to stall a deployment.
Technical scalability means the system performs as reliably for the millionth request as for the first. David Golan at Viz.ai describes analyzing a patient scan every 28 seconds across thousands of hospitals — throughput that requires parallel processing architectures, not the serial human review pipelines AI is meant to replace. Amr Omar at Precision AI pushes this to the edge: drones must process high-resolution images and make spraying decisions in milliseconds, with no internet connectivity.
Economic scalability means the unit economics hold as volume grows. Ranveer Chandra at Microsoft Research makes the point plainly: if the compute exceeds the value generated, the model is a vanity project. Processing high-resolution satellite data daily is economically unviable if the scene hasn't changed, so processing only when necessary isn't a nice-to-have; it's what keeps the cost curve below the value curve. Zelda Mariet at Bioptimus adds that foundation models carry enormous GPU costs that must be built into the business model from the start, not discovered later.
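One common way to implement that gating, sketched below with invented thresholds (a generic pattern, not Microsoft Research's actual pipeline): run a cheap per-tile change check first, and spend model compute only on tiles that actually changed.

```python
import numpy as np

CHANGE_THRESHOLD = 0.05  # illustrative: fraction of pixels that must differ

def tile_changed(prev: np.ndarray, curr: np.ndarray, pixel_tol: float = 10.0) -> bool:
    """Cheap gate: fraction of pixels whose intensity moved more than pixel_tol."""
    diff = np.abs(curr.astype(float) - prev.astype(float)) > pixel_tol
    return diff.mean() > CHANGE_THRESHOLD

def process_scene(prev_tiles: dict, curr_tiles: dict, expensive_model) -> dict:
    """Run the costly model only on tiles the cheap check flags as changed."""
    results, skipped = {}, 0
    for key, curr in curr_tiles.items():
        prev = prev_tiles.get(key)
        if prev is not None and not tile_changed(prev, curr):
            skipped += 1          # scene unchanged: spend nothing
            continue
        results[key] = expensive_model(curr)
    print(f"processed {len(results)} tiles, skipped {skipped}")
    return results

rng = np.random.default_rng(1)
prev = {"tile_0": rng.integers(0, 255, (64, 64))}
curr = {"tile_0": prev["tile_0"].copy(),           # unchanged since yesterday
        "tile_1": rng.integers(0, 255, (64, 64))}  # newly imaged tile
process_scene(prev, curr, expensive_model=lambda t: float(t.mean()))
```

The cheap check can be a pixel diff, a perceptual hash, or a tiny classifier; what matters is that its cost is negligible next to the model it guards.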
Data scalability requires abandoning manual processes. Amanda Marrs at AMP Robotics processes over 50 billion objects per year using real-world production data to continuously improve their neural networks — no manual labeling at that volume. Ankur Garg at BlocPower automated ingestion of utility data via APIs and flat files into a data lake to build digital twins of 130 million buildings. At that scale, human data curation simply isn't an option.
Operational scalability asks whether the AI fits into existing workflows or requires users to change their behavior. Jeff Chang at Rad AI frames this as "zero change to the existing workflow." Coleman Stavish at Proscia observed that technically excellent AI papers failed in practice because the technology wasn't introduced correctly into the pathology lab's existing processes. The wrapper matters as much as the model.
Geographic and population scalability is where hidden biases surface. Hamed Alemohammad at Clark University describes the difficulty of transferring crop models from data-rich regions like the US to data-scarce developing countries due to domain shifts in agricultural practices and climate. Dean Freestone at Seer notes that systems developed in urban academic institutions often fail to translate to global populations, introducing systematic bias at scale. The answer isn't a single universal model — it's a strong global baseline with deliberate regional fine-tuning, an approach Bruno Sánchez-Andrade Nuño at Clay and Indra den Bakker at Overstory both employed.
Regulatory and trust scalability governs how fast you can actually deploy at scale. Ersin Bayram at Perimeter Medical Imaging AI notes that in regulated industries, you must have traceability, revision control, and design controls — you cannot simply push a new model. Emi Gal at Ezra reframes FDA clearance as an asset: it's a forcing function that builds validation processes from day one, rather than retrofitting them later.
Knowing the axes is necessary but not sufficient. The harder question is what happens when teams underestimate them — and the answer is usually the same: they scale too fast, too soon.
Why Premature Scaling Kills Companies
Scale reveals what pilots conceal. Harro Stokman at Kepler Vision found their software initially failed in real-world settings due to edge cases — a hat on a wall triggering a patient fall alert, statues confusing the system — problems that only became visible after collecting over a million field examples. The long tail doesn't show up in the pilot; it shows up when the world starts throwing things at your model that your training data never anticipated.
Premature scaling also fails economically. Freestone observed that many healthcare AI companies no longer exist because they tried to do too much too quickly and underestimated the cost of building secure, compliant infrastructure. Joe Brew at Hyfe went to market with a product that caused phones to overheat and generated thousands of false positives — a data collection strategy that worked, but required an intense and immediate feedback loop to survive.
The responsible path is deliberate expansion. Manal Elarab at Regrow Ag trains core models in data-rich environments like the US or Europe, then retrains for new geographies using smaller local datasets. Benji Meltzer at Aerobotics built a yield estimation model for citrus on 10,000 datasets, then scaled to apples — a completely different crop — using only 1,000 calibration points. Mastery in a data-rich context enables efficiency in a data-scarce one.
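The underlying recipe is standard transfer learning: freeze the representation learned on the data-rich crop and refit only a lightweight output head on the new crop's calibration points. Here is a toy numpy sketch of that pattern, in which a fixed random projection stands in for the learned backbone; it illustrates the general technique, not Aerobotics' actual model.

```python
import numpy as np

rng = np.random.default_rng(42)

# "Backbone": a fixed random projection standing in for a feature
# extractor learned on the data-rich crop.
W_backbone = rng.normal(size=(32, 8))

def features(x: np.ndarray) -> np.ndarray:
    return np.tanh(x @ W_backbone)  # frozen: never refit per crop

def fit_head(x: np.ndarray, y: np.ndarray, l2: float = 1e-2) -> np.ndarray:
    """Ridge-regression head: the only part refit for a new crop."""
    f = features(x)
    return np.linalg.solve(f.T @ f + l2 * np.eye(f.shape[1]), f.T @ y)

# Source crop: plenty of labeled data trains the first head...
x_src, y_src = rng.normal(size=(10_000, 32)), rng.normal(size=10_000)
head_source = fit_head(x_src, y_src)

# ...target crop: the frozen backbone is reused, and only the head
# is refit on a small calibration set.
x_cal, y_cal = rng.normal(size=(1_000, 32)), rng.normal(size=1_000)
head_target = fit_head(x_cal, y_cal)
predictions = features(x_cal) @ head_target
```

But deliberate expansion also requires the right technical foundation underneath it.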
The Engineering Toolkit
Scalable AI systems share a common technical foundation. Containerization is the baseline: Kit Merker at Plainsight describes a dockerized platform that standardizes computer vision components, deployable into Kubernetes in the cloud or at the edge interchangeably — treating the model almost like a configuration file within a standard application lifecycle.
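Treated that way, a model rollout reduces to a configuration change rather than an application change. A hypothetical sketch of the pattern follows; the ModelConfig fields and VisionService class are invented for illustration and are not Plainsight's API.

```python
import json
from dataclasses import dataclass

@dataclass
class ModelConfig:
    name: str
    version: str
    artifact_uri: str   # e.g. an object-store path or OCI registry reference
    input_size: int

def load_config(path: str) -> ModelConfig:
    with open(path) as f:
        return ModelConfig(**json.load(f))

class VisionService:
    """Application code that never changes; only the config does."""
    def __init__(self, cfg: ModelConfig):
        self.cfg = cfg
        self.model = self._pull(cfg.artifact_uri)

    def _pull(self, uri: str):
        # Placeholder for fetching pinned weights; in a containerized
        # deployment these are baked into the image or mounted at startup.
        return lambda x: f"{self.cfg.name}@{self.cfg.version} -> prediction"

    def predict(self, x):
        return self.model(x)

with open("model.json", "w") as f:
    json.dump({"name": "defect-detector", "version": "2.3.1",
               "artifact_uri": "s3://models/defect-detector/2.3.1",
               "input_size": 224}, f)
svc = VisionService(load_config("model.json"))
print(svc.predict(None))  # rolling v2.4 means editing model.json, nothing else
```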
At the edge, where bandwidth is limited and latency is critical, model compression becomes essential. Merker notes that edge deployment often rules out large models and cloud connectivity entirely, requiring smaller fine-tuned models that run on constrained devices. Stokman at Kepler Vision sidesteps bandwidth and privacy constraints by converting video to text on the edge device itself, sending only a text string to the server rather than raw footage.
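The pattern is easy to see in miniature: run a compact model on-device and transmit only a small structured event, never the frames themselves. The sketch below is a toy version with an invented event schema and a placeholder classifier rule, not Kepler Vision's system.

```python
import json
from typing import Optional
import numpy as np

def classify_frame(frame: np.ndarray) -> Optional[str]:
    """Stand-in for a compact on-device model; returns an event label or None."""
    return "person_on_floor" if frame.mean() > 0.55 else None  # placeholder rule

def uplink(event: dict) -> None:
    # In production this is a few bytes over MQTT or HTTPS; raw video
    # never leaves the device, which also sidesteps privacy constraints.
    print("uplink:", json.dumps(event))

def edge_loop(frames) -> None:
    for t, frame in enumerate(frames):
        label = classify_frame(frame)
        if label:                      # non-events are dropped on-device
            uplink({"t": t, "event": label})

edge_loop(np.random.rand(100, 8, 8))   # simulated low-res video feed
```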
Underpinning all of it is data engineering. Gard Hauge at StormGeo estimates that 80–85% of the work at scale is data ops — and without that foundation, the algorithms don't matter. Infrastructure gets the system running at scale; keeping it running is a different problem entirely.
Monitoring and Retraining at Scale
A deployed model isn't finished — it's the beginning of an ongoing maintenance problem. Gershom Kutliroff at Taranis runs a continuous learning framework where production data that doesn't match the training distribution is filtered, used to retrain the model, and pushed back into deployment. DeCaprio at ClosedLoop emphasizes that an MLOps pipeline must actively monitor for feature drift and shifts in outcome distributions — COVID-19, for instance, invalidated models that had no way of knowing the world had changed. For regulated environments where live retraining isn't permitted, Tobias Rijken at Kheiron Medical uses shadow models that sit behind the production system, observing real data without acting on it — building the evidence base needed for regulatory approval before any update goes live. The goal in each case is the same: a system that degrades on a schedule you control, not one that fails silently while you're looking elsewhere.
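A shadow deployment can be as simple as a harness that routes every request through both models but returns only the cleared model's answer, accumulating an agreement log offline. A minimal sketch with stand-in models (not Kheiron's implementation):

```python
import numpy as np

class ShadowHarness:
    """Serve the approved model; log the candidate's answers without acting."""
    def __init__(self, production, shadow):
        self.production, self.shadow = production, shadow
        self.log = []   # evidence base for a future regulatory submission

    def predict(self, x):
        served = self.production(x)
        candidate = self.shadow(x)      # observed, never returned to the caller
        self.log.append((served, candidate))
        return served                   # only the cleared model acts

    def agreement(self) -> float:
        return float(np.mean([a == b for a, b in self.log]))

# Illustrative stand-ins for the cleared and candidate models.
cleared = lambda x: int(x.sum() > 0.0)
candidate = lambda x: int(x.sum() > -0.1)
harness = ShadowHarness(cleared, candidate)
rng = np.random.default_rng(7)
for _ in range(1000):
    harness.predict(rng.normal(size=4))
print(f"shadow agreement: {harness.agreement():.1%}")
```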
The Real KPI
All of this — the axes, the infrastructure, the monitoring loops — is in service of a single outcome: reach. David Golan at Viz.ai defines the ultimate performance indicator not as technical accuracy, but as the percentage of patients around the world touched by the AI system — with a stated goal of 100% saturation. Stokman frames it differently but arrives at the same place: the true indicator of impact is successfully scaling from 20 beds to 2,000 beds without a drop in reliability.
Getting there requires infrastructure, governance, automated monitoring, and a deliberate expansion strategy. A model that works for the first thousand users but degrades at a million isn't impactful — it's a prototype that overstayed its welcome.
- Heather