Research: Agents

SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding

"Can you describe what you see in this whole slide image?" or "What are the key diagnostic features in this tissue section?" Imagine an AI that can answer these questions by analyzing entire gigapixel pathology slides, not just small patches.

Ying Chen et al. introduced SlideChat at CVPR 2025. They describe it as the first vision-language assistant designed specifically for understanding whole-slide pathology images at the gigapixel scale.

𝐓𝐡𝐞 𝐠𝐢𝐠𝐚𝐩𝐢𝐱𝐞𝐥 𝐜𝐡𝐚𝐥𝐥𝐞𝐧𝐠𝐞: Current AI models in pathology focus primarily on small patches, missing the crucial contextual information that pathologists rely on when examining entire slides. The sheer scale of whole slide images (often containing billions of pixels) combined with the lack of large-scale instruction datasets has made this a formidable technical challenge.
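To get a feel for that scale, here is a rough back-of-the-envelope sketch in Python. The slide dimensions (100,000 x 80,000 pixels) and the 224-pixel patch size are illustrative assumptions, not figures taken from the SlideChat paper.

```python
import math

def count_patches(width_px: int, height_px: int, patch_px: int) -> int:
    """Number of non-overlapping square patches needed to tile a slide."""
    return math.ceil(width_px / patch_px) * math.ceil(height_px / patch_px)

# Hypothetical slide size and patch size, chosen only for illustration.
slide_w, slide_h = 100_000, 80_000
patch = 224

total_pixels = slide_w * slide_h
n_patches = count_patches(slide_w, slide_h, patch)

print(f"pixels:  {total_pixels:,}")   # pixels:  8,000,000,000
print(f"patches: {n_patches:,}")      # patches: 160,026
```

In practice, pipelines usually discard background patches that contain no tissue, so the real count tends to be lower. Even so, a model that sees one patch at a time is looking at a tiny fraction of the slide.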

𝐊𝐞𝐲 𝐢𝐧𝐧𝐨𝐯𝐚𝐭𝐢𝐨𝐧𝐬:
∙ 𝐂𝐨𝐧𝐯𝐞𝐫𝐬𝐚𝐭𝐢𝐨𝐧𝐚𝐥 𝐜𝐚𝐩𝐚𝐛𝐢𝐥𝐢𝐭𝐲: Advanced vision-language assistant with natural dialogue abilities for gigapixel pathology analysis
∙ 𝐒𝐥𝐢𝐝𝐞𝐈𝐧𝐬𝐭𝐫𝐮𝐜𝐭𝐢𝐨𝐧 𝐝𝐚𝐭𝐚𝐬𝐞𝐭: The largest instruction-following dataset for pathology with 4.2K slide captions and 176K question-answer pairs
∙ 𝐒𝐥𝐢𝐝𝐞𝐁𝐞𝐧𝐜𝐡: A comprehensive benchmark spanning microscopy, diagnosis, and clinical scenarios
∙ 𝐓𝐰𝐨-𝐬𝐭𝐚𝐠𝐞 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠: Cross-domain alignment followed by visual instruction learning for robust performance

𝐖𝐡𝐲 𝐭𝐡𝐢𝐬 𝐦𝐚𝐭𝐭𝐞𝐫𝐬: SlideChat achieved state-of-the-art performance on 18 of 22 benchmark tasks, including 81.17% overall accuracy on SlideBench-VQA (TCGA). This represents a significant step toward AI systems that can assist pathologists with comprehensive slide analysis, potentially improving diagnostic consistency and supporting areas with limited pathology expertise.

All code, data, and models are publicly available, enabling widespread adoption and further research in computational pathology.

Research: Agents

ThinkGeo: Evaluating Tool-Augmented Agents for Remote Sensing Tasks

When disaster strikes, analysts need to quickly count damaged buildings, assess flood extent, and coordinate emergency response—all from satellite imagery. These tasks require sophisticated spatial reasoning, multi-step planning, and precise tool coordination that current AI benchmarks don't capture.

New research by Shabbir et al. introduces ThinkGeo, addressing a critical gap in how we evaluate AI agents for real-world remote sensing applications.

𝐖𝐡𝐲 𝐭𝐡𝐢𝐬 𝐦𝐚𝐭𝐭𝐞𝐫𝐬:
Existing AI benchmarks focus on general tasks or web scenarios, but remote sensing demands unique capabilities: reasoning over geodetic metadata, handling spatial resolutions, temporal analysis, and unit-aware calculations. A system might need to detect flooded areas, count vehicles within those areas, and calculate precise distances—all requiring coordinated tool use and spatial precision that current evaluations miss.
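As a small illustration of what "unit-aware" means here, the sketch below converts pixel measurements into ground units using an image's ground sample distance (GSD, in meters per pixel). The function names, the 0.5 m/px GSD, and the example values are assumptions made for illustration; they are not ThinkGeo tools or data. The sketch also assumes a uniform GSD across the image and ignores projection effects.

```python
import math

def pixel_distance_to_meters(p1, p2, gsd_m_per_px):
    """Euclidean distance between two pixel coordinates, converted to meters."""
    dx = p2[0] - p1[0]
    dy = p2[1] - p1[1]
    return math.hypot(dx, dy) * gsd_m_per_px

def pixel_area_to_sq_km(n_pixels, gsd_m_per_px):
    """Area covered by n_pixels, converted from square meters to square kilometers."""
    return n_pixels * gsd_m_per_px ** 2 / 1_000_000

# Hypothetical values: a 0.5 m/px image, two points, and a flood mask size.
gsd = 0.5
distance_m = pixel_distance_to_meters((0, 0), (300, 400), gsd)
flood_km2 = pixel_area_to_sq_km(2_000_000, gsd)

print(distance_m)  # 250.0
print(flood_km2)   # 0.5
```

An agent that forgets to square the GSD for areas, or reports pixels as meters, produces an answer that looks plausible but is wrong by a large factor. That is the kind of error a precision-focused benchmark is meant to surface.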

𝐊𝐞𝐲 𝐢𝐧𝐧𝐨𝐯𝐚𝐭𝐢𝐨𝐧𝐬:
◦ 𝐑𝐞𝐚𝐥-𝐰𝐨𝐫𝐥𝐝 𝐬𝐜𝐞𝐧𝐚𝐫𝐢𝐨𝐬: 436 tasks across seven domains—urban planning, disaster assessment, environmental monitoring, transportation analysis, aviation monitoring, recreational infrastructure, and industrial sites
◦ 𝐆𝐞𝐧𝐮𝐢𝐧𝐞 𝐦𝐮𝐥𝐭𝐢-𝐬𝐭𝐞𝐩 𝐫𝐞𝐚𝐬𝐨𝐧𝐢𝐧𝐠: Tasks require agents to plan tool sequences without explicit guidance, using 14 specialized tools for perception, logic, and visualization
◦ 𝐒𝐩𝐚𝐭𝐢𝐚𝐥 𝐠𝐫𝐨𝐮𝐧𝐝𝐢𝐧𝐠: Built on 311 high-resolution satellite and aerial images, requiring precise spatial calculations and unit conversions
◦ 𝐂𝐨𝐦𝐩𝐫𝐞𝐡𝐞𝐧𝐬𝐢𝐯𝐞 𝐞𝐯𝐚𝐥𝐮𝐚𝐭𝐢𝐨𝐧: Both step-by-step execution tracking and final answer assessment, revealing where agents fail in the reasoning chain
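To make the idea of step-by-step tracking plus final-answer assessment concrete, here is a minimal sketch of scoring an agent's tool trace against a reference trace. This is not ThinkGeo's actual metric suite: the `Step` class, the scoring rules, and the tool names are all hypothetical, and real benchmarks typically use more forgiving argument matching.

```python
from dataclasses import dataclass

@dataclass
class Step:
    tool: str   # name of the tool the agent called
    args: dict  # arguments passed to that tool

def evaluate_trace(predicted, reference, predicted_answer, reference_answer):
    """Score a trace step by step and on the final answer.

    Steps are compared position by position against the reference;
    extra predicted steps beyond the reference length are ignored.
    """
    n = len(reference)
    tool_hits = 0
    arg_hits = 0
    first_error = None
    for i, ref in enumerate(reference):
        pred = predicted[i] if i < len(predicted) else None
        tool_ok = pred is not None and pred.tool == ref.tool
        arg_ok = tool_ok and pred.args == ref.args
        tool_hits += tool_ok
        arg_hits += arg_ok
        if first_error is None and not arg_ok:
            first_error = i  # index of the first step that went wrong
    return {
        "tool_accuracy": tool_hits / n if n else 0.0,
        "arg_accuracy": arg_hits / n if n else 0.0,
        "first_error_step": first_error,
        "answer_correct": predicted_answer == reference_answer,
    }

# Hypothetical tool names and arguments, for illustration only.
reference = [
    Step("detect_region", {"image": "scene.tif", "label": "flood"}),
    Step("count_objects", {"category": "vehicle"}),
    Step("calculator", {"expression": "12 * 0.5"}),
]
predicted = [
    Step("detect_region", {"image": "scene.tif", "label": "flood"}),
    Step("count_objects", {"category": "vehicle"}),
    Step("calculator", {"expression": "12 * 0.05"}),  # wrong argument
]

result = evaluate_trace(predicted, reference, "0.6", "6.0")
print(result)
```

Here the agent picks the right tool at every step but passes a wrong argument at the last one, so the final answer is wrong. Looking only at the final answer would hide the fact that the plan itself was sound.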

𝐓𝐡𝐞 𝐫𝐞𝐬𝐮𝐥𝐭𝐬:
GPT-4o achieved the best overall performance but still struggled with argument formatting and spatial precision. Open-source models showed high tool error rates—some making aggressive tool calls with poor execution control. Even top models demonstrate significant gaps in multimodal spatial reasoning, highlighting the complexity of geospatial workflows.

𝐁𝐢𝐠𝐠𝐞𝐫 𝐢𝐦𝐩𝐚𝐜𝐭:
ThinkGeo provides the first systematic way to evaluate AI agents on spatially grounded, precision-critical tasks. As remote sensing becomes increasingly important for climate monitoring, urban planning, and disaster response, we need AI systems that can handle the inherent complexity of geospatial analysis.

This benchmark establishes a foundation for building more capable spatial reasoning agents—systems that could eventually assist analysts in time-critical scenarios where accuracy literally saves lives.
