The biggest risk isn't failure—it's that we're all learning the same hard lessons in isolation.

I’ve seen too many AI projects stop short of impact — and I want to change that.
Take part in the State of Impactful AI Survey to share your experience and learn what others are discovering.
Get early access to the findings and enter a $250 Amazon gift card draw as a thank you.

Contribute now

 

AI Failures Aren't the Problem—Our Silence About Them Is


Estimated failure rate of AI pilots: 70-90%. Documented failures: nearly zero.


When a product defect occurs, manufacturers identify and rectify the root cause. When a software outage hits, engineers publish a post-mortem so others can learn. When a hospital logs a safety event, it feeds into protocols that save lives.


But when an AI pilot fails? Silence.


Across healthcare, agriculture, energy, and climate, teams invest millions in AI pilots. Some succeed. Many don't. And when they don't, the lessons vanish: code archived, contracts ended, insights trapped in abandoned project folders. Six months later, another team hits the exact same wall.


This isn't a failure problem. It's a knowledge capture problem.


Why AI Lags Behind


Other industries solved this decades ago. Software engineering has blameless post-mortems. Manufacturing has systematic defect analysis. Healthcare has adverse event reporting. These aren't just nice-to-haves; they're infrastructure. They transform individual setbacks into collective intelligence.


AI has no equivalent. We have scattered blog posts, private lessons-learned documents, and conference papers that cherry-pick success stories. What we lack is a systematic way to capture knowledge—a structured approach to understanding what actually drives success and failure in real-world deployments.


The cost is measurable. An agricultural organization develops crop yield prediction models that farmers never adopt—the recommendations conflict with decades of experience, the interface doesn't work offline when connectivity is poor, and the predictions arrive too late for planting decisions. Two growing seasons later, a similar organization in another region builds the exact same system with the exact same results.


A hospital deploys pathology AI that matches expert accuracy but sits unused—pathologists can't understand the model's reasoning well enough to trust it, IT can't integrate it with existing lab systems, and liability concerns leave everyone defaulting to manual review. A year later, another hospital network launches an identical initiative.


Every organization wrestling with data drift, model governance, stakeholder trust, or production infrastructure is solving problems that dozens of teams have already solved. We're not building on shared knowledge. We're rebuilding it, repeatedly, at scale.


Building the Knowledge Base AI Needs


The State of Impactful AI Survey creates what other industries already have: a collective learning system for AI deployment in high-stakes domains.


This isn't about cataloging failures. It's about identifying patterns that drive success. When models perform well in validation but fail in production, what predicts that gap? Which governance frameworks actually reduce rework? What separates pilots that scale from those that stall? Which principles—robustness, sustainability, explainability—correlate with long-term success?


Individually, these are case studies. Aggregated, they become actionable intelligence that accelerates the entire field.


The Path Forward


AI now operates in domains where failure carries real cost. In healthcare, a failed diagnostic AI means delayed treatments and continued diagnostic backlogs. In agriculture, it means another growing season of suboptimal yields while farmers lose trust in the technology. In climate, it means mitigation strategies are deployed too slowly while the window for action narrows.


Progress in these areas requires more than better algorithms. It requires institutional learning at the field level—infrastructure for knowledge sharing comparable to what exists in aviation safety or clinical medicine. It means treating deployment insights as collective assets, not competitive secrets.


The survey establishes a foundation for this shift. By documenting what drives successful AI deployment across sectors, it creates a baseline—a state of learning that didn't exist before. Future teams won't need to rediscover basic principles. They can build on validated patterns and avoid documented pitfalls.


Your experience—whether leading successful deployments or navigating difficult failures—shapes what the field learns next.


The survey takes 10 minutes. In return, participants receive early access to the aggregated results—a first-of-its-kind benchmark showing what's actually working in AI deployment across sectors. You'll also be entered into a drawing for a $250 Amazon gift card.


📊 Contribute to the State of Impactful AI Survey and help build the knowledge base AI deployment needs.


The field's knowledge base won't build itself. But it can be built—systematically, collaboratively, and starting now. The survey closes next week.


- Heather

Vision AI that bridges research and reality

— delivering where it matters


Podcast: ByteSight


The Last Mile of AI: Making Digital Pathology Work in Practice


I recently had the privilege of being a guest on ByteSight, the podcast by PAICON, where I sat down with Dr. Manasi A-Ratnaparkhe to discuss one of the biggest challenges in AI for healthcare: bridging the gap between lab performance and real-world deployment.

We explored why so many AI models that look bulletproof in controlled settings collapse when they hit clinical practice—whether it's biomarker predictions that fail external validation or tumor classifiers that break on slides from different scanners or staining protocols.

Through my work at Pixel Scientia Labs, I've seen this pattern repeatedly. The conversation dives into what it takes to build AI systems that are truly robust, generalizable, and trustworthy—from designing for diversity from day one to validating across institutions and technical variations.
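
As a concrete illustration of what "validating across institutions and technical variations" can look like, here is a minimal sketch of leave-one-institution-out evaluation using scikit-learn. The embeddings, labels, and site names are synthetic stand-ins of my own, not data from the episode; the point is the pattern of holding out an entire site to expose generalization gaps.

    # Minimal sketch: leave-one-institution-out evaluation for a slide-level classifier.
    # X, y, and sites are synthetic stand-ins for slide embeddings, labels, and the
    # institution (scanner / staining protocol) each slide came from.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import LeaveOneGroupOut

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 64))                      # slide-level feature vectors
    y = rng.integers(0, 2, size=300)                    # binary labels
    sites = rng.choice(["site_A", "site_B", "site_C"], size=300)

    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=sites):
        clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        auc = roc_auc_score(y[test_idx], clf.predict_proba(X[test_idx])[:, 1])
        # A large drop on the held-out site is the "last mile" warning sign.
        print(f"held-out institution: {sites[test_idx][0]}, AUC: {auc:.2f}")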

Key topics we covered:
→ Why the "last mile" problem is holding back AI in digital pathology
→ How data diversity and domain expertise shape model reliability
→ Practical strategies for moving from proof of concept to proof in practice

If you're working at the intersection of AI and healthcare, I think you'll find value in this conversation. The episode is now live!

Research: Foundation Model for Agriculture


From General to Specialized: The Need for Foundational Models in Agriculture


Foundation models have transformed remote sensing and weather forecasting, but a critical gap remains between what these models offer and what agriculture actually needs.

With climate change and population growth threatening food security, we need AI systems that understand the complex biological processes driving crop growth—not just satellite patterns. This requires integrating weather, soil properties, farm management practices, and socioeconomic factors at field scale.
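
To make that integration requirement concrete, here is a minimal, hypothetical sketch of a field-scale multimodal feature table built with pandas. The column names and values are placeholders chosen for illustration, not the authors' pipeline; in practice the table would also carry satellite time series and socioeconomic covariates.

    # Minimal sketch: one row per field per day, combining dynamic weather with
    # static soil and management attributes. All names and values are hypothetical.
    import pandas as pd

    weather = pd.DataFrame({                            # daily weather per field
        "field_id": [1, 1, 2, 2],
        "date": pd.to_datetime(["2024-05-01", "2024-05-02"] * 2),
        "precip_mm": [0.0, 4.2, 1.1, 0.0],
        "tmax_c": [21.3, 19.8, 22.0, 23.1],
    })
    soil = pd.DataFrame({                               # static soil properties
        "field_id": [1, 2], "clay_pct": [18.0, 31.0], "soil_ph": [6.4, 7.1],
    })
    management = pd.DataFrame({                         # farm management practices
        "field_id": [1, 2],
        "sowing_date": pd.to_datetime(["2024-04-20", "2024-04-25"]),
        "irrigated": [True, False],
    })

    features = (weather
                .merge(soil, on="field_id", how="left")
                .merge(management, on="field_id", how="left"))
    features["days_since_sowing"] = (features["date"] - features["sowing_date"]).dt.days
    print(features)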

Vishal Nedungadi et al. evaluated existing foundation models on three agricultural tasks across Africa and Europe. Their findings reveal both promise and clear limitations: while models achieved strong performance in crop type mapping in Kenya, they underperformed substantially in yield estimation when input modalities didn't align with task requirements.

Key contributions:
- Introduced the CropFM framework defining requirements for agricultural foundation models: daily temporal resolution, ≤10m spatial resolution, and multimodal inputs (satellite, weather, soil, management data)
- Systematically compared 2 existing foundation models, revealing fundamental mismatches—most weather models operate at coarse resolution, while satellite models lack key agricultural modalities
- Demonstrated empirically that performance degrades when pretraining inputs don't match task needs—a lesson for future model development

The research makes a clear case: agriculture needs purpose-built foundation models with farm-scale resolution, daily temporal granularity matching crop growth cycles, and the environmental drivers that mechanistic crop models have long relied upon.

Research: Weak Supervision


Do Multiple Instance Learning Models Transfer?


Transfer learning is standard in computational pathology at the patch level (UNI, Prov-GigaPath), and slide-level self-supervised models exist. But what about supervised MIL aggregators trained on labeled clinical tasks—do they transfer?

Daniel Shao et al. from Harvard Medical School and Mass General Brigham conducted the first systematic investigation of this question, with findings that challenge current practice.

Why this matters:
MIL aggregators—which learn to combine thousands of patches into slide-level predictions—are typically trained from scratch for each downstream task. Meanwhile, slide-level self-supervised models require massive datasets and compute. The transferability of supervised MIL models trained on labeled clinical tasks has remained unexplored.
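
For readers less familiar with the setup, here is a minimal sketch of an attention-based MIL aggregator in the spirit of classic attention MIL. It is a toy stand-in rather than one of the eleven architectures the paper evaluates, and the embedding dimensions are assumptions.

    # Minimal sketch of an attention-based MIL aggregator (a toy stand-in, not one of
    # the architectures evaluated in the paper). Patch embeddings from a frozen patch
    # encoder (e.g. UNI) are pooled with learned attention into a slide-level prediction.
    import torch
    import torch.nn as nn

    class AttentionMIL(nn.Module):
        def __init__(self, in_dim: int = 1024, hidden_dim: int = 256, n_classes: int = 2):
            super().__init__()
            self.attention = nn.Sequential(
                nn.Linear(in_dim, hidden_dim),
                nn.Tanh(),
                nn.Linear(hidden_dim, 1),               # one attention score per patch
            )
            self.classifier = nn.Linear(in_dim, n_classes)

        def forward(self, patches: torch.Tensor) -> torch.Tensor:
            # patches: (n_patches, in_dim) embeddings for one slide
            weights = torch.softmax(self.attention(patches), dim=0)   # (n_patches, 1)
            slide_embedding = (weights * patches).sum(dim=0)          # (in_dim,)
            return self.classifier(slide_embedding)                   # (n_classes,)

    model = AttentionMIL()
    logits = model(torch.randn(5000, 1024))             # a slide with 5,000 patches
    print(logits.shape)                                  # torch.Size([2])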

Key findings:
- First comprehensive evaluation: 11 MIL architectures across 21 supervised pretraining and target tasks for morphological and molecular subtyping
- Supervised pretrained MIL models consistently outperform training from scratch, even when pretrained on different organs
- Pan-cancer supervised pretraining enables strong cross-organ generalization
- Pan-cancer pretrained MIL outperforms existing slide-level SSL foundation models while using only 6.5% of the pretraining data

The implication:
Supervised slide-level pretraining on diverse clinical tasks creates transferable representations with far less data than self-supervised approaches. We should be sharing pretrained MIL aggregators instead of training them from scratch each time.
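
In code, that transfer recipe might look something like the sketch below. The checkpoint path, class counts, and the toy aggregator (the same one as in the sketch above) are hypothetical stand-ins, not the authors' released models.

    # Minimal sketch of reusing a supervised pretrained MIL aggregator on a new task:
    # load the pretrained attention module, attach a fresh head, and fine-tune.
    # Checkpoint path, class counts, and the toy aggregator are hypothetical.
    import torch
    import torch.nn as nn

    class AttentionMIL(nn.Module):                       # same toy aggregator as above
        def __init__(self, in_dim=1024, hidden_dim=256, n_classes=2):
            super().__init__()
            self.attention = nn.Sequential(
                nn.Linear(in_dim, hidden_dim), nn.Tanh(), nn.Linear(hidden_dim, 1))
            self.classifier = nn.Linear(in_dim, n_classes)

        def forward(self, patches):
            weights = torch.softmax(self.attention(patches), dim=0)
            return self.classifier((weights * patches).sum(dim=0))

    # 1. Rebuild the aggregator pretrained on a pan-cancer supervised task and load it.
    pretrained = AttentionMIL(n_classes=10)
    # pretrained.load_state_dict(torch.load("pancancer_mil.pt"))   # hypothetical path

    # 2. Keep the attention module, swap the head for the new target task.
    model = AttentionMIL(n_classes=3)                    # e.g. 3-class molecular subtyping
    model.attention.load_state_dict(pretrained.attention.state_dict())

    # 3. Either fine-tune everything, or freeze attention and train only the new head.
    for p in model.attention.parameters():
        p.requires_grad = False
    optimizer = torch.optim.AdamW(
        [p for p in model.parameters() if p.requires_grad], lr=1e-4)

The detail that matters is not this particular architecture; it's that the expensive-to-learn aggregation weights become a shareable artifact rather than something every team rebuilds.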

The authors released FEATHER, a lightweight supervised slide foundation model that fine-tunes on consumer GPUs and performs competitively with larger SSL models.


Code