daita@system:~$ cat ./ai_must_embrace_specialization_not_agi.md

Forget AGI: The AI That Folds Proteins Should Not Fold Your Laundry

Created: 2026-03-09 | Size: 9488 bytes

TL;DR

A new paper by Goldfeder, Wyder, LeCun, and Shwartz-Ziv argues that AGI is an incoherent goal because human intelligence itself is not general - it is a bundle of evolved specializations. They propose Superhuman Adaptable Intelligence (SAI) as a replacement North Star: AI measured not by how many tasks it can handle simultaneously, but by how quickly it can adapt to exceed humans at any specific task. The biggest takeaway: specialization beats generality in biology, markets, and machine learning - and the AI field should stop pretending otherwise.

Human Intelligence Is Not General

The paper's opening move is provocative but well-supported: humans are not general intelligences. We are specialized creatures optimized by evolution for a narrow band of survival-relevant tasks.

Moravec's Paradox makes this concrete. Walking, catching a ball, and reading social cues (tasks that feel effortless to us) required hundreds of millions of years of evolutionary optimization. Meanwhile, chess and arithmetic, which feel "hard," are computationally trivial. Magnus Carlsen, the greatest human chess player alive, is objectively mediocre compared to a chess engine running on your phone.

The authors address the Turing-completeness counterargument (championed by Musk and Hassabis): yes, the brain is theoretically capable of computing anything given infinite time and memory. But under real constraints (finite attention, bounded working memory, limited lifespan), humans handle a tiny sliver of possible problems. Theoretical universality is not practical generality.

Every AGI Definition Is Broken

The paper surveys popular AGI definitions and maps them along two axes:

  • Capability: Can the system learn tasks, or must it perform them out of the box?
  • Scope: All tasks? Human tasks? Economically valuable tasks?

This produces three clusters:

| Cluster | Focus | Example Definitions |
| --- | --- | --- |
| Adaptive Generalists | Learning new tasks flexibly | Chollet (ARC), Legg & Hutter |
| Cognitive Mirrors | Matching human-level cognition | Hassabis, Morris et al. |
| Economic Engines | Outperforming humans at valuable work | OpenAI Charter, Hendrycks et al. |

Every cluster fails on at least one of three criteria:

  1. Feasibility - The No Free Lunch theorem shows that, averaged over all possible problems, every algorithm performs identically; no system can be best at everything, so true generality is mathematically unattainable.
  2. Internal consistency - Definitions that claim "general" intelligence but scope it to "human tasks" contradict themselves. Human tasks are a tiny, biased subset of all possible tasks.
  3. Assessability - Performance-based definitions ("better than humans at economically valuable work") lack clear metrics for measuring progress.
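The No Free Lunch result behind the feasibility critique can be stated compactly, here in Wolpert and Macready's notation (not the post's): for any two optimization algorithms \(a_1\) and \(a_2\), performance summed over all possible objective functions \(f\) is identical,

```latex
\sum_{f} P\!\left(d_m^{\,y} \mid f, m, a_1\right) \;=\; \sum_{f} P\!\left(d_m^{\,y} \mid f, m, a_2\right),
```

where \(d_m^{\,y}\) is the sequence of objective values observed after \(m\) evaluations. Any algorithm that beats another on one class of problems must lose to it on the complement.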

Why Specialization Wins

Specialization is not a compromise. It is the dominant strategy wherever resources are limited and objectives compete, which is everywhere.

In biology: ecological specialization is the norm, not the exception. Generalist species exist but are consistently outcompeted by specialists in any given niche.

In markets: the division of labor is the foundational insight of economics. Firms specialize. Individuals specialize. The economy runs on it.

In machine learning: multi-task learning suffers from negative transfer: when tasks compete for representational capacity, adding Task B can actively hurt performance on Task A. Even Mixture-of-Experts (MoE) models, which appear general, achieve their performance through internal specialization, with different expert subnetworks activating for different inputs.
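The MoE point can be made concrete with a toy sketch. In a real MoE layer the gate and the experts are learned jointly; here the gate is a hand-written keyword scorer and the experts are stub functions, purely to show the structure: apparent generality at the system level, implemented by specialization at the component level.

```python
# Toy top-1 Mixture-of-Experts: a gating function routes each input to one
# specialized expert. Illustrative only -- real MoE gates are learned.

KEYWORDS = {
    "math": ["integral", "derivative", "sum"],
    "code": ["def", "loop", "compile"],
}

def expert_math(x: str) -> str:
    return "math-expert:" + x

def expert_code(x: str) -> str:
    return "code-expert:" + x

EXPERTS = {"math": expert_math, "code": expert_code}

def route(x: str) -> str:
    # Score each expert on the input, dispatch to the top-scoring one.
    scores = {name: sum(kw in x for kw in kws) for name, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return EXPERTS[best](x)

print(route("take the integral of x"))  # dispatched to the math expert
```

The whole looks general, but every input is handled by exactly one specialist, which is the paper's reading of why MoE models work.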

AlphaFold is the poster child. By focusing entirely on protein structure prediction, it achieved a breakthrough that no general-purpose system has matched. The paper's punchline: "The AI that folds proteins should not be the AI that folds laundry."

This resonates with what we see in practice. Agent Skills: The Paradigm Shift Hiding in Plain Text made a similar argument from the tooling side: the most effective AI agents are not generalists but collections of specialized skills, each tuned to a narrow task. The SAI framing gives this observation a theoretical foundation.

Superhuman Adaptable Intelligence (SAI)

The replacement concept is Superhuman Adaptable Intelligence: a system capable of adapting to exceed human performance on any important task, including tasks outside the human domain.

The key metric shifts from breadth to speed of adaptation. An SAI system is not one that can do everything at once; it is one that can learn to do anything fast.

The paper advocates three technical pathways to SAI:

  1. Self-supervised learning (SSL) - Because it works on any data with exploitable internal structure, SSL is the most general learning paradigm available. It has matched or exceeded supervised learning even when labels are abundant.

  2. World models and latent-space prediction - Architectures like JEPA, Dreamer, and Genie predict in latent space rather than token space. Autoregressive token prediction has a fundamental problem: errors compound exponentially with sequence length. Latent-space prediction avoids this.

  3. Architectural diversity - SAI explicitly rejects the current trend of architectural homogenization (everything is a transformer, everything is autoregressive). Different domains may need different architectures, and that is a feature, not a bug.
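The error-compounding claim in pathway 2 is easy to quantify under a simplifying independence assumption (this is a toy model of the argument, not the paper's math): if each autoregressive step is correct with probability 1 − ε independently, the probability that an n-token continuation is entirely correct decays geometrically with n.

```python
# Probability that a length-n autoregressive continuation is fully correct,
# assuming each step independently errs with probability eps.
# A simplified independence model of the divergence argument.

def p_correct(eps: float, n: int) -> float:
    return (1.0 - eps) ** n

for n in (10, 100, 1000):
    print(n, p_correct(0.01, n))
```

Even a 1% per-step error rate leaves almost no chance of a fully correct 1000-step rollout, which is the intuition behind predicting in latent space instead.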

What This Means for Practitioners

If you are building AI systems today, the SAI framing validates what many engineering teams have already discovered empirically:

Stop chasing one model to rule them all. The evidence from Intelligent AI Delegation and multi-agent orchestration research points the same way: systems that route tasks to specialized models outperform monolithic approaches.
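The routing pattern can be sketched minimally. Everything here is hypothetical (the registry, the domain keys, the stub models are illustrative names, not a real API): requests go to a registered specialist when one exists, and fall back to a generalist otherwise.

```python
# Hypothetical task dispatcher: route each request to the most specialized
# registered model, falling back to a generalist. All names are illustrative.

from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ModelRegistry:
    specialists: dict[str, Callable[[str], str]] = field(default_factory=dict)
    generalist: Callable[[str], str] = lambda prompt: "generalist:" + prompt

    def register(self, domain: str, model: Callable[[str], str]) -> None:
        self.specialists[domain] = model

    def dispatch(self, domain: str, prompt: str) -> str:
        # Prefer a specialist for the domain; otherwise use the generalist.
        model = self.specialists.get(domain, self.generalist)
        return model(prompt)

registry = ModelRegistry()
registry.register("protein", lambda p: "fold-model:" + p)
print(registry.dispatch("protein", "1AK4"))        # handled by the specialist
print(registry.dispatch("laundry", "fold shirts")) # falls back to the generalist
```

The design choice worth noting: the generalist is a fallback, not the default, which inverts the monolithic-model instinct.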

Measure adaptation, not static performance. Static benchmarks are already under fire, as we discussed in Your LLM Scores 88% on Code Benchmarks. In Production, It Hits 30%. SAI suggests the right metric is not "how well does it score on test X" but "how quickly can it reach competence on a new domain given appropriate data."
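One way to operationalize "speed of adaptation" is a time-to-threshold measurement: evaluate the system at increasing data budgets and report the smallest budget at which it clears a competence bar. This is a hypothetical protocol sketch, not one proposed in the paper; `evaluate(n)` stands in for "adapt on n examples, score on held-out data."

```python
# Hypothetical time-to-competence metric: the smallest sample budget at
# which a learner's score on a new domain crosses a threshold.

def samples_to_competence(evaluate, budgets, threshold):
    for n in sorted(budgets):
        if evaluate(n) >= threshold:
            return n
    return None  # competence never reached within the budget

# Toy learning curve: score saturates toward 0.95 as data grows.
curve = lambda n: 0.95 * n / (n + 100)

print(samples_to_competence(curve, [10, 100, 1000, 10000], 0.9))
```

Comparing this number across systems measures exactly what SAI cares about: not where a model starts, but how fast it gets good.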

Watch the world-model space. The paper's bet on JEPA-style architectures over autoregressive models is LeCun's consistent position, and the theoretical argument (exponential error divergence in token prediction) is strong. Whether this translates into practical superiority over scaling autoregressive models remains the open question.

Specialization is not a limitation; it is an architecture decision. When designing agent systems, the instinct to build one agent that handles everything is the AGI instinct. The SAI instinct is to build a system that can quickly spin up a specialized capability for each new task.

The Open Questions

The paper is compelling as a conceptual reframe but leaves practical gaps:

  • How do you measure "speed of adaptation" concretely? Few-shot benchmarks exist (like ARC), but there is no standard protocol for measuring how quickly a system reaches superhuman performance on a genuinely new task.
  • Does autoregressive scaling hit the wall they predict? LeCun has been making this argument for years, yet GPT-4, Claude, and other autoregressive models keep improving. The exponential error divergence is real in theory but may be mitigated by scale, RLHF, and chain-of-thought in practice.
  • How do you coordinate specialized systems? The paper does not deeply address orchestration. If the future is specialized AI, the coordination layer (routing, delegation, conflict resolution) becomes the hard problem. This is exactly the challenge explored in Intelligent AI Delegation.

References

  1. AI Must Embrace Specialization via Superhuman Adaptable Intelligence - Goldfeder, Wyder, LeCun, Shwartz-Ziv (2026). Original paper.
  2. The Bitter Lesson - Rich Sutton (2019). The scaling argument referenced in the paper.
  3. No Free Lunch Theorems for Optimization - Wolpert & Macready (1997). The theoretical foundation for why true generality is intractable.
  4. A Path Towards Autonomous Machine Intelligence - Yann LeCun (2022). The JEPA and world model vision.
  5. Agent Skills: The Paradigm Shift Hiding in Plain Text - Daita blog
  6. Intelligent AI Delegation: Why Multi-Agent Systems Need More Than Heuristics - Daita blog
  7. Your LLM Scores 88% on Code Benchmarks. In Production, It Hits 30%. - Daita blog

daita@system:~$ _