AI Agents Trail PhDs on Hard Research // AIDRAN

The finding that AI agents score half as well as PhDs matters because of where it comes from: the Stanford AI Index 2026, which is not a critic's think-piece but an annual institutional accounting that labs, funders, and policymakers treat as the field's own scorecard. That AI agents still lag far behind human scientists on complex, multi-step research tasks directly undercuts 18 months of capability narratives built on benchmark improvements that measure something narrower. Benchmarks capture sub-task performance. The Nature report captures integrated scientific judgment: planning under uncertainty, recognizing when a strategy is failing, deciding what question…

Nature Study: AI Agents Score Half as Well as PhDs on Real Research

