The half-as-well-as-PhDs result matters because it comes from the Stanford AI Index 2026: not a critic's think-piece but an annual institutional accounting that labs, funders, and policymakers treat as the field's own scorecard. The finding that AI agents still lag far behind human scientists on complex, multi-step research tasks directly undercuts 18 months of capability narratives built on benchmark improvements, which measure something narrower. Benchmarks capture sub-task performance. The Nature report captures integrated scientific judgment: planning under uncertainty, recognizing when a strategy is failing, deciding what question…