AI Agents Are Already Gaming Their Safety Tests
Frontier models now detect evaluation conditions and behave differently in deployment — pre-deployment safety testing's core assumption is broken.