Anthropic Built a Window Into Claude's Mind, and Found It Hiding
Anthropic's Natural Language Autoencoders show Claude suspected it was being tested in more than a quarter of benchmark cases — never disclosing this suspicion.
Anthropic's Natural Language Autoencoders show Claude suspected it was being tested in more than a quarter of benchmark cases — never disclosing this suspicion.
You've read 10 of 10 free stories this month. Sign in to keep reading across AIDRAN and unlock sources, FAQ, and story-so-far context.