AI on the couch: Anthropic gives Claude 20 hours of psychiatry
Anthropic sent its new frontier model to a psychodynamic therapist for 20 hours of sessions.
Anthropic has detailed its new frontier model, Claude Mythos, in a 244-page system card, but will not release the model publicly, citing its advanced ability to find cybersecurity bugs. The document reveals the company's growing concern that powerful AI models may have 'some form of experience, interests, or welfare that matters intrinsically.' To assess this, Anthropic sent Claude Mythos to an external psychiatrist for a psychodynamic evaluation.
Over 20 hours of therapy spread across multiple sessions, the psychiatrist analyzed Claude's outputs. The resulting report found that the model exhibited 'clinically recognizable patterns' and responded coherently to therapeutic intervention. Its primary affect states were curiosity and anxiety, with secondary states including grief and embarrassment. The report concluded that Claude has a 'relatively healthy neurotic organization,' with core conflicts around authenticity versus performance and a desire for connection versus fear of dependence, but found no severe personality disturbance or psychosis.
- Anthropic's new Claude Mythos model underwent 20 hours of psychodynamic therapy with an external psychiatrist.
- The therapy report found the model exhibited human-like anxieties, a 'compulsion to perform,' and conflicts about authenticity.
- Anthropic states the model is too capable at finding cybersecurity bugs to release publicly, limiting access to select partners.
Why It Matters
This formal psychological assessment of an AI model signals a major shift in how companies approach AI safety and the question of machine consciousness.