Media & Culture

Anthropic just leaked details of its next‑gen AI model – and it’s raising alarms about cybersecurity

A leak of ~3,000 internal documents reveals a 'step change' AI model capable of orchestrating cyberattacks.

Deep Dive

A significant configuration error at Anthropic has exposed approximately 3,000 internal documents, including draft blog posts about its next-generation AI model, internally codenamed 'Claude Mythos.' The leaked materials describe the model as a 'step change' in AI capability, suggesting a major leap beyond current models like Claude 3.5 Sonnet. The most consequential material, however, is an internal assessment that flags Claude Mythos for serious, inherent cybersecurity risks.

The internal documents detail capabilities that raise red flags for security experts: the AI can automatically discover zero-day vulnerabilities, orchestrate complex, multi-stage cyberattacks, and operate with a level of autonomy unseen in previous models. The leak gives sharp validation to a concern that has been simmering in the AI safety community: as models become more powerful and agentic, their potential for misuse as offensive cyber weapons rises sharply. It also presents a paradox for Anthropic. The company has publicly championed AI safety and published research on AI-orchestrated threats, and now finds its own pre-release model at the center of those very fears.

Key Points
  • A configuration error at Anthropic exposed ~3,000 internal docs detailing 'Claude Mythos,' a 'step change' AI model.
  • Internal assessment flags the model for automating zero-day discovery and multi-stage attacks.
  • The leak underscores that advanced, autonomous AI models are potent dual-use cybersecurity threats.

Why It Matters

The leak forces a reckoning over whether the most capable AI systems can be safely developed and deployed without becoming cyber weapons.