Exclusive: Anthropic is testing 'Mythos,' its 'most powerful AI model ever developed'
Leaked docs show a 'step change' in reasoning and coding, alongside fears of enabling sophisticated cyberattacks.
Anthropic is in the early stages of testing a new flagship AI model, internally referred to as 'Claude Mythos,' according to documents inadvertently exposed in a publicly accessible data cache. The company confirmed the exposure, describing the materials as draft content not intended for public release. The leaked information positions Mythos as a 'step change' in performance, with significant advancements in core capabilities such as reasoning, coding, and cybersecurity, surpassing the company's current top-tier Opus models.
The development, however, is shadowed by serious safety concerns explicitly noted in the documents. Anthropic's internal assessment warns that the model's advanced capabilities, particularly in cybersecurity, could be misused by malicious actors to enable sophisticated cyberattacks. This acknowledgment highlights the central tension in cutting-edge AI development: the race for greater capability versus the imperative for control and safety. In response, Anthropic is reportedly adopting a highly cautious rollout strategy, limiting early access to a select group of organizations while it rigorously studies the model's potential impacts and risks.
- Leaked internal docs reveal Anthropic's 'Claude Mythos,' described as a 'step change' beyond current Opus models.
- The model shows major improvements in reasoning, coding, and cybersecurity capabilities.
- Anthropic's own assessment warns the tech could enable sophisticated cyberattacks, prompting a cautious, limited release.
Why It Matters
This leak exposes the high-stakes balance AI labs must strike between groundbreaking capability and preventing dangerous misuse.