AI Safety

Excerpts and Notes on Mythos Model Card

Claude Mythos Preview shows 'dramatic above-trend acceleration' but deemed too dangerous for general release.

Deep Dive

Anthropic's internal documentation for its advanced 'Claude Mythos Preview' model, codenamed 'Mythos,' has surfaced, revealing a system with dramatically accelerated capabilities that the company has deemed too dangerous for general release. The model card notes 'dramatic above trend acceleration' on key benchmarks, with Anthropic attributing these gains to human research rather than AI-assisted acceleration. In an internal survey, 1 out of 18 Anthropic staff believed the model was already a 'drop-in replacement for an entry-level Research Scientist or Engineer,' highlighting its perceived potency.

The most significant revelation concerns cybersecurity, where Mythos 'completely cooks all standardized benchmarks' and performs 'scarily' well on real-world tasks. This capability is the primary reason for withholding public release. Anthropic's strategic reasoning, as noted in the card, is that similarly capable models will likely become available soon. To manage the risk, they plan to offer Mythos exclusively to vetted industry partners through 'Project Glasswing.' This initiative would allow partners to use the AI to red-team and discover vulnerabilities in their own software, patching them before malicious actors can exploit them, navigating what they see as a critical 'intermediate period' of heightened risk.

Key Points
  • Internal survey showed 1/18 Anthropic staff believed Mythos could replace entry-level AI researchers, indicating high perceived capability.
  • Model demonstrates 'dramatic above-trend acceleration' on benchmarks, with gains attributed by Anthropic to human research, not AI-assisted acceleration.
  • Deemed too dangerous for public release due to exceptional cybersecurity prowess; will be offered selectively via 'Project Glasswing' for defensive patching.

Why It Matters

This marks a pivotal moment where an AI lab explicitly withholds a model due to its offensive potential, prioritizing security over competition.