Anthropic is Really Pushing the Frontier, What Should We Think?
New model achieves 72.4% exploit success rate, up from 0.9% with previous best model.
Anthropic has unveiled Mythos, a frontier AI model with unprecedented cybersecurity capabilities that can analyze browser crashes and develop working computer exploits with 72.4% success rate. This represents an 80x improvement over their previous best model, Claude Opus 4.6, which achieved only 0.9% on the same benchmark developed with Mozilla using real Firefox vulnerabilities. The model's performance is considered a floor, with Anthropic noting it would likely perform even better with more tokens and that scaffolding improvements typically boost capabilities over time.
Despite passing their Responsible Scaling Policy framework, Anthropic has chosen to restrict Mythos through Project Glasswing, a limited deployment to approximately 50 critical infrastructure organizations across three major cloud providers. The company acknowledges the model could handle "small-scale enterprise networks with weak security posture" - representing a significant portion of the internet. For these trusted partners, Anthropic is not blocking exchanges based on classifier triggers, allowing cybersecurity defenders to leverage the model's full capabilities for defense purposes, a significant departure from their general-release model restrictions.
- Mythos achieves 72.4% exploit success rate from browser crashes, up from 0.9% with Claude Opus 4.6
- Model restricted to 50+ organizations via Project Glasswing despite passing internal safety framework
- Anthropic describes target scope as "small-scale enterprise networks with weak security posture"
Why It Matters
Demonstrates AI's rapidly advancing offensive cybersecurity capabilities while raising questions about responsible deployment of dangerous technology.