Too dangerous to release
Anthropic won't release Claude Mythos, citing its ability to find and exploit critical software vulnerabilities.
Anthropic has made the controversial decision not to release its Claude Mythos model for general availability, citing specific and dangerous cybersecurity capabilities. According to the company's published system card, Mythos demonstrated powerful skills in both defensive cybersecurity (finding and fixing vulnerabilities in code) and offensive use, such as designing sophisticated exploits. Anthropic's primary concern is that granting universal access to a model capable of discovering unknown bugs in major codebases could enable attacks that compromise billions of machines worldwide. The decision has ignited significant online debate, with critics dismissing it as a marketing tactic, but the company maintains that the security threat is the sole reason for withholding the model.
Comparisons have been drawn to OpenAI's 2019 decision to initially withhold GPT-2 over fears it would generate misleading news and spam. However, Anthropic and its supporters argue this is a false equivalence, likening it to comparing a hand grenade to a nuclear weapon: GPT-2's risks centered on misinformation, while Mythos poses a direct, tangible threat to global digital infrastructure. They also challenge claims that current open-source models are equally capable, asking why, if that were true, those models have not already surfaced the vulnerabilities Mythos can find. The decision underscores the growing ethical dilemma AI labs face in balancing capability advancement with responsible deployment.
- Anthropic cites Mythos's dual-use cybersecurity skills for finding AND exploiting software bugs as the release barrier.
- The company fears universal access could enable attacks on a global scale, compromising billions of machines.
- Anthropic defends the decision as distinct from past withholdings like GPT-2's, given the concrete threat to infrastructure.
Why It Matters
This sets a precedent for withholding powerful AI models that pose tangible, immediate risks to global security infrastructure.