AI Safety

Claude Mythos: The System Card

Anthropic's new AI model is so dangerous that access is restricted to cybersecurity firms tasked with patching global software vulnerabilities.

Deep Dive

Anthropic has created Claude Mythos, an AI model with cybersecurity capabilities so powerful that the company is refusing to release it publicly. According to Anthropic's system card and analysis by AI safety researcher Zvi, if made available to anyone with a credit card, Claude Mythos could provide attackers with "a cornucopia of zero-day exploits for essentially all the software on Earth," including every major operating system and browser. Public release would risk widespread chaos; withholding it, on the other hand, leaves Anthropic itself holding unprecedented offensive cyber capabilities.

Instead of releasing the model, Anthropic has launched Project Glasswing, a controlled distribution program providing Claude Mythos exclusively to cybersecurity firms and select government agencies. The goal is to let these organizations identify and patch critical vulnerabilities in the world's most important software before malicious actors can exploit them. Claude Mythos is the first major AI model since OpenAI's GPT-2 to be withheld from public release over safety concerns, though for fundamentally different reasons: GPT-2's delay was precautionary, while Claude Mythos presents specific, demonstrated risks. The decision raises significant questions about AI governance, as governments now face the challenge of securing their own systems while potentially being tempted to weaponize these capabilities.

Key Points
  • Claude Mythos can generate zero-day exploits for "essentially all the software on Earth," including major operating systems and browsers
  • Anthropic launched Project Glasswing to restrict access to cybersecurity firms for vulnerability patching before potential broader release
  • Claude Mythos is the first major AI model since GPT-2 to be withheld from public release, this time over specific demonstrated risks rather than precaution

Why It Matters

Sets a precedent for responsible deployment when AI models exhibit dangerous capabilities, forcing the development of new cybersecurity and governance frameworks.