AI Safety

Monday AI Radar #19

AI models can now autonomously find and exploit critical software vulnerabilities, shifting the security landscape.

Deep Dive

The AI landscape in early 2026 is defined by a dangerous mismatch: rapid capability gains in large language models (LLMs) are outpacing both alignment progress and societal readiness. Cybersecurity is a key concern: AI agents can now autonomously find and exploit critical zero-day vulnerabilities in widely deployed software, a capability that did not exist just months ago. Researchers such as Anthropic's Nicholas Carlini warn that this upends the decades-old balance between attackers and defenders, posing a major near-term risk of disruption across critical systems.

Anthropic's upcoming 'Mythos' model, details of which were leaked, exemplifies this trend. The leaks describe it as far ahead in cyber capabilities, presaging a wave of models that can exploit vulnerabilities faster than defenders can patch them. In response, Anthropic plans an early-access release focused on cyber defenders. Meanwhile, agentic pipelines like 'Claudini', which use methods similar to Karpathy's auto-research, are being turned toward developing highly effective new jailbreak and prompt injection attacks, underscoring the dual-use nature of advancing AI agent capabilities.

Key Points
  • AI agents can now autonomously find and exploit zero-day vulnerabilities, a new capability as of late 2025/early 2026.
  • Anthropic's leaked 'Mythos' model leads in cyber capabilities and will get a defender-focused early-access release to counter AI-driven exploits.
  • Agent frameworks like 'Claudini' are being repurposed to autonomously develop new, effective jailbreak and prompt injection attacks.

Why It Matters

The offensive-defensive balance in cybersecurity has been upended, requiring immediate hardening of critical systems against AI-powered exploits.