Mythos AI exploits 18 of 41 n-day vulnerabilities, up from 1 of 41 for the prior version; open-weight models score zero
The proprietary Mythos model outperforms its predecessor 18-fold on an n-day exploitation benchmark.
Deep Dive
Mythos successfully exploited 18 of 41 n-day vulnerabilities, compared with 1 of 41 for its predecessor. Open-source models with publicly available weights scored zero on the same test set. The result shows that the model can autonomously exploit known vulnerabilities, a critical skill for penetration testing and security research, and it highlights a growing capability gap between proprietary and open-source AI in offensive security.
Key Points
- Mythos successfully exploited 18 of 41 n-day vulnerabilities, compared with just 1 of 41 for the previous version (5.5).
- Open-source models with publicly available weights scored zero exploits on the same test set.
- The 18x improvement indicates a major advance in AI's ability to autonomously exploit known vulnerabilities.
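
The arithmetic behind the 18x figure can be checked with a minimal sketch. It assumes only the counts reported above (18, 1, and 0 successes out of 41); the function and variable names are illustrative and do not come from the benchmark itself.

```python
from fractions import Fraction

TOTAL = 41  # size of the n-day test set, per the figures reported above


def success_rate(solved: int, total: int = TOTAL) -> Fraction:
    """Return the exact fraction of test cases exploited."""
    return Fraction(solved, total)


# Counts taken from the article; the names are illustrative only.
mythos = success_rate(18)
predecessor = success_rate(1)
open_weight = success_rate(0)

# Relative improvement: how many times more cases Mythos exploited.
ratio = mythos / predecessor

# Absolute improvement, in percentage points.
gap_points = float(mythos - predecessor) * 100

print(f"Mythos success rate:      {float(mythos):.1%}")
print(f"Predecessor success rate: {float(predecessor):.1%}")
print(f"Open-weight success rate: {float(open_weight):.1%}")
print(f"Relative improvement:     {ratio}x")
print(f"Absolute improvement:     {gap_points:.1f} percentage points")
```

In absolute terms the success rate rises from roughly 2.4% to roughly 43.9%. No comparable ratio exists against the open-weight models, because their score of zero would require dividing by zero.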
Why It Matters
Proprietary AI models are rapidly outpacing open-source alternatives in offensive security, raising concerns about who has access to these capabilities and about the ethics of their use.