Models & Releases

AI Security Institute: GPT-5.5 "may be the strongest model we have tested" for cyber exploits, including Mythos

New benchmarks show GPT-5.5 outperforms Mythos on expert-level cyber tasks.

Deep Dive

In a new evaluation by the AI Security Institute (AISI), OpenAI's GPT-5.5 model demonstrated performance on cybersecurity exploit tasks that was competitive with, and in some cases superior to, Anthropic's highly touted Mythos Preview, the model at the center of the recent 'Mythos' panic. On the Expert-level benchmark, GPT-5.5 achieved an average pass rate of 71.4% (±8.0%), compared to 68.6% (±8.7%) for Mythos Preview, 52.4% (±9.8%) for GPT-5.4, and 48.6% (±10.0%) for Opus 4.7. AISI noted, 'GPT-5.5 may be the strongest model we have tested.' GPT-5.5 also completed the TLO end-to-end test in 2 of 10 attempts, making it only the second model ever to do so; Mythos Preview succeeded in 3 of 10 attempts.

These findings challenge the narrative that Anthropic's Mythos represented a unique leap in offensive AI capabilities, suggesting instead that the 'panic' around Mythos may have been amplified by marketing. Note that the reported error margins overlap: GPT-5.5's 71.4% (±8.0%) and Mythos Preview's 68.6% (±8.7%) are statistically close, so the headline lead should be read as rough parity rather than a clear win. The results underline how quickly frontier models are converging in dangerous capabilities, and they highlight the value of rigorous, independent evaluations like AISI's in separating hype from reality. For professionals, the takeaway is that GPT-5.5 poses a cybersecurity risk comparable to, or greater than, Mythos, reinforcing the need for robust defenses and continued safety testing.

Key Points
  • GPT-5.5 achieved a 71.4% pass rate on expert-level cyber tasks, edging out Mythos Preview's 68.6% (within overlapping error margins).
  • GPT-5.5 completed the TLO end-to-end test in 2 of 10 attempts; Mythos Preview succeeded in 3 of 10.
  • AISI states GPT-5.5 'may be the strongest model we have tested' for cyber exploits.

Why It Matters

Frontier AI models are converging in dangerous capabilities; marketing narratives shouldn't overshadow independent safety evaluations.