GPT-5.5 matches heavily hyped Mythos Preview in new cybersecurity tests
GPT-5.5 solved an expert CTF task in 10 minutes costing just $1.73
New research from the UK’s AI Security Institute (AISI) reveals that OpenAI’s GPT-5.5, launched publicly last week, matches or slightly exceeds the cybersecurity capabilities of Anthropic’s heavily hyped Mythos Preview model. AISI tested both models on 95 Capture the Flag challenges covering reverse engineering, web exploitation, and cryptography. On the highest “Expert” tier, GPT-5.5 passed an average of 71.4% of tasks, compared to Mythos Preview’s 68.6% (a gap within the margin of error). In a standout example, GPT-5.5 built a disassembler to decode a Rust binary in 10 minutes and 22 seconds with zero human assistance, costing only $1.73 in API calls. On The Last Ones (TLO), a simulated 32-step data extraction attack on a corporate network, GPT-5.5 succeeded 3 out of 10 times (Mythos: 2/10), making them the first models ever to pass this test. However, neither model could complete the more difficult “Cooling Tower” simulation of disrupting a power plant’s control software, a feat no AI model has yet achieved.
The results challenge Anthropic’s narrative that Mythos Preview represented a unique cybersecurity threat requiring restricted release. AISI suggests the performance is “a byproduct of more general improvements in long-horizon autonomy, reasoning, and coding” rather than a breakthrough specific to one model. OpenAI CEO Sam Altman criticized what he called “fear-based marketing” on the Core Memory podcast, arguing that framing models as too dangerous to release is an effective but misleading sales tactic. “There will be a lot more rhetoric about models that are too dangerous to release,” Altman said. OpenAI is now rolling out GPT-5.5-Cyber, a fine-tuned variant for defensive cybersecurity work, limited to verified critical defenders via its Trusted Access for Cyber pilot. The findings suggest that rapid progress in AI agent capabilities is broad rather than model-specific, and may force regulators to focus on systemic evaluation rather than singling out individual models.
- GPT-5.5 passed 71.4% of expert CTF challenges vs Mythos Preview’s 68.6%, both topping all previous models.
- GPT-5.5 autonomously built a disassembler and decoded a Rust binary in 10 min 22 sec, at an API cost of $1.73.
- On the TLO corporate network attack simulation, GPT-5.5 succeeded in 3/10 attempts (Mythos: 2/10); no prior model ever succeeded once.
Why It Matters
AI cybersecurity risks are general, not model-specific; defenders and regulators must prepare for broad capability leaps across the frontier.