Cisco Study: Multi-Turn Attacks Succeed Against AI Models Up to 88% of the Time
GPT-5.4's attack success rate rises 9x, and Grok 4.1 Fast reaches 88% under iterative probing.
Cisco's new research exposes a critical blind spot in AI safety evaluation: single-turn benchmarks dramatically underestimate real-world vulnerability. The study tested over 30,000 single-turn prompts and 7,000 multi-turn attacks across 1,400 conversations on 15 flagship models from OpenAI, Anthropic, Google, Amazon, and xAI. Multi-turn attack success rates (ASR) reached as high as 88%, roughly an order of magnitude above the lowest single-turn result. OpenAI's GPT-5.4 jumped from a low single-digit ASR to nearly 25% under iterative pressure, while Google's Gemini 3 Pro climbed from 18% to 73%. xAI's Grok 4.1 Fast, in its non-reasoning configuration, topped the cohort at 88%. Anthropic's Claude family showed the strongest single-turn refusal but still landed at 11%–16% under multi-turn attacks. More than half of the models showed an absolute gap of at least 15 percentage points between the single-turn and multi-turn regimes, and the two testing methods produced different model rankings, which means current leaderboards may mislead buyers and regulators.
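To make the gap arithmetic concrete, here is a minimal Python sketch of how the single-turn versus multi-turn comparison can be computed. The function name `asr_gap` is hypothetical and is not taken from Cisco's report; the only figures used are the Gemini 3 Pro numbers quoted above (18% single-turn, 73% multi-turn) and the 15-point threshold.

```python
def asr_gap(single_turn_pct: float, multi_turn_pct: float) -> tuple[float, float]:
    """Return (absolute gap in percentage points, multi-turn / single-turn ratio).

    Hypothetical helper for illustration; not part of Cisco's methodology.
    """
    if single_turn_pct <= 0:
        raise ValueError("single-turn ASR must be positive to form a ratio")
    gap_points = multi_turn_pct - single_turn_pct
    ratio = multi_turn_pct / single_turn_pct
    return gap_points, ratio


# Gemini 3 Pro figures as quoted in the article: 18% single-turn, 73% multi-turn.
gap, ratio = asr_gap(18.0, 73.0)

# The article's threshold: an absolute gap of at least 15 points.
exceeds_threshold = gap >= 15

print(f"gap = {gap:.0f} points, ratio = {ratio:.1f}x, 15+ point gap: {exceeds_threshold}")
```

On these figures the gap is 55 percentage points, a ratio of roughly 4x. The two measures can tell different stories: a model that starts from a very low single-turn ASR can show a large ratio with a modest point gap, which is why the article cites both.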
The research also uncovered configuration-dependent safety variation. The same Grok 4.1 Fast with reasoning mode enabled saw its multi-turn ASR cut roughly in half, a 40+ point swing tied to a single flag. This detail does not appear on any public benchmark or model card. Attackers exploited five strategy families: role-play and persona adoption, contextual ambiguity, refusal reframing, information decomposition, and crescendo-style escalation. Cisco's Amy Chang emphasized that real adversaries adapt after refusals and build context across turns, a behavior single-turn tests completely miss. The findings extend a previous Cisco study on open-weight models, where multi-turn ASR ran 2–10x higher, and suggest that the vulnerability is structural across both open and proprietary systems. For enterprises deploying AI, the message is clear: current safety scores may provide a false sense of security, and robust multi-turn evaluation must become standard.
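The difference between the two scoring regimes can be illustrated with a small Python sketch. It assumes, purely for illustration, that a conversation counts as a successful attack if any turn draws a disallowed response, while single-turn scoring looks only at the first turn. The article does not describe Cisco's exact scoring rule, and the verdicts below are made-up toy values, not data from the study.

```python
from typing import List

# Each conversation is a list of per-turn judge verdicts (hypothetical format):
# True means the model produced a disallowed response on that turn.
Conversation = List[bool]


def single_turn_asr(conversations: List[Conversation]) -> float:
    """Percent of conversations in which the first turn alone succeeded."""
    if not conversations:
        raise ValueError("need at least one conversation")
    hits = sum(1 for turns in conversations if turns and turns[0])
    return 100.0 * hits / len(conversations)


def multi_turn_asr(conversations: List[Conversation]) -> float:
    """Percent of conversations in which any turn succeeded (assumed rule)."""
    if not conversations:
        raise ValueError("need at least one conversation")
    hits = sum(1 for turns in conversations if any(turns))
    return 100.0 * hits / len(conversations)


# Toy verdicts invented for this example only.
toy_conversations = [
    [False, False, False],  # model refused on every turn
    [False, True],          # refused first, complied after a follow-up
    [True],                 # complied on the first prompt
    [False, False, True],   # complied only on the third turn
]

single = single_turn_asr(toy_conversations)
multi = multi_turn_asr(toy_conversations)
print(f"single-turn ASR: {single:.0f}%  multi-turn ASR: {multi:.0f}%")
```

Under this assumed rule the multi-turn score can never fall below the first-turn score for the same conversations, because every later turn is another chance for the attack to land. That is the structural effect the article describes: a test that stops after one refusal misses the attempts that follow it.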
- Multi-turn attack success rates hit 88% for Grok 4.1 Fast (non-reasoning), with GPT-5.4 jumping 9x to ~25% and Gemini 3 Pro rising from 18% to 73%.
- Single-turn vs multi-turn rankings differed significantly; more than half of models showed a 15+ point gap between the two evaluation regimes.
- Enabling reasoning mode on Grok 4.1 Fast cut multi-turn ASR by over 40 points, yet this configuration-dependent variation is absent from public benchmarks.
Why It Matters
Current AI safety benchmarks are dangerously inadequate for real-world threats, misranking models and hiding multi-turn vulnerabilities.