Amazon researchers jailbroke Claude Fable 5 with the phrase 'fix this code,' sparking a White House panic?

Amazon researchers jailbroke Claude Fable 5 with the phrase 'fix this code,' sparking a White House panic.

Anthropic restored Fable after implementing classifiers that block the request in 99%+ of cases, but many coding tasks now fall back to Opus 4.8?

Anthropic restored Fable after implementing classifiers that block the request in 99%+ of cases, but many coding tasks now fall back to Opus 4.8.

Alex Stamos warns the tighter restrictions will push US cybersecurity startups to use Chinese models, handing PRC labs a strategic win?

Alex Stamos warns the tighter restrictions will push US cybersecurity startups to use Chinese models, handing PRC labs a strategic win.

AI Safety

Anthropic restores Claude Fable 5 after government lockdown over 'fix this code' jailbreak

LessWrong AI July 04, 2026

⚡A simple debugging request triggered a US government freeze on Anthropic's most advanced model.

Deep Dive

Anthropic's flagship model Claude Fable 5 has been fully restored after a two-week government-mandated shutdown triggered by Amazon researchers who discovered they could jailbreak the model by simply asking it to 'fix this code.' The White House initially freaked out and applied export controls on both Fable and the earlier Mythos model on June 12. Anthropic responded by expanding classifiers to refuse the specific request in over 99% of cases, which satisfied regulators. Controls on Mythos were lifted June 26, and Fable's full access returned July 1. However, the near-term safeguards are deliberately stupid — many routine coding and debugging tasks now fall back to Opus 4.8, as Anthropic admitted that debugging is literally 'fix this code.' The company is now working with its Glasswing partners (Amazon, Microsoft, Google) and the US government to create a systematic classification framework for jailbreaks, aiming to prevent future ad hoc political interventions.

Alex Stamos offered a sharp critique of the episode, noting that Anthropic's careful language buries a hard truth: the jailbreak provided no capabilities beyond what other models (including Chinese ones) already offer. The cost of this White House freakout is that US frontier models now must make a far more conservative precision-recall tradeoff on cybersecurity refusals. As a result, US models become significantly less useful for defensive cybersecurity work unless users join a restricted 'trusted group' — making it impossible to build products on these models for external customers. Stamos warns this is a big win for Chinese labs, as security companies and startups serving multiple clients will be driven to use PRC models instead. The entire episode underlines the urgent need for a competent scoring framework for jailbreaks, and the danger of letting political actors without technical understanding make snap decisions about frontier AI availability.

Key Points

Amazon researchers jailbroke Claude Fable 5 with the phrase 'fix this code,' sparking a White House panic.
Anthropic restored Fable after implementing classifiers that block the request in 99%+ of cases, but many coding tasks now fall back to Opus 4.8.
Alex Stamos warns the tighter restrictions will push US cybersecurity startups to use Chinese models, handing PRC labs a strategic win.

Why It Matters

The incident shows how political overreactions can degrade US model utility and inadvertently boost Chinese AI competitiveness.

Read Original Article

Anthropic restores Claude Fable 5 after government lockdown over 'fix this code' jailbreak

Why It Matters

Related Articles

🚀 Stay Ahead in AI