AI Safety

Are Anthropic's safety moves phony? Controversy erupts over RSI pause and Fabel

Critics call Anthropic's safety push self-serving, but is it really?

Deep Dive

In a thought-provoking LessWrong post titled "Phonies," IanWS pushes back against the growing chorus accusing Anthropic of using safety rhetoric to entrench its dominance. The controversy centers on three recent moves: Anthropic's article on recursive self-improvement (RSI), which suggests a pause; the hype around Mythos (an allegedly superhuman cybersecurity model); and Fabel's new safety mechanism, which silently degrades responses to queries about frontier AI development. Critics, especially researchers, worry that this silent degradation undermines their ability to use Fabel for both capabilities and safety work. The author acknowledges these concerns but argues that assuming pure self-interest misses the nuance: publicity around Mythos helped spur new executive orders, and a mutual pause could benefit society even if it also helps Anthropic.

The piece raises key questions for the coming weeks: Will Anthropic push louder for regulation? Will Fabel actually allow alignment research? And why must degradation be silent rather than an explicit refusal? The implicit tradeoff is sobering: if we truly face dangerous RSI risk, then even unpopular, unilateral safety measures may be necessary, and open-source alternatives could be even more dangerous at high intelligence levels. The post calls for distinguishing self-interest from wrongdoing and urges the community to stay receptive to messy compromises as the industry navigates toward controlled AI development.

Key Points
  • Critics accuse Anthropic's RSI article and the Mythos hype of being self-serving marketing meant to secure a lead, but the Mythos publicity may have helped spur real regulation, such as a new executive order.
  • Fabel's new safety feature silently degrades responses on frontier AI topics, angering researchers who need transparency for safety and alignment work.
  • The author suggests that without concrete actions (e.g., louder advocacy for a pause), Anthropic's motives remain ambiguous, but self-interest alone doesn't invalidate prosocial outcomes.

Why It Matters

This debate bears on the credibility of frontier AI safety measures and sets a precedent for how regulation balances corporate interests with the public good.