AI Safety

Have we already lost? Part 2: Reasons for Doom

A viral 2026 analysis argues that AI progress is outpacing safety work, with corporate Responsible Scaling Policies (RSPs) failing to hold.

Deep Dive

A viral 2026 analysis by researcher LawrenceC, published on LessWrong, paints a concerning picture of AI safety progress lagging far behind capability development. The report argues that voluntary corporate safeguards, known as Responsible Scaling Policies (RSPs), are weakening: Anthropic, which pioneered RSPs in 2023, has since relaxed its own policy, and other labs such as xAI may already have violated their commitments. This trend suggests that industry self-regulation is unlikely to prevent the deployment of potentially dangerous AI systems.

The analysis also points to accelerating AI progress: metrics from organizations like METR show a consistent exponential trend in the length of tasks AI systems can complete autonomously, suggesting that milestones such as full coding automation could arrive as early as 2028. Compounding this, ambitious technical safety research agendas, such as mechanistic interpretability and ARC's Eliciting Latent Knowledge (ELK) work, have not yet yielded major breakthroughs. Furthermore, the AI safety talent pool has become heavily concentrated at Anthropic, creating a single point of failure if the company's strategy or judgment turns out to be flawed.
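
To make the timeline reasoning concrete, here is a minimal sketch of the kind of extrapolation involved. All numbers are hypothetical placeholders, not figures from the analysis or from METR: a one-hour autonomous-task horizon at the start of 2025, a seven-month doubling period, and a ~167-hour (roughly one working month) horizon as a stand-in threshold for "full coding automation."

```python
from datetime import date
from math import log2

# Illustrative extrapolation of an exponential "task time horizon" trend.
# All constants below are assumed for the sketch, not published figures.
start = date(2025, 1, 1)
h0_hours = 1.0          # assumed horizon at `start` (hypothetical)
doubling_months = 7.0   # assumed doubling period (hypothetical)

def horizon_hours(months_elapsed: float) -> float:
    """Projected horizon after `months_elapsed`, under pure exponential growth."""
    return h0_hours * 2 ** (months_elapsed / doubling_months)

def months_until(target_hours: float) -> float:
    """Months from `start` until the projected horizon reaches `target_hours`."""
    return doubling_months * log2(target_hours / h0_hours)

print(f"Horizon after 36 months: ~{horizon_hours(36):.0f} hours")

# ~167 hours (about one working month of autonomous work) serves here as
# an illustrative proxy threshold for "full coding automation".
mo = round(months_until(167))
total = start.month - 1 + mo
print(f"Threshold crossed ~{mo} months after {start.isoformat()}: "
      f"around {start.year + total // 12}-{total % 12 + 1:02d}")
```

Under these placeholder numbers the threshold is crossed around 2029; assuming a faster doubling period or a higher starting horizon pulls the crossing date toward 2028.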

Key Points
  • Corporate RSPs are weakening, with Anthropic relaxing its policy and xAI potentially in violation, undermining voluntary safety commitments.
  • AI progress metrics show a sustained exponential trend, putting milestones like full coding automation potentially within reach by 2028 and compressing the window for safety work.
  • Key technical safety research (mechanistic interpretability, ELK) has yet to produce major payoffs, and talent is overly concentrated at a single lab (Anthropic).

Why It Matters

The analysis signals a critical gap between accelerating AI capabilities and the safeguards needed to deploy them safely, with neither voluntary industry commitments nor current technical research on track to close it.