AI Safety

Have we already lost? Part 1: The Plan in 2024

A viral AI safety post outlines why the community's 2024 strategy is faltering on multiple fronts.

Deep Dive

A viral post on the LessWrong forum, written by LawrenceC for the Inkhaven Residency, presents a sobering mid-2026 retrospective on the AI safety community's 2024 strategy. The author asks whether a 'point of no return' for safe AI development has been passed, concluding 'no' but describing a significantly worsened outlook. The core 2024 'plan' had three steps: using voluntary commitments (like RSPs) to buy development time, extracting cognitive labor from powerful-but-not-too-powerful AI to solve alignment, and converting that AI assistance into technical and policy solutions.

However, the post details multiple failures: governance and policy plans have largely collapsed, AI progress is on more aggressive timelines than anticipated, and the community 'largely went all-in on Anthropic,' losing independence. The author notes that the least effort went into the final step of converting AI labor into concrete solutions, with many relying on a 'wing-it' approach. Despite the negative updates, reasons for optimism include better-than-expected progress on 'wing-it'–style empirical alignment, the potential for Anthropic to maintain a lead, and increased leverage from non-US governments. The post frames the current moment as a critical juncture requiring a reassessment of strategy.

Key Points
  • The 2024 AI safety plan relied on voluntary commitments (RSPs) to buy time and using AI assistance to solve alignment, but governance efforts have largely failed.
  • Key negative updates include accelerated AI progress timelines, over-reliance on a single company (Anthropic), and stalled plans for technical research.
  • Despite the grim assessment, reasons for hope include progress on empirical alignment methods and potential geopolitical leverage from outside the US.

Why It Matters

Highlights the growing strategic challenges of ensuring AI safety as development accelerates, forcing the community to reckon with a faltering plan.