AI Safety

AI 2027 Tracker: One Year of Predictions vs. Reality

Independent analysis finds AI safety predictions arriving 1-2 years early while capability benchmarks lag behind.

Deep Dive

An independent analyst known as hauspost has published a one-year retrospective tracking 53 specific predictions from the influential 'AI 2027' scenario report against real-world developments. The tracker reveals a striking pattern: while overall progress broadly matches the report's forecasts, safety and governance risks are materializing faster than the underlying capabilities that were supposed to enable them. For example, the report predicted that an 'Agent-2' system would autonomously discover thousands of software zero-days in early 2027. In reality, Anthropic's Claude Mythos Preview achieved this as a side effect of training in 2025, roughly two years ahead of schedule. Predictions about the relationship between the U.S. Department of Defense and AI labs are similarly tracking early.

Conversely, predictions about raw AI capabilities are mostly running slightly behind. The report forecast that AI systems would reach 85% on the SWE-bench coding benchmark by mid-2025, but the best actual result, from Claude Opus 4.1, was 74.5%. Compute scale-ups and other benchmark timelines have also slipped. The tracker's methodology assigns each prediction one of six status levels (Confirmed, Ahead, On Track, Behind, Emerging, Not Yet Testable) and requires explicit evidence for every status change; candidate updates are gathered through automated agent runs and take effect only after manual approval. The project aims to turn vague discussions about AI risk into falsifiable, dated claims that are harder to dismiss.
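The update rule described above can be pictured as a small data model. The sketch below is a hypothetical Python illustration, not hauspost's actual code: the class and function names, the default status, and the treatment of evidence as a citation string are all assumptions. It encodes the six status levels, rejects any status change that lacks evidence, applies agent-proposed changes only after a human approves them, and computes the share of predictions rated Confirmed, Ahead, or On Track.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # The six status levels named by the tracker.
    CONFIRMED = "Confirmed"
    AHEAD = "Ahead"
    ON_TRACK = "On Track"
    BEHIND = "Behind"
    EMERGING = "Emerging"
    NOT_YET_TESTABLE = "Not Yet Testable"


@dataclass
class Prediction:
    claim: str
    due: str  # date the scenario assigns to the claim, e.g. "2027-01"
    # Assumption: untested predictions start as Not Yet Testable.
    status: Status = Status.NOT_YET_TESTABLE
    evidence: list[str] = field(default_factory=list)


@dataclass
class ProposedUpdate:
    # Hypothetical shape of a change suggested by an automated agent run.
    prediction: Prediction
    new_status: Status
    evidence: str  # assumed to be a citation or source URL


def propose(pred: Prediction, new_status: Status, evidence: str) -> ProposedUpdate:
    # Every status change must carry explicit evidence.
    if not evidence.strip():
        raise ValueError("a status change requires explicit evidence")
    return ProposedUpdate(pred, new_status, evidence)


def apply_if_approved(update: ProposedUpdate, approved: bool) -> bool:
    # Agent proposals only take effect after manual approval.
    if not approved:
        return False
    update.prediction.status = update.new_status
    update.prediction.evidence.append(update.evidence)
    return True


FAVORABLE = {Status.CONFIRMED, Status.AHEAD, Status.ON_TRACK}


def favorable_share(preds: list[Prediction]) -> float:
    # Fraction of predictions rated Confirmed, Ahead, or On Track.
    return sum(p.status in FAVORABLE for p in preds) / len(preds)
```

Keeping the proposal step separate from the approval step means an agent run can never change a status on its own. With 27 of 53 predictions in the three favorable states, `favorable_share` returns about 0.509, which rounds to the 51% headline figure.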

Key Points
  • 27 of the 53 tracked predictions (51%) are rated Confirmed, Ahead, or On Track after one year.
  • Safety risks like autonomous zero-day discovery (Claude Mythos) arrived ~2 years early, while capability benchmarks (SWE-bench) are slightly behind.
  • The tracker uses six status levels and requires explicit evidence and manual approval for every status update.

Why It Matters

The tracker provides concrete, dated evidence that AI governance and safety challenges are arriving faster than the technical capabilities expected to drive them, which calls for urgent policy attention.