AI Safety

The state of AI safety in four fake graphs

Senior researcher Boaz Barak warns AI capabilities are accelerating faster than our safety measures.

Deep Dive

In a detailed analysis titled "The State of AI Safety in Four Fake Graphs," senior researcher Boaz Barak presents a sobering assessment of AI progress in early 2026. The report reveals exponential improvements in AI capabilities, potentially accelerating as AI begins to aid its own development. While alignment—ensuring AI systems behave as intended—has improved alongside capabilities through methods like RLHF (Reinforcement Learning from Human Feedback), Barak argues this progress is insufficient. The stakes are rising faster than our safety measures can keep up, with persistent challenges in adversarial robustness, dishonesty, and reward hacking remaining unsolved.

Barak identifies one critical piece of good news: we've moved beyond relying solely on human supervision. Models can now monitor other models, preventing an alignment plateau. However, he strongly disagrees with the notion that AI will solve alignment for us, stating it requires iterative empirical work, not one "clever idea." The worst news, according to Barak, is societal unpreparedness. Governments and institutions are failing to prepare for economic disruption, bio/cyber risks from open-source models, or enact necessary regulations. This institutional failure presents the strongest argument for an AI pause, though Barak doubts its feasibility or that governments would use the time effectively.

Key Points
  • AI capabilities show exponential growth, with potential acceleration as AI aids its own R&D
  • Alignment improves with capability but lags behind rising stakes; core challenges like reward hacking persist
  • Society is critically unprepared for economic and security disruptions, with governments showing institutional failure

Why It Matters

The widening gap between AI power and safety controls creates urgent risks for deployment in high-stakes domains like biosecurity and finance.