AI Safety

If you don't feel deeply confused about AGI risk, something's wrong

Drawing on conversations with roughly 75 AI governance fellows, Dave Banerjee argues that many lack a first-principles understanding of existential threats.

Deep Dive

In a viral LessWrong post, Dave Banerjee critiques the current state of AI safety and governance training programs. Drawing from conversations with approximately 75 fellows across five major programs (ERA, IAPS, GovAI, LASR, Pivotal), he argues that the fellowship model encourages a shallow understanding of AGI risks. The typical 8-12 week structure creates intense time pressure, forcing fellows to 'sprint sprint sprint' on mentor-driven projects rather than interrogate foundational assumptions. Banerjee claims many researchers cannot coherently walk through AI x-risk threat models at a 'gears level' or simulate the worldviews of top alignment researchers. He identifies structural issues, including mentorship incentives that reward being a 'good mentee' over questioning frameworks, and recommends that practitioners build first-principles understanding through minimal-trust investigations and learning by writing.

Key Points
  • Based on 75+ conversations across 5 major AI safety fellowships (ERA, IAPS, GovAI, LASR, Pivotal)
  • Claims many researchers can't reconstruct AI x-risk arguments without appealing to authority
  • Identifies structural issues: 8-12 week time pressure and mentor-driven project incentives

Why It Matters

Questions whether current AI safety training produces researchers who truly understand the existential risks they're trying to mitigate.