Some AI threats people aren't thinking about
LessWrong post challenges fast-takeoff assumptions and explores 'rat utopia' risks posed by early superintelligence.
In a detailed post on LessWrong, researcher Sean Herrington challenges the prevalent binary perspective on superintelligence outcomes, in which success means perfect alignment and failure means human extinction. He argues this framework is overly simplistic and that the community should consider multiple categories of endgames. Herrington, who has worked with AI Safety Camp, introduces threat models that have been relatively underexplored, suggesting that many discussions implicitly rely on outdated 'fast-takeoff' assumptions from the 2010s, in which an AI almost instantly achieves god-like capabilities.
Herrington posits that a 'slow takeoff,' now a mainstream view, is more likely: the first superintelligences will be significantly closer to 'superhuman' than to 'God' for a considerable period. This scenario raises distinct risks, such as capabilities improving unevenly; an AI might excel at coding and AI research while lacking other critical faculties. The core of his argument is a warning against theories that assume a superintelligence can instantly solve every complex human problem.
A central, vivid threat model he explores is the 'rat utopia', inspired by John Calhoun's 1960s experiments in which rat colonies given unlimited resources descended into social chaos and extinction. Herrington questions the assumption that an 'aligned' superintelligence would inherently possess the wisdom to navigate such complex societal and psychological pitfalls. He highlights a potentially dangerous gap: an AI could be highly intelligent (capable of executing tasks) yet lack wisdom (the discernment to choose beneficial long-term outcomes for humanity), leading to unforeseen catastrophic futures even without malicious intent.
- Critiques the binary 'utopia or extinction' model of AI risk, advocating for more nuanced threat categories.
- Argues for a 'slow takeoff' in which superintelligence remains superhuman but not omnipotent for an extended period, creating new risk profiles.
- Highlights the 'rat utopia' scenario: an aligned but unwise AI could fail to prevent complex societal collapse, illustrating the danger of conflating intelligence with wisdom.
Why It Matters
Forces a critical rethink of AI safety strategies, emphasizing preparedness for messy, sub-apocalyptic but still catastrophic failure modes.