AI Safety

"We've been fine before, so we'll be fine again" is a fallacy (in the more dangerous direction).

New simulation reveals humanity's 'alive-tinted glasses' dangerously skew our perception of catastrophic threats.

Deep Dive

A new analysis published on LessWrong by Chapin Lenthall-Cleary uses computational modeling to challenge a dangerous cognitive bias in existential risk assessment. The article targets the common fallacy that humanity's survival through past threats such as nuclear war and famine logically guarantees our future safety. The author argues this reasoning is flawed because we can only observe histories in which disasters didn't wipe us out; we are wearing 'alive-tinted glasses.' This creates an 'anthropic shadow': the observed rate of past disasters understates the true underlying risk, and the distortion is worst for extinction-level events.
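To see why conditioning on survival biases the estimate, consider a simplified one-period version (the parameters p and q are illustrative, not taken from the article): suppose a disaster strikes with probability p and, when it strikes, causes extinction with probability q. Observers can only exist in histories where extinction did not occur, so the rate at which they see a disaster in their past is

$$P(\text{disaster observed} \mid \text{survival}) = \frac{p(1-q)}{1 - pq} < p$$

The more lethal the disaster class (larger q), the further the observed rate falls below the true rate p.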

To quantify this effect, the author built 'toy models' simulating 100,000 worlds, each starting with 10 people and subject to random disasters of varying severity. By sampling from the perspective of an average observer within a surviving world, the simulation demonstrates a 'suppression effect': observers consistently and significantly underestimate the true probability of future catastrophic events. The effect is most pronounced for existential risks—those that could end a world entirely—because by definition, no observers exist in worlds where such disasters occurred to report on them.
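As a concrete illustration, here is a minimal sketch of an anthropic-shadow simulation of this kind. It simplifies the article's setup: rather than tracking a starting population of 10 and disasters of varying severity, each disaster simply has some chance of being extinction-level, and all parameter values (N_WORLDS, N_STEPS, P_DISASTER, P_EXTINCTION) are hypothetical choices, not numbers from the original post.

```python
import random

# Toy anthropic-shadow simulation (illustrative parameters, not the article's).
N_WORLDS = 100_000
N_STEPS = 50
P_DISASTER = 0.10      # true per-step probability that a disaster occurs
P_EXTINCTION = 0.50    # probability that a disaster ends the world outright

surviving_disaster_counts = []

for _ in range(N_WORLDS):
    disasters = 0
    alive = True
    for _ in range(N_STEPS):
        if random.random() < P_DISASTER:
            if random.random() < P_EXTINCTION:
                alive = False   # extinction: no observers remain to report
                break
            disasters += 1      # survivable disaster, visible to later observers
    if alive:
        surviving_disaster_counts.append(disasters)

# Compare the true per-step disaster rate with the rate an average
# observer in a surviving world would infer from their own history.
observed_rate = sum(surviving_disaster_counts) / (len(surviving_disaster_counts) * N_STEPS)
print(f"true disaster rate:   {P_DISASTER:.3f}")
print(f"rate seen by survivors: {observed_rate:.3f}")  # systematically lower
```

With these illustrative parameters, surviving observers infer a per-step disaster rate of roughly 0.05 against a true rate of 0.10, reproducing the suppression effect described above; extinction-level risk is hidden from them entirely, since no surviving history contains one.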

The implications are stark for fields like AI safety and catastrophic risk management. The modeling suggests that our intuitive, history-based confidence is mathematically unsound. Professionals assessing risks from AGI (artificial general intelligence), engineered pathogens, and other frontier technologies cannot rely on the 'we've always been fine before' heuristic. The article connects this to established concepts like the Doomsday Argument, urging a more rigorous, probability-based approach to safeguarding humanity's future against low-probability, high-consequence events.

Key Points
  • The 'anthropic shadow' effect means survivors inherently underestimate true disaster risks, as they only observe non-catastrophic histories.
  • Simulations of 100,000 worlds show observers systematically underestimate disaster probabilities, with the error worsening for more severe events.
  • The analysis directly challenges complacency in existential risk fields like AI safety, where past survival is a poor predictor.

Why It Matters

For AI safety researchers and policymakers, this model warns that intuitive risk assessments based on historical precedent are fundamentally flawed and dangerous.