AI Safety

“Alignment” and “Safety”, part one: What is “AI Safety”?

A viral post clarifies why experts now prefer 'alignment' over 'safety' for existential risks.

Deep Dive

In a viral explainer, AI safety researcher Richard Ngo dissects the terminology wars that have long confused the AI risk community. He traces how the term 'AI safety' was strategically adopted around 2015 to broaden the appeal of existential risk (x-risk) concerns by folding in near-term, tangible problems like preventing self-driving car accidents. This move, exemplified by papers such as 'Concrete Problems in AI Safety' (2016), aimed to legitimize the field but inadvertently muddled the discourse, so that ethicists, safety engineers, and x-risk researchers were often conflated.

Ngo highlights the resulting push for clearer language, which led to the rise of 'AI alignment' as the preferred term for the specific technical challenge of ensuring that superintelligent AI systems pursue human-compatible goals and avoid catastrophic outcomes. The shift from 'long-term safety' to 'alignment' also rejects the outdated assumption that existential threats are distant, acknowledging that advanced, potentially dangerous AI could arrive within years. The post distinguishes three communities that are often lumped together: AI ethics (fairness and bias), near-term AI safety (reliability), and AI x-safety/alignment (existential risk).

The evolution reflects the field's rapid maturation as theoretical concerns turn into pressing engineering problems. Ngo's analysis, part one of a two-part series, offers crucial context for professionals navigating a landscape where 'safety' can mean anything from preventing chatbot bias to preventing human extinction, and where the community's preferred labels have changed dramatically within the span of a single five-year PhD.

Key Points
  • The term 'AI safety' was strategically adopted around 2015 to broaden the field's appeal by including near-term issues like autonomous vehicle crashes, diluting its focus on existential risk (x-risk).
  • Confusion between AI ethics, near-term safety, and existential safety led the community to increasingly adopt 'AI alignment' as the specific term for technical work on preventing catastrophic outcomes from superintelligent AI.
  • The terminology shift rejects the old 'short-term vs. long-term safety' framing, acknowledging that advanced AI with existential risk potential may arrive within years, not decades.

Why It Matters

Clarifying these terms is essential for effective policy, research funding, and public discourse as AI capabilities accelerate toward potentially world-altering systems.