Diary of a "Doomer": 12+ years arguing about AI risk (part 3: the LLM era)
After 12 years, AI alignment is now a core technical problem, not just a doomer fantasy
In part 3 of his series, David Scott Krueger reflects on how AI alignment went from a fringe intellectual curiosity to a mainstream technical problem. He notes that despite the field's initial lack of interest, the practical usefulness of alignment methods for large language models (LLMs) such as GPT-3 drove a surge in the field's popularity starting in 2020. By 2022, AI researchers who had long dismissed extinction risk were expressing serious concern, a sea change in the field's attitude. Krueger warns, however, that current solutions (chiefly reward modeling) offer only superficial safety, leaving a "ditch of danger" in which AI is highly capable but not truly trustworthy. Fundamental conceptual problems remain unsolved, and the line between safety research and capabilities research has blurred, raising the stakes for ensuring AI systems are reliable over the long term.
- AI alignment gained mainstream traction only after LLMs like GPT-3 posed practical alignment challenges, not because of existential risk concerns.
- By 2022, veteran AI researchers had shifted from dismissing x-risk to actively seeking advice on preventing human extinction.
- Krueger warns that current reward modeling methods fail to solve the assurance problem, leaving AI in a "ditch of danger" where it is useful but unsafe.
Why It Matters
This history shows how AI safety evolved from a niche worry to a pressing industry problem, with current solutions still inadequate.