AI Safety

LessWrong's Lorxus proposes mandatory AI safety readings from Lem to Bernal

A curated list of prescient sci-fi and philosophy for new AI safety researchers.

Deep Dive

In a LessWrong post titled 'If I Were Emperor of New AI Safety Researcher Training...', Lorxus presents a curated curriculum of underutilized works for AI safety researchers. The author favors readings that are brief, prescient, and independent of mainstream AI safety discourse. Recommendations include Stanislaw Lem's 1973 novellas 'A History of Bitic Literature' and 'Golem XIV', which anticipated large language models and the dynamics of AGI. Robnost's 'The Void' (2025) is cited for its clear explanation of the token-generation paradigm and its critique of current evals, such as Anthropic's.

The list also provides moral-philosophical grounding through J.D. Bernal's 'The World, the Flesh, and the Devil' (1929), which explores the constraints on human wellbeing and the dynamics of power. Lorxus warns that many researchers are naive about sociotechnical power games. The post is aimed at researchers at every level, from newcomers to established figures, and seeks to fill gaps left by typical training, which centers on works such as 'A List of Lethalities' and excerpts from Bostrom. Each text was chosen for a specific insight into LLM psychology, shard theory, or the broader stakes of AI development.

Key Points
  • Recommends Stanislaw Lem's 1973 novellas, which presciently anticipated LLM-style interpolation and the shape AGI might take.
  • Robnost's 'The Void' explains the token-generation paradigm and foundation models, and argues that some evals are counterproductive.
  • J.D. Bernal's 1929 text provides essential moral-philosophical grounding on power and societal constraints.

Why It Matters

A foundational reading list that could shape the thinking of new AI safety researchers.