Question: Why is the goal of AI safety not 'moral machines'?
A viral LessWrong post asks why AI safety focuses on preventing extinction risks rather than on building ethical AI systems.
A viral discussion on LessWrong, sparked by user Mordechai Rorvig, asks why the AI safety field doesn't frame its primary goal as creating 'moral machines': AI systems with strong, reliable ethical reasoning. Instead, the field typically focuses on preventing extinction risks and protecting human wellbeing through technical alignment. Rorvig argues that the moral-machine framing could resolve confusion in the field and unify concepts such as emergent misalignment, noting that we apply a similar framework when educating humans. The post references related discussions by researchers Richard Ngo and johnswentworth about whether AI should be aligned to virtues.
The top response, from user williawa, explains that AI safety researchers generally avoid the 'moral machines' framing because of two related beliefs: moral anti-realism (the view that human values aren't objectively special) and a weak form of the orthogonality thesis (roughly, that a capable system can be built to pursue any goal we specify). On this view, solving alignment means making an AI reliably follow whatever goal it is given, not instilling human morality in particular. The responder illustrates this with a rocket analogy: the hard problem is being able to steer a rocket to any chosen star at all; once that is solved, aiming it at the sun specifically is comparatively easy, so the general problem is the more fundamental one. The exchange reveals a deep philosophical divide within the AI safety community over whether human morality should be central to alignment research or treated as just one possible goal specification.
- The debate centers on whether AI safety should aim for 'moral machines' with robust ethical reasoning or focus on preventing specific catastrophic risks
- The top response cites moral anti-realism and the orthogonality thesis as the reasons the field avoids the morality framing
- The exchange reveals a fundamental philosophical divide between technical and ethical framings of the alignment problem
Why It Matters
This debate shapes how billions of dollars are spent on AI safety research and which technical approaches get prioritized.