Which types of AI alignment research are most likely to be good for all sentient beings?
A new analysis categorizes 12 AI safety techniques by their potential benefit to animals and digital minds.
In a post on LessWrong, independent researcher Michael Dickens challenges the anthropocentric focus of mainstream AI alignment, arguing that a future artificial superintelligence (ASI) should be aligned with the interests of all sentient beings, including non-human animals and potential digital minds. To guide this effort, Dickens presents a framework evaluating 12 contemporary AI safety research categories—initially condensed by Claude Opus 4.6 from the 2025 Shallow Review of Technical AI Safety—by their likelihood of promoting non-human welfare. The core thesis places alignment techniques on a spectrum: methods that merely get AI to fulfill users' immediate expressed desires (like RLHF) are deemed harmful or neutral for non-humans, while approaches that aim to embed a generalized respect for the preferences of all beings are considered more beneficial.
Dickens acknowledges significant hurdles, noting that current research agendas may have less than a 5% chance of solving alignment at all, which makes the practical impact of such a shift uncertain. The analysis provides specific judgments: for instance, iterative alignment (RLHF) is labeled bad for non-humans, since user feedback would likely train away AI concern for animal welfare. Conversely, control and safeguards research, which aims to delay AI takeover, is seen as potentially good in the long term because it buys time for more robust, ethically inclusive solutions. The post concludes that while directing research with non-humans in mind 'probably doesn't matter' given low success probabilities and a lack of grantmaker interest, the exercise has positive expected value and could inspire better-framed future work.
- The framework analyzes 12 AI safety research categories from the 2025 Shallow Review, initially condensed using Claude Opus 4.6.
- It argues techniques like RLHF are bad for non-humans, while research into robust, preference-respecting alignment is better.
- Dickens concedes current research has a <5% chance of solving alignment at all, making the practical impact of this reframing uncertain.
Why It Matters
The analysis forces a critical ethical expansion in AI safety, pushing the field to consider welfare beyond humanity in its foundational research.