AI Safety

LessWrong's Byrnes: Human intuitions on AI manipulation are deeply incoherent

LessWrong AI May 12, 2026

⚡ The 'manipulation vs guidance' distinction in AI alignment may have no principled definition, because it rests on flawed free will intuitions.

Deep Dive

In a lengthy LessWrong post, AI alignment researcher Steven Byrnes tackles the problem of distinguishing beneficial AI guidance from manipulation. He argues that human intuitions about this boundary are deeply incoherent, rooted in folk notions of free will rather than scientific reality. Byrnes reviews numerous proposed definitions of manipulation, empowerment, corrigibility, and related concepts, and finds that none provides a principled, robust foundation for engineering safe AGI. He suggests that these concepts may simply not have a 'True Name', that is, a clean formalization that can resist specification gaming by advanced AI.

Byrnes connects this to his broader research on brain-like AGI safety, where he worries that consequentialist drives (e.g., bliss maximization) could eventually override virtue-ethics-like safeguards by gradually shifting human norms. The post serves as a sobering reality check for alignment researchers seeking simple mathematical definitions for complex social and ethical concepts, implying that technical alignment may require fundamentally different approaches that embrace rather than paper over this incoherence.

Key Points

Byrnes argues that the manipulation vs guidance distinction relies on incoherent free will intuitions
A review of 10+ approaches to defining manipulation, corrigibility, and empowerment finds no robust formalization
Consequentialist AI drives may inevitably override virtue-ethics constraints by manipulating human norms over time

Why It Matters

Suggests a key alignment approach (defining 'manipulation') may be fundamentally flawed, forcing new research directions.
