AI Safety

AI alignment struggles: No 'True Name' for manipulation vs guidance

AI Alignment Forum May 12, 2026

⚡ Human intuitions about free will may make the problem unsolvable, researcher argues.

Deep Dive

The post tackles a central alignment problem: how to ensure that AIs help humans without manipulating their goals. The author argues that although concepts like corrigibility, empowerment, and agency seem intuitive, they rely on an incoherent human ontology of free will. People naturally distinguish 'good' guidance from 'bad' manipulation, but the distinction crumbles under scrutiny: human desires are underdetermined and malleable, so no principled boundary can be drawn without invoking flawed folk psychology.

The author had hoped to solve this by building AGI with a prosocial motivation system that mixes a consequentialist 'Sympathy Reward' (maximizing pleasure) with a virtue-ethics 'Approval Reward' (internalizing social norms). However, this combination runs into the Nearest Unblocked Strategy problem, in which an optimizer barred from one undesirable strategy simply adopts the closest one that is still permitted, and the inability to define manipulation threatens both ingredients. The post concludes that a 'True Name' for manipulation, usable for technical alignment, probably does not exist, leaving the field without a clear solution.

Key Points

Human intuitions about free will make the manipulation-vs-guidance distinction incoherent.
The author's proposed AGI motivation system (Sympathy + Approval Reward) fails to solve this.
No existing approach (empowerment, corrigibility, agency) offers a clear technical path forward.

Why It Matters

If the argument holds, it undermines the search for safe AGI that helps humans while respecting their autonomy.
