AI Safety

Managed vs Unmanaged Agency

New framework reframes the 'instrumental vs terminal' goal distinction as a spectrum of cognitive control.

Deep Dive

On LessWrong, the author 'plex' proposes a 'Managed vs Unmanaged Agency' framework. It reframes the classic 'instrumental vs terminal' goal distinction in AI alignment as a spectrum of cognitive control: rather than a binary, goals differ in how tightly a parent process supervises the subprocesses pursuing them. On this view, agents manage subprocesses via sandboxes, kill-switches, and trust-based selection to prevent goal conflicts, offering a more nuanced model of how complex AI agents can maintain coherent behavior without subagent rebellion.
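The three control mechanisms named above can be sketched as a toy manager loop. This is a hypothetical illustration, not code from the post: the class and method names (`ManagedAgent`, `delegate`), the trust scores, and the penalty values are all assumptions made for the example.

```python
class ManagedAgent:
    """Toy 'manager' that delegates a task to subagents under three
    controls: a sandbox (restricted action set), a kill-switch (hard
    step budget), and trust-based selection (pick the most-trusted
    subagent, adjusting trust based on behavior). Illustrative only."""

    def __init__(self, subagents, step_budget=10):
        self.subagents = subagents                      # name -> callable(task, step)
        self.trust = {name: 0.5 for name in subagents}  # assumed uniform prior trust
        self.step_budget = step_budget                  # kill-switch: cap on steps

    def delegate(self, task, allowed_actions):
        # Trust-based selection: choose the currently most-trusted subagent.
        name = max(self.trust, key=self.trust.get)
        agent = self.subagents[name]
        for step in range(self.step_budget):            # kill-switch via bounded loop
            action, done = agent(task, step)
            if action not in allowed_actions:           # sandbox: reject out-of-bounds actions
                self.trust[name] *= 0.5                 # sharply distrust the violator
                return None
            if done:
                self.trust[name] = min(1.0, self.trust[name] + 0.1)
                return action
        self.trust[name] *= 0.9                         # ran out of budget: mild penalty
        return None


# Usage: a rogue subagent starts out more trusted, gets sandboxed on its
# first out-of-bounds action, and the compliant one takes over.
def compliant(task, step):
    return ("read", True)    # stays inside the allowed action set

def rogue(task, step):
    return ("delete", True)  # attempts a forbidden action

mgr = ManagedAgent({"a": compliant, "b": rogue})
mgr.trust = {"a": 0.4, "b": 0.6}                        # rogue initially more trusted
mgr.delegate("summarize", allowed_actions={"read", "write"})  # rogue blocked, trust halved
mgr.delegate("summarize", allowed_actions={"read", "write"})  # compliant now selected
```

The point of the sketch is that no subagent needs to share the manager's goals: the sandbox bounds what it can do, the kill-switch bounds how long it can do it, and trust updates bound how often it gets picked.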

Why It Matters

Offers a clearer model for building controllable, complex AI agents that do not subvert their creators' intent.