Managed vs Unmanaged Agency
A new framework replaces the 'instrumental vs terminal' goal distinction with a spectrum of cognitive control.
Deep Dive
In a LessWrong post, author 'plex' proposes a 'Managed vs Unmanaged Agency' theory. It reframes the classic 'instrumental vs terminal' goal distinction in AI alignment as a spectrum of cognitive control: how tightly a parent process supervises the subprocesses it spawns. On this view, agents manage subprocesses via sandboxes, kill-switches, and trust-based selection to prevent goal conflicts. This offers a more nuanced model of how complex AI systems can maintain coherent behavior without subagent rebellion.
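The three control mechanisms named above (trust-based selection, sandboxing, kill-switches) can be sketched as a toy delegation loop. Everything here is illustrative: the `Manager` and `Subagent` classes, the numeric trust scores, and the step budget are assumptions for the sketch, not plex's own formalism.

```python
class Subagent:
    """A subprocess with its own policy; its goals may drift from the manager's."""

    def __init__(self, name, trust):
        self.name = name
        self.trust = trust  # manager's estimate of goal alignment, in [0, 1] (assumed)

    def run(self, task):
        # Yield work one step at a time so the manager can interrupt at any point.
        step = 0
        while True:
            step += 1
            yield f"{self.name}: step {step} of {task}"


class Manager:
    """Toy 'managed agency': select subagents by trust, sandbox their interface,
    and kill any that exceed a step budget."""

    def __init__(self, subagents, trust_threshold=0.5, step_budget=100):
        self.subagents = subagents
        self.trust_threshold = trust_threshold
        self.step_budget = step_budget  # kill-switch: max steps before forced halt

    def delegate(self, task):
        # Trust-based selection: only sufficiently trusted subagents are eligible.
        eligible = [s for s in self.subagents if s.trust >= self.trust_threshold]
        if not eligible:
            return []  # refuse rather than delegate to an untrusted subprocess
        chosen = max(eligible, key=lambda s: s.trust)

        # Sandbox: the subagent sees only the task string, never the manager's state.
        log = []
        for i, step in enumerate(chosen.run(task)):
            if i >= self.step_budget:
                break  # kill-switch: cut off a runaway subprocess
            log.append(step)
        return log
```

A high-trust subagent gets the task but runs for at most `step_budget` steps; with no subagent above the threshold, the manager declines to delegate at all, which is the conflict-prevention behavior the theory emphasizes.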
Why It Matters
It provides a clearer model for building controllable, complex AI agents that will not subvert their creators' intent.