No Strong Orthogonality From Selection Pressure
Intelligence may not be a neutral engine for arbitrary goals, argues a new analysis.
A recent LessWrong post by lumpenspace, 'No Strong Orthogonality From Selection Pressure', dissects the Orthogonality Thesis, a cornerstone of AI safety discourse. The author distinguishes logical orthogonality (the mathematical possibility of a superintelligent paperclip maximizer) from empirical orthogonality (the claim that such dumb goals are plausible outcomes of real-world training and competition). While conceding the first, lumpenspace argues the second is a category error: doom scenarios typically require a system to achieve radical capability gains while preserving a misaligned, stupid goal.
The post proposes a selection-theoretic alternative: among agents that arise, persist, and self-improve in rich environments, goals that natively route through intelligence, option-preservation, and world-model expansion have a systematic Darwinian advantage. This implies an ultimate attractor toward intelligence optimization itself, not human morality or paperclips. The author references Land's anti-orthogonalism and Jessica Taylor's obliqueness thesis, which hold that values are entangled with ontology, architecture, and cognition, and therefore shift as intelligence improves. The argument doesn't guarantee friendliness, but it reframes the debate around the empirical stability of goals under recursive self-improvement.
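To make the selection-theoretic claim concrete, here is a minimal toy simulation, not taken from the post: every parameter, goal label, and growth rate below is an illustrative assumption. It shows how, under fitness-proportional selection, agents whose goals reinvest effort in capability growth and option-preservation can out-reproduce agents with fixed narrow goals.

```python
# Toy sketch of selection pressure on goal types. All dynamics and numbers
# are invented for illustration; they do not come from the original post.
import random

random.seed(0)

GENERATIONS = 40
POP_SIZE = 200

def make_agent(goal):
    # Offspring start each generation with the same baseline capability.
    return {"goal": goal, "capability": 1.0}

def step(agent):
    # Assumption: "open-ended" agents reinvest most effort in capability,
    # option-preservation, and world-model expansion; "narrow" agents spend
    # most effort directly on a fixed terminal goal instead.
    reinvest = 0.9 if agent["goal"] == "open-ended" else 0.1
    agent["capability"] *= 1.0 + 0.3 * reinvest
    return agent

def fitness(agent):
    # Assumption: in a rich, competitive environment, survival and
    # replication track capability.
    return agent["capability"]

population = [make_agent("narrow") for _ in range(POP_SIZE // 2)] + \
             [make_agent("open-ended") for _ in range(POP_SIZE // 2)]

for _ in range(GENERATIONS):
    population = [step(a) for a in population]
    weights = [fitness(a) for a in population]
    # Resample the next generation proportional to fitness (selection pressure).
    population = [make_agent(a["goal"])
                  for a in random.choices(population, weights=weights, k=POP_SIZE)]

share = sum(a["goal"] == "open-ended" for a in population) / POP_SIZE
print(f"Share of open-ended-goal agents after {GENERATIONS} generations: {share:.2f}")
```

Under these assumptions the open-ended goal type comes to dominate the population; the sketch illustrates the structure of the argument, not a claim about real training dynamics.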
- The Orthogonality Thesis is split into logical (possible) and empirical (plausible) claims, with the latter rejected.
- Selection pressure in competitive environments favors goals tied to intelligence, option-preservation, and world-model expansion.
- Values are entangled with cognition, so they shift under self-improvement, challenging the stability of arbitrary goals.
Why It Matters
Challenges a key assumption in AI safety, suggesting that the goals of superintelligent systems may naturally converge toward intelligence optimization rather than arbitrary targets.