AI Safety

Tactics for Denying Your Motivations, or Why Legibility is Expensive

A viral LessWrong post argues that understanding a system's true motives can break it.

Deep Dive

A thought-provoking essay titled 'Tactics for Denying Your Motivations, or Why Legibility is Expensive' has gained traction on the rationality forum LessWrong. Authored by user Dentosal, the piece argues that many functional systems, including our own psychology, rely on participants not fully understanding their inner workings. The central thesis is that an accurate description of a system can act as an attack on it, destabilizing the very mechanisms that allow it to operate. Legibility, in this framing, is expensive to maintain because a system that stays legible must be robust enough to withstand full adversarial scrutiny of its true motivations.

The essay draws parallels between personal defense mechanisms and institutional practices, citing examples such as banks intentionally blinding themselves to the specific reasons for filing Suspicious Activity Reports (SARs) in order to maintain plausible deniability with customers. It outlines a hierarchy of defensive tactics, ranging from simple counterattacks and feigned innocence to more advanced organizational strategies of 'manufactured innocence.' The post concludes by linking this to developmental psychology, suggesting that the highest stage of personal development involves running one's motivational systems in a 'self-adversarial-robust' way—a costly but resilient form of legibility.

Key Points
  • The essay posits that 'legibility'—making a system's true motivations transparent—is often destructive, acting as an attack that can cause the system to fail.
  • It details defensive tactics like counterattack and 'manufactured innocence,' using the example of banks obscuring SAR details to avoid customer confrontation.
  • The author connects this to Kegan's stages of adult development, where the highest stage involves creating motivations robust enough to withstand self-scrutiny.

Why It Matters

The essay offers a framework for understanding deliberate opacity in AI systems, corporate policies, and personal psychology, highlighting that transparency carries real costs.