Your Causal Variables Are Irreducibly Subjective
A new paper challenges a core assumption of mechanistic interpretability, arguing that the definition of causal variables is inherently subjective.
David Reber's paper 'Your Causal Variables Are Irreducibly Subjective' delivers a foundational critique of current mechanistic interpretability research on large language models (LLMs). The core argument is that the entire causal inference toolkit used to explain how models like GPT-4 or Llama 3 work rests on a subjective, pre-formal step: defining the causal variables themselves. Whether a researcher labels a circuit as causing 'truthfulness' or as implementing a specific 'reasoning step,' that definition is a choice that shapes the entire hypothesis space. Reber argues this irreducible subjectivity is often masked by formal statistical methods, producing what he calls 'vibes dressed up as formalism.' The paper contends that attempts to fully formalize away this subjectivity are illusory, and that progress requires embracing it instead.
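To see the problem concretely, consider what 'intervening on truthfulness' even means at the code level. The sketch below is not from the paper; the toy activations and both operationalizations are hypothetical. It defines the same informal variable in two ways, as a probe direction and as a hand-labeled neuron set, and applies the same do-style intervention to each:

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL = 16                       # toy hidden size
resid = rng.normal(size=D_MODEL)   # a toy residual-stream activation

# Operationalization A (hypothetical): "truthfulness" is the projection
# of the activation onto a direction found by a linear probe.
probe_direction = rng.normal(size=D_MODEL)
probe_direction /= np.linalg.norm(probe_direction)

def intervene_direction(h: np.ndarray, value: float) -> np.ndarray:
    """do(truthfulness = value) under definition A: set the component
    of h along the probe direction, leaving the rest untouched."""
    current = h @ probe_direction
    return h + (value - current) * probe_direction

# Operationalization B (hypothetical): "truthfulness" is the activation
# of a hand-labeled set of neurons.
truth_neurons = [2, 7, 11]

def intervene_neurons(h: np.ndarray, value: float) -> np.ndarray:
    """do(truthfulness = value) under definition B: overwrite the
    labeled neurons with the target value."""
    patched = h.copy()
    patched[truth_neurons] = value
    return patched

# The formal operation is identical ("set the variable to 1.0"), but the
# two interventions touch different coordinates and therefore support
# different causal claims about the same activations.
a = intervene_direction(resid, 1.0)
b = intervene_neurons(resid, 1.0)
print("definition A changes", np.sum(~np.isclose(a, resid)), "coordinates")
print("definition B changes", np.sum(~np.isclose(b, resid)), "coordinates")
```

Nothing in the intervention formalism adjudicates between the two definitions; that choice is exactly the pre-formal step Reber identifies.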
Reber draws parallels to epidemiology and John Snow's 1854 cholera investigation, advocating for a 'shoe leather era' in AI interpretability: prioritizing painstaking, reproducible fieldwork, such as meticulously labeling model components, over purely statistical sophistication. The implications are serious for researchers trying to build causal models of attention heads or neuron activations: without clear, agreed-upon definitions for variables (what exactly is an intervention on 'truthfulness'?), causal claims about LLMs remain fundamentally limited and subjective. The paper concludes that the best path forward is to accept that variable definitions are subjective but make them explicitly reproducible, focusing on falsifying specific, carefully defined hypotheses rather than seeking universal formal truths.
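One way to read 'subjective but explicitly reproducible' in practice is to record every labeling decision as structured, shareable data, so another lab can rerun or dispute exactly the same hypothesis. The sketch below is a minimal illustration under that reading; the record format, field names, and component labels are all hypothetical, not a standard proposed by the paper:

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class VariableDefinition:
    """An explicit, shareable record of one subjective labeling choice.
    All fields are hypothetical illustrations, not a proposed standard."""
    name: str              # the informal concept being operationalized
    model: str             # which model the claim is scoped to
    components: list[str]  # the exact components labeled
    procedure: str         # how the label was assigned
    intervention: str      # what do(variable = v) concretely does
    assumptions: list[str] = field(default_factory=list)

truthfulness = VariableDefinition(
    name="truthfulness",
    model="toy-transformer-v0",  # hypothetical model identifier
    components=["layer5.mlp.neuron[2]", "layer5.mlp.neuron[7]"],
    procedure="manual inspection of max-activating examples, 2 annotators",
    intervention="overwrite the listed neurons with a fixed scalar",
    assumptions=[
        "the concept is localized to the MLP at layer 5",
        "max-activating examples are representative of the concept",
    ],
)

# Serializing the definition makes the subjective choices explicit and
# falsifiable: a different record is a different hypothesis, not a
# refinement of this one.
print(json.dumps(asdict(truthfulness), indent=2))
```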
- Causal inference in LLM interpretability requires well-defined variables (e.g., 'truthfulness'); defining them is a subjective, pre-formal step that the formalism itself cannot provide or validate.
- Every choice of variables defines a different hypothesis space; the resulting space of possible causal models is 'almost incomprehensibly vast,' which limits what any single analysis can claim.
- The paper calls for a 'shoe leather era' focused on reproducible labeling processes and explicit assumptions, arguing this embraces subjectivity rather than masking it with statistical formalism.
Why It Matters
Challenges the scientific foundation of how we explain AI models, pushing interpretability research to prioritize rigorous, reproducible fieldwork over purely statistical sophistication.