How Transformers Reject Wrong Answers: Rotational Dynamics of Factual Constraint Processing
New study shows transformers don't just fail: they actively rotate away from correct answers when given false premises.
A new research paper by Javier Marín, titled 'How Transformers Reject Wrong Answers: Rotational Dynamics of Factual Constraint Processing,' provides a groundbreaking look inside transformer-based language models. Using a novel method called forced-completion probing, the study tracked geometric measurements across every layer of four decoder-only models ranging from 1.5B to 13B parameters. The core discovery is that when a model is forced to continue a factual query with a known-incorrect single-token answer, the internal representations of the correct and incorrect answers diverge not through changes in magnitude but through rotation. The displacement vectors maintain near-identical lengths while their angular separation increases, meaning factual selection is encoded in direction on an approximate hypersphere.
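To make the measurement concrete, here is a minimal sketch, assuming a Hugging Face decoder-only model, of the kind of per-layer geometry the study tracks: the displacement of the final-token hidden state under a correct versus an incorrect continuation, compared by length and by angle. The model name, prompt, and answer tokens below are illustrative placeholders, not the paper's actual setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in model; the study used four decoder-only models from 1.5B to 13B parameters.
model_name = "gpt2"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def last_token_states(text: str) -> list[torch.Tensor]:
    """Hidden state of the final token at every layer (embedding layer + each block)."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    return [h[0, -1] for h in out.hidden_states]

prompt = "The capital of France is"            # illustrative factual query
base = last_token_states(prompt)
right = last_token_states(prompt + " Paris")   # correct single-token continuation
wrong = last_token_states(prompt + " Berlin")  # known-incorrect continuation

for layer, (b, r, w) in enumerate(zip(base, right, wrong)):
    d_r, d_w = r - b, w - b  # displacement vectors induced by each answer
    cos = torch.nn.functional.cosine_similarity(d_r, d_w, dim=0).clamp(-1, 1)
    angle = torch.rad2deg(torch.arccos(cos)).item()
    # The paper's claim: |d_r| and |d_w| stay nearly equal while the angle grows with depth.
    print(f"layer {layer:2d}  |d_right|={d_r.norm().item():.2f}  "
          f"|d_wrong|={d_w.norm().item():.2f}  angle={angle:.1f} deg")
```

If the rotational account holds, the norms in this printout stay roughly matched across layers while the angle widens, which is exactly the signature a magnitude-only probe would miss.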
Crucially, the research reveals that models don't passively fail when given wrong information; they actively work against it. The network drives its internal probability for the correct token downward, suppressing the right answer rather than simply failing to find it. Furthermore, this behavior is absent in smaller models and emerges as a distinct 'phase transition' at around 1.6B parameters, suggesting a minimum scale is required for this kind of active factual constraint processing. The findings challenge earlier static interpretations of truthfulness in AI and show that these dynamics are invisible to single-layer probing techniques, pointing toward layer-wise, rotation-aware geometric analyses for understanding model intelligence.
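The suppression claim can likewise be checked layer by layer. Below is a rough illustration, reusing the model and tokenizer loaded in the sketch above, of a logit-lens style readout: each intermediate hidden state is passed through the final layer norm and the unembedding head to get an interim probability for the correct token. This is a standard interpretability trick, not the authors' published method, and the ln_f / lm_head attribute names assume a GPT-2-style architecture.

```python
import torch

@torch.no_grad()
def correct_token_prob_per_layer(model, tok, prompt: str, answer: str) -> list[float]:
    """Logit-lens estimate of P(answer token) at every layer, read at the last prompt position."""
    target_id = tok(answer, add_special_tokens=False)["input_ids"][0]
    out = model(**tok(prompt, return_tensors="pt"), output_hidden_states=True)
    probs = []
    for h in out.hidden_states:
        state = model.transformer.ln_f(h[0, -1])  # GPT-2-style final norm; attribute name varies by architecture
        logits = model.lm_head(state)             # unembedding to vocabulary logits
        probs.append(torch.softmax(logits, dim=-1)[target_id].item())
    return probs

# Hypothetical comparison: a neutral query vs. one preceded by a false premise.
neutral = correct_token_prob_per_layer(model, tok, "The capital of France is", " Paris")
misled = correct_token_prob_per_layer(
    model, tok, "Berlin is the capital of France. The capital of France is", " Paris")
```

If the active-rejection picture is right, the misled run should show the correct token's interim probability being pushed down in later layers rather than simply plateauing, and it is that per-layer trajectory, not any single-layer snapshot, that carries the effect.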
- Models separate right/wrong answers via rotation on a hypersphere, not by scaling vector magnitudes.
- Transformers actively suppress the correct answer when fed false premises, showing an 'active' rejection mechanism.
- This capability emerges as a phase transition at ~1.6B parameters, absent in smaller models.
Why It Matters
Provides a new geometric framework for interpreting AI reasoning, crucial for improving model truthfulness and debugging hallucinations.