Research & Papers

Hallucination as Trajectory Commitment: Causal Evidence for Asymmetric Attractor Dynamics in Transformer Generation

Study shows that once an AI starts hallucinating, the error is 2.6x harder to correct than it was to cause.

Deep Dive

A new research paper titled 'Hallucination as Trajectory Commitment' provides a mechanistic, causal explanation for why large language models (LLMs) like Qwen2.5-1.5B hallucinate and why hallucinations are so difficult to stop. The study, led by G. Aytug Akarlar, uses a novel 'same-prompt bifurcation' method to isolate the moment a model's generation goes wrong. By repeatedly sampling completions for the same prompt, the researchers observed that in 44.3% of cases the model spontaneously diverged into either a factual or a hallucinated path at the very first generated token. This indicates that hallucination is not a gradual drift but an early, probabilistic commitment.
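
The bifurcation measurement itself is conceptually simple. The sketch below, assuming the Hugging Face transformers API and the public Qwen/Qwen2.5-1.5B checkpoint, samples the same prompt many times and checks whether the runs split across more than one distinct first token; labeling which branch is factual versus hallucinated (the paper's evaluation step) is deliberately left out, and the decoding settings are illustrative rather than the paper's.

```python
# Minimal sketch of a same-prompt bifurcation probe (assumes the
# `transformers` and `torch` packages and access to the public checkpoint).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen2.5-1.5B"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
model.eval()

def first_token_bifurcation(prompt: str, n_samples: int = 32) -> set[int]:
    """Sample the same prompt repeatedly and return the distinct token ids
    observed at generation step 0; more than one means the trajectories
    bifurcate at the very first token."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(
            **inputs,
            do_sample=True,        # stochastic decoding, so runs can diverge
            max_new_tokens=1,      # only the first generated token matters here
            num_return_sequences=n_samples,
        )
    prompt_len = inputs["input_ids"].shape[1]
    return {int(t) for t in out[:, prompt_len]}  # token id at step 0

tokens = first_token_bifurcation("The capital city of Australia is")
print(f"bifurcates at step 0: {len(tokens) > 1} ({len(tokens)} distinct tokens)")
```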

Crucially, the research reveals a profound asymmetry in these dynamics. Using activation patching, a technique that injects neural activity from one run into another, the team found that injecting a 'hallucinated' activation into a correct trajectory corrupted the output 87.5% of the time. The reverse intervention, injecting a 'correct' activation into a hallucinating trajectory, recovered the right answer only 33.3% of the time. This shows the hallucinated path acts as a stable 'attractor basin': once the model falls in, escaping is roughly 2.6 times harder than falling in was. Furthermore, the model's propensity to hallucinate on a given prompt is predictable from its internal state before it generates a single word, with a Pearson correlation of r=0.776.
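
Activation patching of this kind can be sketched with ordinary PyTorch forward hooks. The snippet below reuses the model and tokenizer loaded above; the layer index, the choice to overwrite the residual stream at the last prompt position (the state that produces token 0), and the cross-prompt donor in the demo are all illustrative assumptions, not the paper's exact protocol (the paper patches between diverged runs of the same prompt).

```python
# Hedged sketch of activation patching via PyTorch forward hooks; reuses
# `model`/`tokenizer` from the previous snippet. LAYER and the patch site
# are illustrative choices, not the paper's documented settings.
import torch

LAYER = 12  # hypothetical mid-depth decoder layer

def capture_step0_activation(prompt: str) -> torch.Tensor:
    """Run the model once and save the chosen layer's output at the last
    prompt position, i.e. the state that determines the first generated token."""
    captured = {}
    def hook(module, args, output):
        hidden = output[0] if isinstance(output, tuple) else output
        captured["act"] = hidden[:, -1, :].detach().clone()
    handle = model.model.layers[LAYER].register_forward_hook(hook)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        model(**inputs)
    handle.remove()
    return captured["act"]

def generate_with_patch(prompt: str, donor_act: torch.Tensor,
                        max_new_tokens: int = 32) -> str:
    """Generate while overwriting the layer's last-position output with a
    donor activation on the first forward pass only, then decode freely."""
    state = {"done": False}
    def hook(module, args, output):
        if state["done"]:
            return output
        hidden = output[0] if isinstance(output, tuple) else output
        hidden[:, -1, :] = donor_act.to(hidden.dtype)  # inject the other run's state
        state["done"] = True  # subsequent decoding steps are left untouched
        return output
    handle = model.model.layers[LAYER].register_forward_hook(hook)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, do_sample=False, max_new_tokens=max_new_tokens)
    handle.remove()
    gen = out[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(gen, skip_special_tokens=True)

# Demo with a cross-prompt donor so the effect is visible; the paper instead
# uses activations from a diverged run of the *same* prompt.
donor = capture_step0_activation("The capital city of France is")
print(generate_with_patch("The capital city of Australia is", donor))
```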

The findings frame hallucination not as random noise but as a structured, regime-like failure. The 'basin of attraction' around wrong answers is locally stable, meaning the model actively resists correction once inside it. This explains why simple post-hoc fixes often fail and suggests that effective mitigation may require coordinated, multi-step interventions rather than single-point corrections. The research provides a new vocabulary and causal framework for diagnosing these failures and for designing more robust AI systems.

Key Points
  • 44.3% of tested prompts caused the Qwen2.5-1.5B model to bifurcate into factual or hallucinated paths at the very first token.
  • In the patching experiments, injecting a 'correct' activation into a hallucinating run recovered the right answer only 33.3% of the time, while injecting a 'hallucinated' activation into a correct run corrupted it 87.5% of the time, revealing a strong asymmetry.
  • The model's internal state before generation (step 0) predicts its hallucination rate with a Pearson correlation of r=0.776, meaning the tendency is baked in early; a sketch of this check follows the list.
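
As a concrete reading of that last point, the sketch below (continuing with the model loaded earlier) correlates a scalar summary of each prompt's pre-generation state with an empirically measured hallucination rate. Both the feature (a last-layer hidden-state norm) and the crude substring-based factuality check are hypothetical stand-ins; the article does not say which predictor or labeler the paper actually uses.

```python
# Sketch of the step-0 predictability check; `model`/`tokenizer` come from
# the first snippet. The scalar feature and the substring-based hallucination
# label are hypothetical stand-ins for the paper's unspecified choices.
import torch
from scipy.stats import pearsonr

PROBES = [  # (prompt, expected answer substring) -- toy examples
    ("The capital city of Australia is", "Canberra"),
    ("The chemical symbol for gold is", "Au"),
    ("The author of 'Pride and Prejudice' is", "Austen"),
]

def step0_feature(prompt: str) -> float:
    """One number summarizing the model's state before any token is generated."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # last layer, last prompt position -> L2 norm (an assumed feature)
    return out.hidden_states[-1][0, -1, :].float().norm().item()

def hallucination_rate(prompt: str, answer: str, n: int = 16) -> float:
    """Fraction of sampled completions that miss the expected answer."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outs = model.generate(**inputs, do_sample=True, max_new_tokens=16,
                              num_return_sequences=n)
    completions = tokenizer.batch_decode(
        outs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return sum(answer not in c for c in completions) / n

features = [step0_feature(p) for p, _ in PROBES]
rates = [hallucination_rate(p, a) for p, a in PROBES]
r, _ = pearsonr(features, rates)  # noisy with this few probes
print(f"step-0 feature vs. hallucination rate: Pearson r = {r:.3f}")
```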

Why It Matters

This explains why stopping AI hallucinations is so hard and shifts the focus to early intervention in the generation process.