How Attention Sinks Emerge in Large Language Models: An Interpretability Perspective
New research reveals how LLMs fixate on the first token, forming a 'sink circuit' in just two transformer layers.
A team of researchers has published a new paper on arXiv offering an interpretability perspective on a well-known quirk of Large Language Models (LLMs). The study investigates 'attention sinks': the phenomenon where models allocate disproportionate attention to specific tokens, particularly the very first token in an input sequence. Often dismissed as a bug, this persistent focus on position zero nevertheless has structural implications for downstream applications such as KV-cache management and streaming inference, yet its underlying mechanism has remained poorly understood.
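To make the phenomenon concrete, here is a minimal sketch of how an attention sink can be observed directly. It uses a small open model (GPT-2, purely as an illustrative stand-in, not the paper's setup) and measures how much attention later tokens pay to position zero in each layer:

```python
# Minimal sketch: observing an attention sink by measuring attention mass on
# the first token. Model choice and probe text are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM that can return attention weights
tok = AutoTokenizer.from_pretrained(model_name)
# "eager" attention is needed so the model can return per-head attention maps.
model = AutoModelForCausalLM.from_pretrained(model_name, attn_implementation="eager").eval()

inputs = tok("Attention sinks concentrate probability mass on token zero.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions is a tuple with one tensor per layer,
# each of shape (batch, heads, query_len, key_len).
for layer, attn in enumerate(out.attentions):
    # Mean attention that later query positions pay to key position 0.
    sink_mass = attn[0, :, 1:, 0].mean().item()
    print(f"layer {layer:2d}: mean attention to position 0 = {sink_mass:.3f}")
```

In many trained models this value is strikingly high for position 0, which is exactly the disproportionate allocation the paper sets out to explain.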
The researchers traced the formation of this sink and identified a surprisingly simple mechanism they call the 'P0 Sink Circuit'. Operating within just the first two transformer blocks, the circuit recognizes the token at position zero and induces an attention sink on it, independent of any semantic information the token carries. This purely positional mechanism is the foundation of the bias.
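The paper's circuit-level analysis goes beyond a short snippet, but the content-independence claim suggests a simple probe, sketched below under our own assumptions (it is not the paper's exact methodology): swap out the token at position zero and check whether the sink mass in the first two blocks stays roughly unchanged.

```python
# Sketch of a content-independence check (our illustration, not the paper's
# probe): vary the token at position zero and measure sink mass in the first
# two blocks. Model name and probe strings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", attn_implementation="eager").eval()

def sink_mass_first_two_blocks(text: str) -> float:
    """Mean attention to key position 0 in layers 0-1, over heads and queries."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        attns = model(**ids, output_attentions=True).attentions
    return torch.stack([a[0, :, 1:, 0].mean() for a in attns[:2]]).mean().item()

suffix = " quick brown fox jumps over the lazy dog."
for first in ["The", "A", "42", "Zebra"]:  # semantically unrelated first tokens
    print(f"first token {first!r}: sink mass = {sink_mass_first_two_blocks(first + suffix):.3f}")
```

If the circuit is truly positional, the measured sink mass should vary little across these first tokens, despite their very different meanings.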
By analyzing detailed training traces from a 30-billion-parameter A3B (roughly 3B active parameters per token) Mixture-of-Experts (MoE) model trained from scratch, the team made a key discovery: the sink mechanism emerges very early in training and becomes increasingly concentrated in the model's first two layers as training progresses. This concentration suggests that the circuit's behavior could serve as a novel signal for tracking convergence during pre-training, giving model developers a new internal metric.
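As a rough illustration of what such a metric could look like (the checkpoint paths, probe text, and concentration measure below are all hypothetical; the paper may define its signal differently), one could track the share of total sink mass held by the first two layers across pre-training checkpoints:

```python
# Hypothetical sketch of a convergence signal: across pre-training
# checkpoints, track how much of the total position-0 sink mass sits in the
# first two layers. Paths, probe, and metric are assumptions, not the
# paper's method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def per_layer_sink_mass(model, tok, text: str) -> torch.Tensor:
    """One value per layer: mean attention paid to key position 0."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        attns = model(**ids, output_attentions=True).attentions
    return torch.stack([a[0, :, 1:, 0].mean() for a in attns])

probe = "A fixed probe sentence, held constant across checkpoints."
checkpoints = ["ckpt-step-1000", "ckpt-step-10000", "ckpt-step-100000"]  # hypothetical paths
for path in checkpoints:
    tok = AutoTokenizer.from_pretrained(path)
    model = AutoModelForCausalLM.from_pretrained(path, attn_implementation="eager").eval()
    mass = per_layer_sink_mass(model, tok, probe)
    concentration = (mass[:2].sum() / mass.sum()).item()  # share held by layers 0-1
    print(f"{path}: sink concentration in first two layers = {concentration:.2f}")
```

On this view, a concentration curve that rises and then plateaus over checkpoints would mirror the circuit's consolidation that the authors report.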
Key Findings
- Identified the 'P0 Sink Circuit', a simple mechanism that causes LLMs to fixate on the first input token within the first two transformer blocks.
- Analysis of a 30B A3B MoE model's training traces shows the circuit emerges early and concentrates in the first two layers.
- The circuit's development may provide a new internal signal for tracking pre-training convergence, independent of semantic content.
Why It Matters
Understanding this core LLM mechanism is vital for improving model efficiency and interpretability, and it could lead to better training diagnostics.