MHC Interp #1: Previous-Token Heads Become Attention Sinks Under Manifold-Constrained Hyper-Connections
Standard detection methods fail as previous-token heads morph into high-kurtosis receivers
A new interpretability study on DeepSeek's manifold-constrained hyper-connections (mHC) reveals that the architecture fundamentally shifts how attention heads function. Using 781M-parameter models trained with the mhc-lite repository, the author compared a full mHC model, an mHC-lite variant, and a baseline transformer without mHC. The core finding: previous-token heads, normally identified by their diagonal stripe scores, instead behave as attention sinks with extremely high kurtosis in mHC models. Because the diagonal stripe signature disappears, standard probing fails to locate them, and researchers must fall back on ablation and path patching. These heads effectively turn into "receiver heads" that absorb information from the residual stream.
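To make the detection contrast concrete, here is a minimal sketch of both diagnostics, assuming post-softmax attention patterns are available as an [n_heads, seq, seq] tensor; the function names, thresholds, and the random stand-in attention below are illustrative and are not taken from the study or the mhc-lite code.

```python
import torch

def prev_token_stripe_score(attn: torch.Tensor) -> torch.Tensor:
    """Mean attention weight on the sub-diagonal (query i -> key i-1).

    attn: [n_heads, seq, seq] post-softmax attention for one prompt.
    Returns one score per head; values near 1.0 mark a previous-token head.
    """
    stripe = torch.diagonal(attn, offset=-1, dim1=-2, dim2=-1)  # [n_heads, seq-1]
    return stripe.mean(dim=-1)

def sink_kurtosis_score(attn: torch.Tensor) -> torch.Tensor:
    """Excess kurtosis of the total attention mass each key position receives.

    A head that funnels almost all attention onto one key position (an
    attention sink) has a sharply peaked mass distribution and thus very
    high kurtosis; a diffuse head stays near 0.
    """
    mass = attn.sum(dim=-2)                              # [n_heads, seq] mass per key
    centered = mass - mass.mean(dim=-1, keepdim=True)
    var = centered.pow(2).mean(dim=-1)
    kurt = centered.pow(4).mean(dim=-1) / (var.pow(2) + 1e-12)
    return kurt - 3.0                                    # excess kurtosis

# Usage: flag heads that the stripe score misses but the kurtosis score catches.
attn = torch.softmax(torch.randn(8, 128, 128), dim=-1)  # stand-in attention patterns
stripes, kurts = prev_token_stripe_score(attn), sink_kurtosis_score(attn)
sink_like = (stripes < 0.2) & (kurts > 10.0)             # thresholds are illustrative
```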
Other attention head types also shift position. Induction heads appear in shallower layers in mHC models than in the baseline, while duplicate heads move to deeper layers. The two mHC variants behave differently as well: full mHC uses Sinkhorn-Knopp normalization to keep its mixing weights doubly stochastic, while mHC-lite relies on the Birkhoff-von Neumann decomposition. Under the logit lens, mHC-lite's three residual streams each output a distinct top-1 token, whereas all streams in full mHC converge on the same prediction, confirming that mHC alters the residual stream's role in token prediction. These findings have direct implications for mechanistic interpretability: existing tools for identifying circuits (e.g., induction or previous-token circuits) must be adapted for mHC-based models such as DeepSeek v4.
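A sketch of the per-stream logit lens comparison described above, under the assumption that the model exposes its residual streams as an [n_streams, seq, d_model] tensor alongside a standard final norm and unembedding; `final_norm`, `unembed`, and `tokenizer` are placeholder names, not the mhc-lite API.

```python
import torch

@torch.no_grad()
def per_stream_top1(streams: torch.Tensor, final_norm, unembed, tokenizer, pos: int = -1):
    """Project each residual stream through the unembedding (logit lens)
    and report its top-1 token at one sequence position.

    streams: [n_streams, seq, d_model] residual-stream states at some layer.
    final_norm / unembed: the model's final normalization and unembedding,
    reused here as a logit lens; both names are placeholders.
    """
    tokens = []
    for stream in streams:                        # [seq, d_model]
        logits = unembed(final_norm(stream))      # [seq, vocab]
        top1 = logits[pos].argmax().item()
        tokens.append(tokenizer.decode([top1]))
    return tokens

# Expected pattern per the study: all entries match for full mHC,
# but differ across streams for mHC-lite.
# print(per_stream_top1(streams, model.final_norm, model.unembed, tokenizer))
```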
- Previous-token heads in mHC models act as attention sinks with high kurtosis, undetectable by standard diagonal stripe scores.
- Induction heads appear earlier (shallower layers) in mHC models, while duplicate heads shift to later layers.
- Full mHC (Sinkhorn-Knopp) makes all residual streams predict the same top-1 token; mHC-lite (Birkhoff-von Neumann) outputs different tokens per stream (see the sketch below).
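The contrast in the last bullet can be illustrated with a small numerical sketch of the two constructions; this is not the mhc-lite parameterization, only the underlying math. Sinkhorn-Knopp alternately rescales rows and columns of a positive matrix until it is approximately doubly stochastic, while the Birkhoff-von Neumann view builds a doubly stochastic matrix directly as a convex combination of permutation matrices.

```python
import torch

def sinkhorn_knopp(logits: torch.Tensor, n_iters: int = 20) -> torch.Tensor:
    """Project a square matrix of logits onto (approximately) doubly
    stochastic form by alternating row and column normalization."""
    m = logits.exp()                              # strictly positive entries
    for _ in range(n_iters):
        m = m / m.sum(dim=1, keepdim=True)        # rows sum to 1
        m = m / m.sum(dim=0, keepdim=True)        # columns sum to 1
    return m

def birkhoff_mixture(weights: torch.Tensor, perms: list[torch.Tensor]) -> torch.Tensor:
    """Convex combination of permutation matrices (Birkhoff-von Neumann form):
    the result is exactly doubly stochastic by construction."""
    w = torch.softmax(weights, dim=0)             # mixture weights on the simplex
    return sum(wi * p for wi, p in zip(w, perms))

# Illustrative 3x3 mixing matrices for three residual streams.
full_style = sinkhorn_knopp(torch.randn(3, 3))
perms = [torch.eye(3)[list(p)] for p in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]]
lite_style = birkhoff_mixture(torch.randn(3), perms)
```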
Why It Matters
The mHC architecture fundamentally reshapes attention patterns, so safety and debugging work on mHC-based models such as DeepSeek v4 will require adapted interpretability tools.