A Gauge Theory of Superposition: Toward a Sheaf-Theoretic Atlas of Neural Representations
A new framework uses gauge theory to measure three concrete obstructions to interpreting large language models.
A new theoretical paper titled 'A Gauge Theory of Superposition: Toward a Sheaf-Theoretic Atlas of Neural Representations' proposes a mathematical framework for understanding superposition in large language models. Authored by Hossein Javidnia, the work moves beyond the traditional view of a single global feature dictionary, instead modeling an LLM's internal representations as a 'sheaf-theoretic atlas': a collection of local semantic charts organized by context. The framework formally defines three measurable obstructions that prevent a globally consistent interpretation of model features: local jamming (more features are simultaneously active than a local chart can accommodate), proxy shearing (feature correspondences fail to match up across overlapping charts), and nontrivial holonomy (feature transformations that depend on the path taken through contexts).
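The paper's exact estimators are not reproduced here, but proxy shearing admits a natural minimal measurement: on contexts shared by two charts, find the single best rotation aligning one chart's feature coordinates with the other's, and take the residual as the mismatch. The sketch below is a generic orthogonal-Procrustes version of that idea; the function name `shear_energy` and the matrix conventions are illustrative assumptions, not the paper's API.

```python
import numpy as np

def shear_energy(A: np.ndarray, B: np.ndarray) -> float:
    """Residual energy after the best orthogonal alignment of two local
    feature frames on a chart overlap (orthogonal Procrustes).

    A, B: (d, n) arrays giving the same n overlap contexts in the feature
    coordinates of chart U_i and chart U_j, respectively. A zero residual
    means a single exact orthogonal transition map exists between the
    charts; a positive residual is a shearing-style mismatch that no one
    rotation can remove.
    """
    # The minimizer of ||Q A - B||_F over orthogonal Q is Q = U V^T,
    # where U S V^T is the SVD of B A^T.
    U, _, Vt = np.linalg.svd(B @ A.T)
    Q = U @ Vt
    return float(np.linalg.norm(Q @ A - B, ord="fro") ** 2)
```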
The paper demonstrates the framework's utility by proving four concrete results on a frozen Llama 3.2 3B Instruct model, evaluated on datasets including WikiText-103 and C4. Key findings include a method to compute gauge-invariant holonomy, a data-dependent lower bound on the transfer mismatch energy caused by shearing, and non-vacuous certified bounds on feature interference. This work provides a rigorous, quantitative toolkit for researchers aiming to diagnose the fundamental limits of interpretability in modern neural networks, shifting the conversation from qualitative observation to measurable, theory-backed obstructions.
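To make the holonomy result concrete: given transition maps between successive charts along a closed loop of contexts, their composition should be the identity if the atlas is globally consistent, and its deviation from the identity is the holonomy defect. The following is a minimal sketch under the assumption that gauge changes act by orthogonal conjugation, so the Frobenius distance to the identity is gauge-invariant; the names and conventions are illustrative, not the paper's.

```python
import numpy as np

def holonomy_defect(transitions: list[np.ndarray]) -> float:
    """Gauge-invariant deviation of a loop's holonomy from the identity.

    `transitions` lists (d, d) transition maps g_{0->1}, g_{1->2}, ...,
    g_{k->0} around a closed loop of charts. If a change of gauge acts by
    orthogonal conjugation H -> Q H Q^T, then ||H - I||_F is unchanged,
    so a nonzero defect cannot be explained away by re-choosing bases.
    """
    d = transitions[0].shape[0]
    H = np.eye(d)
    for g in transitions:
        H = g @ H  # compose maps in loop order
    return float(np.linalg.norm(H - np.eye(d), ord="fro"))
```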
- Proposes a 'sheaf-theoretic atlas' to replace the single global feature dictionary, clustering contexts into a stratified complex.
- Defines three concrete, measurable obstructions to interpretability: local jamming, proxy shearing, and nontrivial holonomy.
- Proves four results on a Llama 3.2 3B model, providing non-vacuous certified bounds for feature interference and jamming (a generic coherence-based sketch of such a certificate follows this list).
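The certified bounds above are the paper's own; as a point of reference, here is a standard coherence-based certificate that is non-vacuous in the same sense. For any k simultaneously active unit-norm features, Gershgorin's theorem keeps every eigenvalue of the restricted Gram matrix within (k-1)·mu of 1, which is informative exactly when (k-1)·mu < 1. This is a generic capacity check in the spirit of local jamming, not the paper's estimator; the function names are illustrative.

```python
import numpy as np

def mutual_coherence(D: np.ndarray) -> float:
    """Largest off-diagonal |<d_i, d_j>| for a (d, m) unit-column dictionary."""
    G = D.T @ D
    np.fill_diagonal(G, 0.0)
    return float(np.abs(G).max())

def interference_certificate(D: np.ndarray, k: int) -> float:
    """Worst-case spectral interference for any k-sparse active set.

    Gershgorin bounds every eigenvalue of a k x k principal submatrix of
    the Gram matrix to [1 - (k-1)*mu, 1 + (k-1)*mu]. A return value below
    1 certifies that no set of k active features can jam the local readout.
    """
    return (k - 1) * mutual_coherence(D)
```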
Why It Matters
Provides a rigorous mathematical framework to measure and bound the fundamental limits of interpreting how LLMs work internally.