On the Complexity of Neural Computation in Superposition
New paper establishes the first theoretical limits relevant to AI model compression, showing an exponential gap between representing features and computing with them.
Computer scientists Micah Adler and Nir Shavit have published groundbreaking research establishing the first theoretical lower bounds for neural networks computing in superposition, the phenomenon in which a network represents more features than it has neurons. Their paper "On the Complexity of Neural Computation in Superposition" proves that for broad problem classes, including permutations and pairwise logical operations, computing m' features requires at least Ω(√(m' log m')) neurons and Ω(m' log m') parameters. Equivalently, this gives the first subexponential bound on neural network capacity: a network with n neurons can compute at most O(n²/log n) features. The result supplies a mathematical foundation for understanding how large models such as GPT-4 and Claude 3 achieve their efficiency.
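For a rough sense of scale, the sketch below reads these asymptotic bounds as if their hidden constants were 1; the function names and the specific values of m' and n are illustrative choices, not taken from the paper.

```python
import math

# Illustrative reading of the bounds with all hidden constants set to 1.
def neuron_lower_bound(m_prime: int) -> float:
    """Neurons needed to compute m' features: Omega(sqrt(m' log m'))."""
    return math.sqrt(m_prime * math.log2(m_prime))

def parameter_lower_bound(m_prime: int) -> float:
    """Parameters needed to compute m' features: Omega(m' log m')."""
    return m_prime * math.log2(m_prime)

def feature_capacity(n: int) -> float:
    """Most features computable with n neurons: O(n^2 / log n)."""
    return n * n / math.log2(n)

for m_prime in (10**3, 10**6, 10**9):
    print(f"m' = {m_prime:>13,}: "
          f">= {neuron_lower_bound(m_prime):>12,.0f} neurons, "
          f">= {parameter_lower_bound(m_prime):>16,.0f} parameters")

for n in (10**3, 10**4, 10**5):
    print(f"n = {n:>7,} neurons: at most ~{feature_capacity(n):,.0f} computed features")
```

Even with constants suppressed, the shape is the point: the neuron requirement grows roughly like the square root of the feature count, while the parameter requirement grows nearly linearly.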
The research reveals an exponential gap between merely representing features, which can require as few as O(log m') neurons via the Johnson-Lindenstrauss lemma, and actually computing with them in superposition. The team also provides nearly tight constructive upper bounds, demonstrating that operations like pairwise AND can be computed with O(√(m' log m')) neurons and O(m' log² m') parameters. This work has immediate implications for model compression and distillation, establishing explicit limits on how much a model can be sparsified while preserving expressivity. The findings also suggest that parameter count is a reliable estimator of the number of features a network actually computes, giving theoretical grounding to empirical scaling laws observed in practice.
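To make the representation-versus-computation gap concrete, here is a minimal sketch under the same constant-free reading; representation_dim and computation_neuron_bound are illustrative names, and the Johnson-Lindenstrauss distortion dependence is deliberately suppressed.

```python
import math

def representation_dim(m_prime: int) -> float:
    """Dimensions sufficient to merely *represent* m' features: O(log m')
    (Johnson-Lindenstrauss; constants and distortion dependence suppressed)."""
    return math.log2(m_prime)

def computation_neuron_bound(m_prime: int) -> float:
    """Neurons required to *compute* m' features: Omega(sqrt(m' log m'))."""
    return math.sqrt(m_prime * math.log2(m_prime))

for m_prime in (10**4, 10**6, 10**8):
    rep = representation_dim(m_prime)
    comp = computation_neuron_bound(m_prime)
    print(f"m' = {m_prime:>11,}: represent in ~{rep:5.1f} dims, "
          f"compute needs >= {comp:>10,.0f} neurons  (~{comp / rep:,.0f}x more)")
```

Because √m' is exponential in log m', the computing column outgrows the representing column exponentially, which is the gap summarized in the bullets below.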
- Establishes first lower bounds: computing m' features requires Ω(√(m' log m')) neurons and Ω(m' log m') parameters
- Reveals exponential gap: representing features needs only O(log m') neurons, but computing them requires exponentially more
- Proves capacity limit: a network with n neurons can compute at most O(n²/log n) features (see the sketch below)
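Read in reverse, the capacity and parameter bounds turn network size into a rough ceiling on the number of features computed, roughly min(n²/log n, p/log p) for n neurons and p parameters. The sketch below is an illustrative, constant-free reading of that inversion; the network sizes are hypothetical.

```python
import math

def features_from_neurons(n: int) -> float:
    """Capacity bound from neurons: at most O(n^2 / log n) computed features."""
    return n * n / math.log2(n)

def features_from_params(p: int) -> float:
    """Rough inversion of the parameter bound Omega(m' log m') <= p,
    approximating log m' by log p: at most ~p / log2(p) computed features."""
    return p / math.log2(p)

# Hypothetical network sizes, purely for scale intuition.
for n, p in ((10**4, 10**8), (10**5, 10**10), (10**6, 10**12)):
    cap = min(features_from_neurons(n), features_from_params(p))
    print(f"n = {n:>9,} neurons, p = {p:.0e} params: "
          f"at most ~{cap:.2e} computed features")
```

In this constant-free reading, a dense network with p on the order of n² has the parameter-based ceiling as the tighter one, which matches the point above that parameter count tracks the number of features a network actually computes.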
Why It Matters
Provides a theoretical foundation for limits on model efficiency, informing compression strategies and helping explain why large models work as well as they do.