Computation in Superposition: Two Handcrafted Models
Networks may mix superposition with clever encodings that sidestep it entirely.
In a new LessWrong post, researchers RGRGRG and Kyle Ray explore how neural networks perform computation in superposition (encoding more facts than they have neurons) using a toy task: recognizing valid first-name/last-name pairs of 8 famous athletes. They handcraft a first network with 6 neurons, each tuned to fire for 4 athletes via additive weights and a bias of -1, arranged so that each athlete activates exactly 3 neurons. This design uses superposition to combine partial evidence, but individual neurons can be fooled by invalid mixed pairs (e.g., "Peyton Ruth"), so correct classification relies on a vote: a genuine pair activates all 3 of its athlete's neurons, while a mixed pair cannot.
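A minimal sketch of what such a detector could look like, assuming one-hot name inputs, weights of 1.0 from each covered athlete's first and last name, a ReLU nonlinearity, and a vote-counting readout. The 6-neuron / 3-neurons-per-athlete structure and the -1 bias come from the post; the athlete list, weight values, and neuron assignment below are illustrative assumptions, not the authors' exact construction:

```python
import numpy as np

# Hypothetical stand-ins for the 8 athletes; the post's exact list is not reproduced here.
athletes = [
    ("Peyton", "Manning"), ("Babe", "Ruth"), ("Serena", "Williams"),
    ("Lionel", "Messi"), ("Usain", "Bolt"), ("Mia", "Hamm"),
    ("Wayne", "Gretzky"), ("Simone", "Biles"),
]

# Assumed assignment: each athlete gets 3 of the 6 neurons,
# and each neuron ends up covering exactly 4 athletes.
codes = [{0, 1, 2}, {0, 1, 3}, {0, 2, 4}, {0, 3, 5},
         {1, 4, 5}, {2, 4, 5}, {3, 4, 5}, {1, 2, 3}]

first_idx = {first: i for i, (first, _) in enumerate(athletes)}
last_idx = {last: i for i, (_, last) in enumerate(athletes)}

# Input: one-hot first name (8 dims) concatenated with one-hot last name (8 dims).
W = np.zeros((6, 16))
for athlete, code in enumerate(codes):
    for neuron in code:
        W[neuron, athlete] = 1.0       # weight from this athlete's first name
        W[neuron, 8 + athlete] = 1.0   # weight from this athlete's last name
b = -np.ones(6)                        # bias of -1, as described in the post

def hidden(first, last):
    x = np.zeros(16)
    x[first_idx[first]] = 1.0
    x[8 + last_idx[last]] = 1.0
    # ReLU: a neuron fires only if it covers both the first and the last name seen.
    return np.maximum(0.0, W @ x + b)

def is_valid_pair(first, last):
    # Vote-counting readout: a true pair lights up all 3 of its neurons;
    # a mixed pair lights up at most 2 (those covering both athletes involved).
    return hidden(first, last).sum() >= 3

print(is_valid_pair("Peyton", "Manning"))  # True
print(is_valid_pair("Peyton", "Ruth"))     # False
```

In this sketch, the key property is that any two distinct athletes share at most 2 neurons, so thresholding the vote count at 3 rejects every mixed pair while accepting every genuine one.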
A second handcrafted network takes a different route: it uses just 2 neurons to memorize arbitrary name pairs without superposition, relying on a clever encoding that sidesteps the need for distributed representation. Networks trained on the same task often blend the two strategies, which underlines that superposition is not the only algorithm networks use, even in constrained settings. Understanding both mechanisms provides a sharper vocabulary for describing how models store and use knowledge, which matters for safety work on larger, more capable systems.
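The 2-neuron encoding itself is not detailed above, so the following is only a hypothetical illustration of how two neurons could memorize a set of valid pairs without distributed representation: assign each athlete a distinct scalar code, route +code through the first-name weights and -code through the last-name weights, and use two ReLU neurons as an equality check, accepting a pair exactly when both neurons stay at zero. Every quantity here is an assumption rather than the authors' construction:

```python
import numpy as np

codes = np.arange(1.0, 9.0)  # assumed: athlete k gets the scalar code k + 1

# Neuron 1 computes relu(first_code - last_code); neuron 2 computes the reverse.
# Input layout matches the sketch above: one-hot first name, then one-hot last name.
W2 = np.vstack([
    np.concatenate([codes, -codes]),
    np.concatenate([-codes, codes]),
])

def is_valid_pair(first_id, last_id):
    x = np.zeros(16)
    x[first_id] = 1.0
    x[8 + last_id] = 1.0
    h = np.maximum(0.0, W2 @ x)   # both neurons are 0 only when the codes match
    return h.sum() < 0.5

print(is_valid_pair(0, 0))  # True: first and last name of the same athlete
print(is_valid_pair(0, 1))  # False: a mixed pair such as "Peyton Ruth"
```

Because the codes are assigned per name, this scheme can memorize any one-to-one pairing of first and last names; whether it resembles the post's actual encoding is an open question for readers of the original.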
- First handcrafted network uses 6 neurons in additive superposition, each covering 4 athletes; a pair is accepted only when 3 neurons fire.
- Second network uses just 2 neurons to memorize arbitrary name pairs, avoiding superposition entirely via a different encoding strategy.
- Trained networks often mix both approaches, complicating interpretability and suggesting superposition is not the only algorithm for dense knowledge storage.
Why It Matters
This work clarifies how networks compute in superposition, aiding interpretability for safer AI systems.