Computation in Superposition: Two Handcrafted Models
Networks may mix superposition with clever encodings that sidestep it entirely.
In a new LessWrong post, researchers RGRGRG and Kyle Ray explore how neural networks perform computation in superposition (encoding more facts than they have neurons) using a toy task: recognizing valid first-name/last-name pairs of 8 famous athletes. They handcraft a first network with 6 neurons, each tuned to fire for 4 athletes via additive weights and a bias of -1, arranged so that each athlete activates exactly 3 neurons. This design uses superposition to combine partial evidence, but individual neurons can be fooled by invalid mixed pairs (e.g., "Peyton Ruth"), so correct classification relies on a vote: a genuine pair activates all 3 of its athlete's neurons, while a mixed pair cannot.
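A minimal sketch of what such a detector could look like, assuming one-hot name inputs, weights of 1.0 from each covered athlete's first and last name, a ReLU nonlinearity, and a vote-counting readout. The 6-neuron / 3-neurons-per-athlete structure and the -1 bias come from the post; the athlete list, weight values, and neuron assignment below are illustrative assumptions, not the authors' exact construction:

```python
import numpy as np

# Hypothetical stand-ins for the 8 athletes; the post's exact list is not reproduced here.
athletes = [
    ("Peyton", "Manning"), ("Babe", "Ruth"), ("Serena", "Williams"),
    ("Lionel", "Messi"), ("Usain", "Bolt"), ("Mia", "Hamm"),
    ("Wayne", "Gretzky"), ("Simone", "Biles"),
]

# Assumed assignment: each athlete gets 3 of the 6 neurons,
# and each neuron ends up covering exactly 4 athletes.
codes = [{0, 1, 2}, {0, 1, 3}, {0, 2, 4}, {0, 3, 5},
         {1, 4, 5}, {2, 4, 5}, {3, 4, 5}, {1, 2, 3}]

first_idx = {first: i for i, (first, _) in enumerate(athletes)}
last_idx = {last: i for i, (_, last) in enumerate(athletes)}

# Input: one-hot first name (8 dims) concatenated with one-hot last name (8 dims).
W = np.zeros((6, 16))
for athlete, code in enumerate(codes):
    for neuron in code:
        W[neuron, athlete] = 1.0       # weight from this athlete's first name
        W[neuron, 8 + athlete] = 1.0   # weight from this athlete's last name
b = -np.ones(6)                        # bias of -1, as described in the post

def hidden(first, last):
    x = np.zeros(16)
    x[first_idx[first]] = 1.0
    x[8 + last_idx[last]] = 1.0
    # ReLU: a neuron fires only if it covers both the first and the last name seen.
    return np.maximum(0.0, W @ x + b)

def is_valid_pair(first, last):
    # Vote-counting readout: a true pair lights up all 3 of its neurons;
    # a mixed pair lights up at most 2 (those covering both athletes involved).
    return hidden(first, last).sum() >= 3

print(is_valid_pair("Peyton", "Manning"))  # True
print(is_valid_pair("Peyton", "Ruth"))     # False
```

In this sketch, the key property is that any two distinct athletes share at most 2 neurons, so thresholding the vote count at 3 rejects every mixed pair while accepting every genuine one.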
A second handcrafted network takes a different route: it uses just 2 neurons to memorize arbitrary name pairs without superposition, relying on a clever encoding that sidesteps the need for distributed representation. Networks trained on the same task often blend the two strategies, which underlines that superposition is not the only algorithm networks use, even in constrained settings. Understanding both mechanisms provides a sharper vocabulary for describing how models store and use knowledge, which matters for safety work on larger, more capable systems.
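The 2-neuron encoding itself is not detailed above, so the following is only a hypothetical illustration of how two neurons could memorize a set of valid pairs without distributed representation: assign each athlete a distinct scalar code, route +code through the first-name weights and -code through the last-name weights, and use two ReLU neurons as an equality check, accepting a pair exactly when both neurons stay at zero. Every quantity here is an assumption rather than the authors' construction:

```python
import numpy as np

codes = np.arange(1.0, 9.0)  # assumed: athlete k gets the scalar code k + 1

# Neuron 1 computes relu(first_code - last_code); neuron 2 computes the reverse.
# Input layout matches the sketch above: one-hot first name, then one-hot last name.
W2 = np.vstack([
    np.concatenate([codes, -codes]),
    np.concatenate([-codes, codes]),
])

def is_valid_pair(first_id, last_id):
    x = np.zeros(16)
    x[first_id] = 1.0
    x[8 + last_id] = 1.0
    h = np.maximum(0.0, W2 @ x)   # both neurons are 0 only when the codes match
    return h.sum() < 0.5

print(is_valid_pair(0, 0))  # True: first and last name of the same athlete
print(is_valid_pair(0, 1))  # False: a mixed pair such as "Peyton Ruth"
```

Because the codes are assigned per name, this scheme can memorize any one-to-one pairing of first and last names; whether it resembles the post's actual encoding is an open question for readers of the original.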
- First handcrafted network uses 6 neurons in additive superposition, each covering 4 athletes; a pair is accepted only when 3 neurons fire.
- Second network uses just 2 neurons to memorize arbitrary name pairs, avoiding superposition entirely via a different encoding strategy.
- Trained networks often mix both approaches, complicating interpretability and suggesting superposition is not the only algorithm for dense knowledge storage.
Why It Matters
This work clarifies how networks compute in superposition, aiding interpretability for safer AI systems.