Identifying Interactions at Scale for LLMs
New framework identifies why LLMs make specific decisions while using 90% fewer computational tests.
Researchers have unveiled SPEX (Spectral Explainer) and ProxySPEX, two algorithms designed to tackle one of AI's toughest challenges: understanding why large language models make specific decisions. Traditional interpretability methods struggle with complexity at scale, since the number of potential interactions among features, training data, and internal components grows exponentially as models get larger. SPEX sidesteps this blow-up by exploiting two key structural properties: sparsity (only a few interactions truly matter) and low degree (the interactions that matter involve only small subsets of features).
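To make those two assumptions concrete, here is a minimal sketch of a sparse, low-degree surrogate; the feature subsets and coefficients are invented for illustration rather than taken from the papers. The point is that a handful of small-subset terms can predict the model's output under any ablation:

```python
# A sparse, low-degree surrogate of model behavior. All subsets and
# coefficients below are invented for illustration; they are not from
# the SPEX papers.

# Each key is a small subset of feature indices; each value is that
# interaction's coefficient. Sparsity: only 4 of the 2^6 = 64 possible
# subsets of 6 features matter. Low degree: no subset exceeds 3 features.
interactions = {
    (): 0.10,          # baseline output with every feature ablated
    (0,): 0.55,        # main effect of feature 0
    (2, 4): -0.30,     # a pairwise interaction
    (1, 2, 4): 0.20,   # a third-order interaction
}

def surrogate(kept: set) -> float:
    """Predicted output when only the features in `kept` remain: a term
    contributes only if all of its features survived the ablation."""
    return sum(coef for subset, coef in interactions.items()
               if set(subset) <= kept)

print(surrogate({0, 1, 2, 3, 4, 5}))  # all features kept  -> 0.55
print(surrogate({0, 1, 2, 3, 5}))     # feature 4 ablated  -> 0.65
```

Fitting all 64 coefficients exhaustively would take 64 model queries even in this toy; for real prompts with hundreds of features, that 2^n blow-up is exactly what makes the sparsity and low-degree structure essential.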
The framework relies on strategic ablation, systematically removing components and measuring how the output changes, but requires orders of magnitude fewer tests than exhaustive enumeration. By framing the problem as sparse recovery, SPEX applies techniques from signal processing and coding theory to disentangle the combined signals and isolate the influential interactions. The follow-up ProxySPEX algorithm adds hierarchy detection, exploiting the observation that when a higher-order interaction matters, its lower-order subsets likely matter too. That extra structure delivers a further efficiency gain: ProxySPEX matches SPEX's performance with roughly 10x fewer ablations.
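To give a rough feel for the ablate-then-recover pipeline, the sketch below masks random feature subsets of a toy scoring function and recovers the influential interactions by sparse regression. Everything here is a stand-in: `score` substitutes a cheap formula for a real LLM query, random masks substitute for SPEX's structured sampling design, and off-the-shelf LASSO substitutes for its sparse Fourier decoder from coding theory.

```python
import itertools
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_features, max_degree, n_ablations = 8, 2, 120

# Stand-in for an expensive model query: in practice, score(mask) would run
# the LLM with the masked inputs and return, say, a log-probability.
def score(mask):
    return 0.5 * mask[1] - 0.8 * mask[2] * mask[5] + 0.1  # hidden ground truth

# 1. Strategic ablation: sample random keep/drop masks and query the model.
masks = rng.integers(0, 2, size=(n_ablations, n_features))
y = np.array([score(m) for m in masks])

# 2. Build a design matrix with one column per candidate low-degree
#    interaction (every feature subset of size 1..max_degree).
subsets = [s for d in range(1, max_degree + 1)
           for s in itertools.combinations(range(n_features), d)]
X = np.array([[m[list(s)].prod() for s in subsets] for m in masks])

# 3. Sparse recovery: LASSO zeroes out the irrelevant interaction terms.
fit = Lasso(alpha=0.01).fit(X, y)
for s, c in zip(subsets, fit.coef_):
    if abs(c) > 0.05:
        print(s, round(c, 2))  # should recover (1,) and (2, 5)
```

ProxySPEX's hierarchy assumption would prune this candidate dictionary further: higher-order subsets are only considered when their lower-order subsets already show influence, which is where the roughly 10x reduction in ablations comes from.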
These advances make previously intractable analyses feasible, allowing researchers to systematically trace model behaviors back to specific training examples, prompt features, or internal mechanisms. For the first time, teams can efficiently audit why models like Claude 3.5 generate particular outputs, identify training data influences, and validate safety mechanisms at scale—all with manageable computational costs.
- SPEX identifies influential interactions in LLMs using strategic ablation and sparse recovery techniques from signal processing
- ProxySPEX adds hierarchy detection to achieve similar results with 10x fewer ablations than SPEX
- Enables practical analysis of feature, training data, and model component interactions at scale for models like Claude 3.5
Why It Matters
Makes AI decision-making transparent for safety audits and debugging without prohibitive computational costs.