Research & Papers

[R] Predicting Edge Importance in GPT-2's Induction Circuit from Weights Alone (ρ=0.623, 125x speedup)

An independent researcher reports a Spearman correlation of 0.623 between weight-only edge-importance scores and path patching results, with no forward passes needed to compute the scores.

Deep Dive

An independent researcher developed 'Cheap Anchor', a scoring method that predicts which edges in GPT-2's induction circuit matter most using only weight structure. Its scores reach a Spearman correlation of ρ=0.623 with ground-truth path patching results while being 125x faster to compute. The method analyzes spectral concentration and downstream path weight in virtual matrices, so researchers can prioritize which edges to investigate without first running expensive ablation studies or forward passes through the model.
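To make the headline number concrete, here is a minimal sketch of how a Spearman ρ between weight-only scores and path patching effects could be computed. It is not the author's code: the function names (`average_ranks`, `spearman_rho`) and the five-edge toy data are hypothetical, and the Cheap Anchor scoring formula itself is not reproduced here because the summary does not specify it.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman rho: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data (not from the post): one score per edge.
cheap_scores = [0.9, 0.1, 0.5, 0.7, 0.3]            # weight-only predictions
patching_effects = [0.80, 0.05, 0.40, 0.30, 0.20]   # measured by path patching
rho = spearman_rho(cheap_scores, patching_effects)
print(f"Spearman rho = {rho:.3f}")
```

Because Spearman ρ compares rankings rather than raw values, it rewards a cheap score for ordering edges correctly even when its magnitudes differ from the measured effects, which is what matters when the goal is deciding which edges to examine first.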

Why It Matters

The method could substantially accelerate mechanistic interpretability research by flagging likely-important model components before costly experiments are run.