[R] Predicting Edge Importance in GPT-2's Induction Circuit from Weights Alone (ρ=0.623, 125x speedup)
An independent researcher reports a Spearman correlation of 0.623 when predicting edge importance from weights alone, with no forward passes needed for scoring.
An independent researcher developed the 'Cheap Anchor' scoring method, which predicts which edges in GPT-2's induction circuit matter most using only the model's weight structure. Its scores reach a Spearman correlation of ρ=0.623 with ground-truth path patching results while running 125x faster. The method analyzes spectral concentration and downstream path weight in virtual matrices, so researchers can prioritize circuit investigation before running expensive ablation studies or forward passes through the model.
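To make the two quantitative ideas above concrete, here is a minimal Python sketch. It is not the author's Cheap Anchor formula, which this summary does not spell out. It assumes that "spectral concentration" means the share of squared singular-value mass held by the top singular value, and that a "virtual matrix" is the product of an upstream and a downstream weight matrix. The function names, the random stand-in weights, and the toy dimensions are all hypothetical.

```python
import numpy as np


def spectral_concentration(virtual_matrix, top_k=1):
    """Share of squared singular-value mass in the top_k singular values.

    Assumed definition for illustration only; the original method may
    define spectral concentration differently.
    """
    singular_values = np.linalg.svd(virtual_matrix, compute_uv=False)
    energy = singular_values ** 2
    total = energy.sum()
    if total == 0.0:
        return 0.0
    return float(energy[:top_k].sum() / total)


def spearman_rho(x, y):
    """Spearman rank correlation, assuming no tied values in x or y."""
    rank_x = np.argsort(np.argsort(x)).astype(float)
    rank_y = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rank_x, rank_y)[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model = 16  # toy width, not GPT-2's real dimension

    # Hypothetical stand-ins for an upstream and a downstream weight matrix.
    upstream = rng.normal(size=(d_model, d_model))
    downstream = rng.normal(size=(d_model, d_model))

    # Assumed form of a "virtual matrix": the composed weights of one edge.
    virtual = upstream @ downstream
    print("concentration:", spectral_concentration(virtual))

    # Compare made-up cheap scores against made-up ground-truth importances.
    cheap_scores = rng.normal(size=20)
    ground_truth = cheap_scores + rng.normal(scale=1.0, size=20)
    print("spearman rho:", spearman_rho(cheap_scores, ground_truth))
```

Everything here is computed from weight matrices and score vectors alone, which is the property the post emphasizes: no model forward pass is involved in producing the cheap score.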
Why It Matters
The method could substantially speed up mechanistic interpretability research by identifying important model components before costly experiments are run.