Research & Papers

Disentangling Direction and Magnitude in Transformer Representations: A Double Dissociation Through L2-Matched Perturbation Analysis

Scientists just discovered how to 'hack' a transformer's brain in two different ways.

Deep Dive

A new paper argues that the direction and magnitude of representation vectors in transformer models serve distinct computational roles. Angular perturbations cause up to 42.9x more damage to language-modeling performance, while magnitude changes disproportionately hurt syntactic processing (a 20.4% vs. 1.6% accuracy drop). Using a novel L2-matched perturbation method on Pythia-family models, the study shows that the two kinds of damage flow through different pathways (attention vs. LayerNorm). This dissociation refines the linear representation hypothesis.
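The key trick is matching the L2 norm of the two perturbation types, so any difference in damage reflects direction vs. magnitude rather than perturbation size. The paper's exact procedure isn't reproduced here, but a minimal sketch of the idea looks like this (function names and the rotation construction are illustrative assumptions, not the authors' code):

```python
import numpy as np

def magnitude_perturb(h, eps):
    """Push h along its own direction so the perturbation has L2 norm eps."""
    unit = h / np.linalg.norm(h)
    return h + eps * unit  # direction unchanged, norm grows by eps

def angular_perturb(h, eps, rng):
    """Rotate h in a random plane so ||h' - h|| = eps while ||h'|| = ||h||."""
    n = np.linalg.norm(h)
    unit = h / n
    # Random direction orthogonal to h (one Gram-Schmidt step).
    r = rng.standard_normal(h.shape)
    r -= (r @ unit) * unit
    r /= np.linalg.norm(r)
    # A chord of length eps on a sphere of radius n fixes the rotation angle.
    theta = 2 * np.arcsin(eps / (2 * n))
    return n * (np.cos(theta) * unit + np.sin(theta) * r)

rng = np.random.default_rng(0)
h = rng.standard_normal(512)       # stand-in for a residual-stream vector
eps = 0.1 * np.linalg.norm(h)      # shared perturbation budget

h_mag = magnitude_perturb(h, eps)
h_ang = angular_perturb(h, eps, rng)

# Both perturbations spend exactly the same L2 budget...
assert np.isclose(np.linalg.norm(h_mag - h), eps)
assert np.isclose(np.linalg.norm(h_ang - h), eps)
# ...but one changes only magnitude, the other only direction.
assert np.isclose(np.linalg.norm(h_ang), np.linalg.norm(h))
```

Because both deltas have identical norm, any gap in downstream loss between the two conditions can be attributed to what was perturbed, not how much.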

Why It Matters

If direction and magnitude really carry separate information, developers gain a more precise handle for model editing (targeting one component without disturbing the other) and a sharper lens for interpretability tools.