Zero-Ablation Overstates Register Content Dependence in DINO Vision Transformers
Zero-ablation overstates register importance by more than 30 percentage points; simple replacements preserve performance.
A new study by researchers Felipe Parodi, Jordan Matelsky, and Melanie Segado challenges a fundamental assumption in AI interpretability. The paper, accepted to the CVPR 2026 HOW Vision Interpretability Workshop, demonstrates that the widely used technique of zero-ablation (replacing token activations with zero vectors) dramatically overstates how much Meta's DINOv2 and DINOv3 Vision Transformers depend on the exact content of their 'register' tokens. Zeroing these registers caused performance drops of up to 36.6 percentage points (pp) in classification and 30.9 pp in segmentation, suggesting the registers were indispensable.
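To make the ablation concrete, the sketch below shows zero-ablation of register tokens via a PyTorch forward hook. The token layout ([CLS, registers, patches]), the four-register count, and the model.blocks attribute are illustrative assumptions about a DINOv2-style model, not the authors' code.

```python
import torch

N_REG = 4  # DINOv2-with-registers variants typically prepend 4 register tokens

def zero_ablate_registers(tokens: torch.Tensor, n_reg: int = N_REG) -> torch.Tensor:
    """Replace register activations with zero vectors (the ablation under test).

    Assumes tokens are laid out as [CLS, register_1..register_R, patch_1..patch_N]
    along dim 1; this layout is an assumption, not guaranteed for every model.
    """
    out = tokens.clone()
    out[:, 1:1 + n_reg, :] = 0.0  # registers assumed to sit right after CLS
    return out

def make_ablation_hook(ablate_fn):
    """Wrap a token-editing function as a forward hook on a transformer block."""
    def hook(module, inputs, output):
        return ablate_fn(output)  # returning a tensor replaces the block's output
    return hook

# Usage sketch (model and attribute names are hypothetical):
# handle = model.blocks[-1].register_forward_hook(make_ablation_hook(zero_ablate_registers))
# feats = model(images)   # the frozen-feature evaluation then runs on these features
# handle.remove()
```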
However, the researchers introduced three controlled replacement methods: mean-substitution, noise-substitution, and cross-image register-shuffling. All three preserved model performance across classification, correspondence, and segmentation tasks, remaining within approximately 1 pp of the unmodified baseline. Analysis showed that these replacements genuinely perturbed the model's internal representations, yet only zeroing caused disproportionately large disruptions. The conclusion is that, for frozen-feature evaluations, performance depends on having plausible register-like activations rather than on the exact, image-specific values. Registers still serve a functional role in buffering dense features and compressing patch geometry, but the necessity of their exact content has been overstated. The findings were replicated at the ViT-B scale, confirming the robustness of the result.
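For comparison, here is a sketch of the three replacement controls under the same assumed token layout. The statistics arguments (reg_means, reg_mean, reg_std) are assumed to be precomputed over a reference image set; the paper's exact estimation procedure may differ.

```python
import torch

def mean_substitute(tokens, reg_means, n_reg=4):
    """Mean-substitution: replace each register with its average activation.

    reg_means has shape (n_reg, dim), precomputed over a reference set
    (an assumption here); it broadcasts across the batch dimension.
    """
    out = tokens.clone()
    out[:, 1:1 + n_reg, :] = reg_means
    return out

def noise_substitute(tokens, reg_mean, reg_std, n_reg=4):
    """Noise-substitution: Gaussian noise matched to register statistics."""
    out = tokens.clone()
    shape = out[:, 1:1 + n_reg, :].shape
    out[:, 1:1 + n_reg, :] = reg_mean + reg_std * torch.randn(shape, device=out.device)
    return out

def shuffle_substitute(tokens, n_reg=4):
    """Cross-image register-shuffling: swap registers between images in a batch."""
    out = tokens.clone()
    perm = torch.randperm(tokens.shape[0], device=tokens.device)
    out[:, 1:1 + n_reg, :] = tokens[perm, 1:1 + n_reg, :]
    return out
```

Each control can be attached with the same forward-hook wrapper shown above; the reported result is that all three leave frozen-feature performance within about 1 pp of baseline, while zeroing does not.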
- Zero-ablation on DINOv2/DINOv3 registers caused drops up to 36.6 pp, suggesting critical dependence.
- Three replacement controls (mean/noise/shuffle) preserved performance within ~1 pp, showing the exact, image-specific register values aren't needed.
- The finding challenges a core interpretability method and shows models need plausible activations, not exact values.
Why It Matters
For AI engineers, this means a common interpretability technique can mislead: zero-ablation conflates a model's need for some plausible activation with a need for that token's specific content, so ablation-based claims should be checked against controlled replacements before drawing conclusions about model behavior.