From Weights to Concepts: Data-Free Interpretability of CLIP via Singular Value Decomposition
A new data-free method explains CLIP's vision transformer heads and enables precise model edits.
A research team led by Francesco Gentile has developed SITH (Semantic Inspection of Transformer Heads), a groundbreaking framework for interpreting the internal mechanisms of vision-language models like OpenAI's CLIP. Unlike existing methods that rely on analyzing model activations with specific datasets—making them vulnerable to data bias—SITH works directly in the weight space without any training data. For each attention head in CLIP's vision transformer, the method decomposes its value-output matrix using Singular Value Decomposition (SVD) and interprets the resulting singular vectors via a new algorithm called COMP (Coherent Orthogonal Matching Pursuit). This algorithm explains each vector as a sparse, semantically coherent combination of human-interpretable concepts.
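As a rough illustration of this pipeline, the sketch below applies SVD to one head's value-output matrix and then greedily explains a singular vector as a sparse combination of concept embeddings. It is a minimal, hedged approximation rather than the authors' implementation: the paper's COMP algorithm is stood in for by a plain orthogonal-matching-pursuit loop, and `W_vo` and `concept_bank` are random placeholders for the real CLIP head weights and concept dictionary.

```python
# Minimal sketch (not the authors' code): SVD one head's value-output matrix,
# then greedily describe a singular vector as a sparse mix of concept embeddings.
import numpy as np

def decompose_head(W_vo: np.ndarray, rank: int = 8):
    """SVD of one attention head's value-output matrix (here square, d_model x d_model)."""
    U, S, Vt = np.linalg.svd(W_vo, full_matrices=False)
    return U[:, :rank], S[:rank], Vt[:rank]               # keep only the top singular directions

def sparse_concept_match(v: np.ndarray, concept_bank: np.ndarray, k: int = 3):
    """Greedily explain one singular vector as a sparse combination of k concepts.

    concept_bank: (num_concepts, d_model) unit-norm concept embeddings, assumed
    to live in the same space as the singular vectors.
    """
    residual, chosen, coef = v.copy(), [], np.array([])
    for _ in range(k):
        scores = np.abs(concept_bank @ residual)          # similarity of each concept to the residual
        scores[chosen] = -np.inf                          # never reuse an already-chosen concept
        chosen.append(int(np.argmax(scores)))
        basis = concept_bank[chosen].T                    # (d_model, |chosen|)
        coef, *_ = np.linalg.lstsq(basis, v, rcond=None)  # refit all chosen concepts jointly
        residual = v - basis @ coef
    return chosen, coef

# Toy usage with random stand-ins for the real CLIP head weights and concept dictionary.
rng = np.random.default_rng(0)
W_vo = rng.normal(size=(64, 64))
concept_bank = rng.normal(size=(500, 64))
concept_bank /= np.linalg.norm(concept_bank, axis=1, keepdims=True)

U, S, Vt = decompose_head(W_vo)
concepts, weights = sparse_concept_match(Vt[0], concept_bank)
print("top singular value:", round(float(S[0]), 3), "concepts:", concepts, "weights:", np.round(weights, 3))
```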
The research, accepted at CVPR 2026, demonstrates that SITH provides coherent and faithful explanations, validated through reconstruction fidelity tests. More practically, this interpretability enables precise, targeted edits to the model's weights: researchers can amplify or suppress specific concepts (like 'redness' or 'roundness') to improve downstream task performance without the computational cost of full model retraining. Furthermore, the team used SITH to study how CLIP adapts during fine-tuning, revealing that the process primarily reweights a stable semantic basis of existing features rather than learning entirely new ones. This challenges previous assumptions about model adaptation and provides a clearer map of how these complex systems function internally.
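A hedged sketch of what such a concept-level edit could look like: after decomposing a head, rescale the singular values whose singular vectors align with a chosen concept direction, then reassemble the matrix. The alignment test, the scaling rule, and the 'redness' vector below are illustrative assumptions, not the paper's exact editing procedure.

```python
# Sketch of a concept-level weight edit under the assumptions stated above.
import numpy as np

def edit_head(W_vo: np.ndarray, concept_vec: np.ndarray, scale: float = 0.0,
              align_threshold: float = 0.5) -> np.ndarray:
    """Suppress (scale < 1) or amplify (scale > 1) the directions of W_vo aligned with concept_vec."""
    U, S, Vt = np.linalg.svd(W_vo, full_matrices=False)
    concept_vec = concept_vec / np.linalg.norm(concept_vec)
    alignment = np.abs(Vt @ concept_vec)                  # |cosine| of each singular vector with the concept
    S_edited = np.where(alignment > align_threshold, S * scale, S)
    return (U * S_edited) @ Vt                            # reassemble the edited value-output matrix

# Example: remove whatever this (random stand-in) head writes along a hypothetical 'redness' direction.
rng = np.random.default_rng(1)
W_vo = rng.normal(size=(64, 64))
redness_direction = rng.normal(size=64)
W_vo_edited = edit_head(W_vo, redness_direction, scale=0.0)
```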
- SITH provides data-free interpretability by analyzing CLIP's weights directly via Singular Value Decomposition (SVD), eliminating dataset bias.
- The new COMP algorithm explains weight vectors as sparse combinations of human-interpretable concepts, making individual attention heads understandable.
- The framework allows precise model edits that amplify or suppress concepts, and reveals that fine-tuning primarily reweights existing features.
Why It Matters
This moves AI interpretability beyond dataset-dependent methods, enabling safer, more controllable model editing and a clearer picture of how models adapt during fine-tuning.