Research & Papers

Matched-Learning-Rate Analysis of Attention Drift and Transfer Retention in Fine-Tuned CLIP

Matched-learning-rate analysis shows LoRA retains 45% zero-shot CIFAR-100 accuracy after adaptation, versus 11% for full fine-tuning of CLIP.

Deep Dive

A new machine learning study by researcher Ruize Xia provides the first controlled, matched-learning-rate analysis comparing two popular adaptation methods for CLIP models: Full Fine-Tuning (Full FT) and Low-Rank Adaptation (LoRA). The research, involving 80 experimental runs on the CLIP ViT-B/32 architecture across EuroSAT and Oxford-IIIT Pets datasets, reveals that previous comparisons were confounded by different learning-rate conventions. When learning rates are properly matched, LoRA demonstrates dramatically better preservation of the model's original capabilities.

The key finding shows LoRA maintains substantially more zero-shot transfer ability than Full FT, averaging 45.13% versus just 11.28% zero-shot CIFAR-100 accuracy after EuroSAT fine-tuning, and 58.01% versus 8.54% after Pets. The study also introduces attention drift metrics as diagnostic tools, showing Full FT causes marked contraction in attention patterns at higher learning rates while LoRA remains more stable. This research changes how practitioners should interpret the trade-offs between adaptation methods, with LoRA emerging as the stronger choice for maintaining a model's foundational knowledge while adapting it to new tasks.
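The mechanism behind LoRA's retention is easy to see in miniature: the pre-trained weight matrix W stays frozen, and training only touches a low-rank update BA, with B initialized to zero so the adapted layer starts out identical to the original model. The sketch below illustrates that property in NumPy; the dimensions, rank, and scaling factor are illustrative choices, not values taken from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: r << d gives the low-rank bottleneck
d_in, d_out, r = 8, 8, 2
alpha = 4.0  # LoRA scaling hyperparameter (illustrative)

W = rng.normal(size=(d_out, d_in))     # frozen pre-trained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero-initialized

def lora_forward(x):
    # y = W x + (alpha / r) * B A x ; only A and B would receive gradients
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B = 0 at initialization, the adapted layer reproduces the frozen model
# exactly -- the starting point for LoRA's zero-shot preservation.
assert np.allclose(lora_forward(x), W @ x)
```

Because every update is confined to the rank-r subspace spanned by BA, the adapted weights can never move arbitrarily far from W, which is one intuition for why zero-shot transfer survives better than under full fine-tuning.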

Key Points
  • LoRA preserves roughly 4x more zero-shot transfer than Full FT (45% vs 11% CIFAR-100 accuracy after EuroSAT fine-tuning)
  • Study controlled for learning rate across 80 runs on CLIP ViT-B/32
  • Attention drift metrics show Full FT contracts attention patterns at higher learning rates while LoRA remains stable
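The summary does not spell out how the study quantifies attention drift, but a common proxy for attention "contraction" is the Shannon entropy of each softmax attention row: lower entropy means more sharply peaked attention, and a drop relative to the pre-trained model signals drift. A hypothetical sketch of such a diagnostic:

```python
import numpy as np

def attention_entropy(scores):
    """Mean Shannon entropy (in nats) of softmax attention rows.

    Hypothetical contraction proxy: lower mean entropy means more
    peaked attention than the pre-trained baseline.
    """
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(scores)
    p /= p.sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=-1).mean()

# Uniform scores give maximum entropy log(n); a sharply peaked row gives
# entropy near zero, i.e. a "contracted" attention pattern.
n = 4
uniform = np.zeros((1, n))
peaked = np.array([[10.0, 0.0, 0.0, 0.0]])
assert np.isclose(attention_entropy(uniform), np.log(n))
assert attention_entropy(peaked) < attention_entropy(uniform)
```

Comparing this quantity before and after adaptation, layer by layer, would give the kind of diagnostic signal the study uses to contrast Full FT's contraction with LoRA's stability.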

Why It Matters

Practitioners can now make an informed choice between adaptation methods: when a model's zero-shot abilities must survive specialization, LoRA adapts it to the new task without erasing valuable pre-trained knowledge.