Subspace Geometry Governs Catastrophic Forgetting in Low-Rank Adaptation
New research shows forgetting in AI fine-tuning follows a simple geometric law with 0.994 correlation in synthetic tests.
Researcher Brady Steele has published groundbreaking work explaining catastrophic forgetting in Low-Rank Adaptation (LoRA), the popular parameter-efficient fine-tuning method used with models like GPT and Llama. The paper introduces a geometric theory showing that forgetting follows a specific mathematical law: F = α(1 - cos²θ_min) + β, where θ_min is the minimum principal angle between task gradient subspaces. This formulation reveals that forgetting becomes largely independent of adapter rank at high subspace angles, with a coefficient of variation as low as 0.8% in synthetic settings and 10-19% on real benchmarks such as Split-CIFAR100 and sequential GLUE tasks.
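The law above can be sketched numerically: the minimum principal angle between two subspaces is recoverable from the singular values of the product of their orthonormal bases. The helper names (`min_principal_angle`, `predicted_forgetting`) and the α, β values below are illustrative placeholders, not the paper's implementation; α and β would be fitted to observed forgetting in practice.

```python
import numpy as np

def min_principal_angle(U, V):
    """Smallest principal angle between the column spans of U and V.

    U, V: matrices whose columns span each task's gradient subspace.
    """
    # Orthonormalize each basis with QR.
    Qu, _ = np.linalg.qr(U)
    Qv, _ = np.linalg.qr(V)
    # Singular values of Qu^T Qv are the cosines of the principal angles;
    # the largest singular value gives the smallest angle.
    s = np.linalg.svd(Qu.T @ Qv, compute_uv=False)
    return np.arccos(np.clip(s.max(), -1.0, 1.0))

def predicted_forgetting(theta_min, alpha, beta):
    """Geometric law F = alpha * (1 - cos^2(theta_min)) + beta."""
    return alpha * (1.0 - np.cos(theta_min) ** 2) + beta

# Illustrative use with random subspaces and placeholder coefficients.
rng = np.random.default_rng(0)
U = rng.standard_normal((64, 4))  # task-A gradient subspace basis
V = rng.standard_normal((64, 4))  # task-B gradient subspace basis
theta = min_principal_angle(U, V)
F = predicted_forgetting(theta, alpha=1.0, beta=0.05)
```

Under this law, identical subspaces (θ_min = 0) predict the baseline forgetting β, while near-orthogonal subspaces (θ_min ≈ 90°) predict the maximum α + β.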
The research demonstrates a remarkable 0.994 correlation between the geometric law and actual forgetting in synthetic tasks, and validates the theory across multiple real-world applications, including ViT-LoRA and RoBERTa-LoRA implementations. Crucially, the work reconciles contradictory findings in the literature by showing that rank affects forgetting only when task subspaces are similar (low angle), while orthogonality-enforcing methods like O-LoRA provide minimal benefit when natural orthogonality is already high. These insights give practitioners principled guidance for building continual learning systems, potentially saving significant computational resources by choosing adapter configurations based on subspace geometry rather than trial and error.
- Catastrophic forgetting in LoRA follows geometric law F = α(1 - cos²θ_min) + β with 0.994 correlation in synthetic tests
- Forgetting shows approximate rank-invariance at high subspace angles (CV ≈ 0.8% synthetic, 10-19% real benchmarks)
- Rank affects forgetting only when task subspaces are similar, explaining contradictory findings in the literature
Why It Matters
Provides mathematical foundation for optimizing continual learning systems, potentially saving significant computational resources in AI fine-tuning.