RPSFT: New fine-tuning method preserves pretrained knowledge while improving OOD generalization
Mitigates catastrophic forgetting by penalizing changes to the top singular vectors, without computing Hessian or Fisher information.
A team of researchers has introduced Rotation-Preserving Supervised Fine-Tuning (RPSFT), a method designed to retain a model's pretrained knowledge during fine-tuning without sacrificing task adaptation. Standard supervised fine-tuning (SFT) often degrades out-of-domain (OOD) performance by rotating weight matrices along directions to which the pretrained model is sensitive. Prior work pointed to Hessian or Fisher information as the principled measure of that sensitivity, but computing either for LLMs is prohibitively expensive. RPSFT offers a computationally efficient proxy: it penalizes changes to the projected top-k singular-vector block of each pretrained weight matrix, which discourages rotation of the most important directions while leaving other subspaces free to adapt.
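To make the penalty described above concrete, here is a minimal NumPy sketch of one plausible reading of it. It is an illustration under stated assumptions, not the authors' released implementation: the exact form of the "projected top-k singular-vector block", the squared Frobenius norm, and the function names `top_k_singular_vectors` and `projected_block_penalty` are all assumptions made for this example.

```python
import numpy as np

def top_k_singular_vectors(w0, k):
    """Return the top-k left and right singular vectors of the pretrained matrix w0."""
    u, _, vt = np.linalg.svd(w0, full_matrices=False)
    return u[:, :k], vt[:k, :].T  # shapes: (rows, k) and (cols, k)

def projected_block_penalty(w, w0, u_k, v_k):
    """Squared Frobenius norm of the weight change projected onto the
    pretrained top-k singular subspaces (assumed form of the penalty)."""
    block_delta = u_k.T @ (w - w0) @ v_k  # k x k block of the update
    return float(np.sum(block_delta ** 2))

rng = np.random.default_rng(0)
w0 = rng.normal(size=(8, 6))  # stand-in for a pretrained weight matrix
k = 2
u_k, v_k = top_k_singular_vectors(w0, k)

# An update that lies inside the top-k subspaces is penalized.
delta_in = u_k @ np.eye(k) @ v_k.T
penalty_in = projected_block_penalty(w0 + delta_in, w0, u_k, v_k)

# An update along the third singular direction is orthogonal to the
# top-k block, so this penalty leaves it (numerically) untouched.
u_full, _, vt_full = np.linalg.svd(w0, full_matrices=False)
delta_out = u_full[:, 2:3] @ vt_full[2:3, :]
penalty_out = projected_block_penalty(w0 + delta_out, w0, u_k, v_k)

print(penalty_in, penalty_out)
```

In a training loop, a term like this would presumably be summed over the regularized weight matrices and added to the task loss with a weighting coefficient (a hypothetical hyperparameter here). The singular vectors are computed once from the pretrained weights, which is why no Hessian or Fisher estimate is needed.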
Evaluated on math reasoning data across various model families and sizes (including GPT-style architectures), RPSFT consistently outperforms standard SFT and strong baselines on both in-domain accuracy and OOD generalization. The method also produces representations closer to the pretrained state and serves as a better initialization for subsequent RL fine-tuning. The code is publicly available. The work addresses a core tension in fine-tuning, between adapting to a new task and retaining general capabilities, and could extend to multimodal models or instruction-tuning settings.
- RPSFT penalizes changes to the projected top-k singular-vector block of each pretrained weight matrix as a proxy for Fisher-sensitive directions, avoiding costly Hessian or Fisher computations.
- On math reasoning tasks, RPSFT improves the in-domain/OOD trade-off across multiple model families and sizes compared to standard SFT and baselines.
- RPSFT better preserves pretrained representations and provides stronger initializations for downstream reinforcement learning fine-tuning.
Why It Matters
Lets LLMs learn new tasks with less loss of general knowledge, which may reduce the need for large OOD datasets.