New study reveals why RL beats SFT for preserving LLM capabilities during fine-tuning

arXiv cs.LG May 29, 2026

⚡ On Qwen2.5-3B-Instruct, RL preserves internal circuits 2x better than SFT on scientific QA.

Deep Dive

A new arXiv paper from Jeanmely Rojas Nunez and colleagues investigates the mechanistic roots of catastrophic forgetting in LLM fine-tuning. Prior work had noted that reinforcement learning (RL) retains capabilities better than supervised fine-tuning (SFT); this study examines why, by tracking how the model's internal computational circuits change during training. The authors propose 'differential circuit vulnerability,' a head-level metric that quantifies circuit degradation during fine-tuning. Using Qwen2.5-3B-Instruct (Alibaba's 3B-parameter model) adapted for scientific question answering, they compared RL (policy-gradient updates) against standard SFT.

The findings reveal a clear trade-off. SFT adapts rapidly to the target task but causes significant circuit disruption, which leads to forgetting of previously learned capabilities. RL, in contrast, preserves a much larger fraction of the base model's circuits and so forgets less, at the cost of slower task adaptation. The authors argue that circuit preservation is a key mechanistic reason for RL's robustness. Their code is open-sourced, giving practitioners a way to measure and potentially mitigate forgetting in their own fine-tuning pipelines.
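
To make the idea of a head-level degradation score concrete, here is a toy sketch in Python. It is not the paper's definition of differential circuit vulnerability, which this summary does not spell out. It assumes each attention head already has an importance score (for example, from an ablation study) in the base model and in each fine-tuned model, and it treats "vulnerability" as the relative drop in that score. The function names, the formula, and all numbers are hypothetical.

```python
from typing import Dict, Tuple

Head = Tuple[int, int]  # (layer index, head index)


def toy_head_vulnerability(
    base: Dict[Head, float],
    tuned: Dict[Head, float],
    eps: float = 1e-8,
) -> Dict[Head, float]:
    """Hypothetical score: relative drop in a head's importance after fine-tuning.

    0.0 means the head kept (or gained) importance; values near 1.0 mean
    the head's contribution was almost entirely lost.
    """
    scores: Dict[Head, float] = {}
    for head, base_imp in base.items():
        tuned_imp = tuned.get(head, 0.0)      # missing head counts as fully lost
        drop = max(base_imp - tuned_imp, 0.0)  # ignore heads that became more important
        scores[head] = drop / (abs(base_imp) + eps)
    return scores


def toy_differential(
    base: Dict[Head, float],
    sft: Dict[Head, float],
    rl: Dict[Head, float],
) -> Dict[Head, float]:
    """Hypothetical per-head gap: SFT vulnerability minus RL vulnerability.

    Positive values mark heads that SFT degraded more than RL did.
    """
    sft_v = toy_head_vulnerability(base, sft)
    rl_v = toy_head_vulnerability(base, rl)
    return {head: sft_v[head] - rl_v[head] for head in base}


# Made-up importance scores for two heads; not data from the paper.
base_scores = {(0, 0): 1.0, (0, 1): 0.5}
sft_scores = {(0, 0): 0.25, (0, 1): 0.5}
rl_scores = {(0, 0): 0.75, (0, 1): 0.5}

gap = toy_differential(base_scores, sft_scores, rl_scores)
for head, value in sorted(gap.items()):
    print(f"layer {head[0]}, head {head[1]}: gap = {value:.3f}")
```

In this made-up example, head (0, 0) loses most of its importance under SFT but only a quarter of it under RL, so its gap is positive. A real analysis would use the authors' released code and their own metric rather than this simplification.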

Key Points

Introduces 'differential circuit vulnerability,' a metric for head-level circuit degradation during fine-tuning.
On Qwen2.5-3B-Instruct, SFT caused substantially more circuit disruption and forgetting than RL.
RL preserved more of the base model's circuits but adapted more slowly to the target scientific QA task.

Why It Matters

Guides model developers to choose RL over SFT when retaining prior LLM capabilities is critical.
