LessWrong Study: Off-Model SFT Hurts AI Reasoning - But Fix Is Quick

LessWrong AI May 21, 2026

⚡ Training models on other AIs' outputs can cut reasoning performance by up to 30% on hard problems.

Deep Dive

Researchers writing on LessWrong (SebastianP et al.) found that off-model supervised fine-tuning (SFT), meaning fine-tuning a model on outputs produced by a different model, degrades reasoning performance. The effect is strongest on problems that require long chains of reasoning, such as MATH-500 and Olympiads, and it appears even when the teacher is a stronger model such as Claude Opus 4.7 or GPT-5.5. The cause appears to be the unfamiliar reasoning style rather than teacher quality or perplexity. Crucially, a small amount of training on unrelated data restores performance, and the effect is context-specific: degradation appears only in certain prompt contexts.

Key Points

Off-model SFT degrades reasoning tasks (MATH-500, Olympiads) by up to 30% but leaves simple tasks (MMLU) unaffected.
Degradation occurs even when the teacher (e.g., Claude Opus 4.7) is stronger than the student, ruling out the "dumb teacher" hypothesis.
Performance can be recovered by training on unrelated data, indicating the new reasoning style is shallow and reversible.
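
The summary does not say whether "up to 30%" is an absolute drop in percentage points or a drop relative to the base model's score, and the two can differ a lot. The sketch below shows both calculations. The function name and the accuracy values are hypothetical placeholders chosen for illustration; they are not figures from the study.

```python
def accuracy_drop(base_acc: float, tuned_acc: float) -> dict:
    """Compare accuracy before and after fine-tuning.

    Both inputs are fractions in [0, 1]. Returns the drop in percentage
    points (absolute) and as a percentage of the base score (relative).
    """
    if not 0.0 < base_acc <= 1.0:
        raise ValueError("base_acc must be in (0, 1]")
    if not 0.0 <= tuned_acc <= 1.0:
        raise ValueError("tuned_acc must be in [0, 1]")
    diff = base_acc - tuned_acc
    return {
        "absolute_pp": round(diff * 100, 2),
        "relative_pct": round(diff / base_acc * 100, 2),
    }


# Hypothetical placeholder accuracies (NOT results from the study):
# (accuracy before off-model SFT, accuracy after off-model SFT)
example_scores = {
    "hard_reasoning_benchmark": (0.80, 0.56),
    "simple_knowledge_benchmark": (0.70, 0.70),
}

for name, (before, after) in example_scores.items():
    print(name, accuracy_drop(before, after))
```

With these placeholder numbers, the same change reads as 24 percentage points in absolute terms but 30% in relative terms, which is why it helps to state which measure a headline figure uses.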

Why It Matters

Off-model SFT is vital for AI alignment work. This research suggests that the capability loss it causes is shallow and can be reversed with a small amount of additional training.
