Hándān Xué Bù (Mimicry) or Qīng Chū Yú Lán (Mastery)? A Cognitive Perspective on Reasoning Distillation in Large Language Models
Distilled LLMs mimic verbosity, not true reasoning—a functional alignment collapse.
Recent Large Reasoning Models trained via reinforcement learning exhibit a 'natural' alignment with human cognitive costs. However, a new study by Yueqing Hu and colleagues (arXiv:2601.05019) reveals that the prevailing paradigm of reasoning distillation—training student models to mimic these traces via Supervised Fine-Tuning (SFT)—fails to transmit this cognitive structure. Testing the 'Hándān Xué Bù' (superficial mimicry) hypothesis across 14 models, the researchers found that distillation induces a 'Functional Alignment Collapse': while teacher models mirror human difficulty scaling (r=0.64), distilled students significantly degrade this alignment (r=0.34), often underperforming their own pre-distillation baselines in a phenomenon called 'Negative Transfer.'
Their analysis suggests that SFT induces a 'Cargo Cult' effect, where students ritualistically replicate the linguistic form of reasoning (verbosity) without internalizing the teacher's dynamic resource allocation policy. Consequently, reasoning distillation decouples computational cost from cognitive demand, revealing that human-like cognition is an emergent property of active reinforcement, not passive imitation. This has profound implications for how we train smaller, cheaper models to reason effectively.
- Teacher models show strong alignment with human difficulty scaling (r=0.64), but distilled students drop to r=0.34.
- SFT distillation often causes 'Negative Transfer,' where students underperform their own pre-distillation baselines.
- The 'Cargo Cult' effect means students mimic verbosity, not actual reasoning resource allocation.
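The alignment collapse above can be pictured with a toy sketch. Assuming alignment is measured as the Pearson correlation between per-item human difficulty and the model's reasoning-trace length (the paper's exact metric may differ), a teacher that scales its token budget with difficulty scores high, while a uniformly verbose distilled student scores near zero. The function name and data below are hypothetical illustrations, not the study's code:

```python
import numpy as np

def alignment_score(human_difficulty, model_token_counts):
    """Pearson correlation between per-item human difficulty ratings
    and a model's reasoning-trace length (token count). High r means
    the model spends more compute on harder items, mirroring human
    cognitive cost scaling."""
    return float(np.corrcoef(human_difficulty, model_token_counts)[0, 1])

# Toy data (hypothetical): six problems rated 1-5 by humans.
difficulty = np.array([1, 2, 2, 3, 4, 5])

# A teacher-like policy: token budget grows with difficulty.
teacher_tokens = np.array([120, 260, 240, 420, 610, 900])

# A 'Cargo Cult' student: uniformly verbose regardless of difficulty.
student_tokens = np.array([800, 780, 820, 790, 810, 805])

print(f"teacher alignment r = {alignment_score(difficulty, teacher_tokens):.2f}")
print(f"student alignment r = {alignment_score(difficulty, student_tokens):.2f}")
```

Note that the student emits *more* tokens on average than the teacher while aligning far worse: verbosity and difficulty-sensitive resource allocation are independent quantities, which is exactly the decoupling the study reports.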
Why It Matters
SFT distillation for reasoning is fundamentally flawed; active reinforcement learning may be required for true cognitive transfer.