Research & Papers

Path-Lock Expert: Separating Reasoning Mode in Hybrid Thinking via Architecture-Level Separation

New architecture slashes reflective tokens from 2.54 to 0.39 on AIME24

Deep Dive

Hybrid-thinking language models such as Qwen3 and DeepSeek-R1 offer explicit think and no-think modes, but current implementations suffer from reasoning leakage: even in no-think mode, models produce long, self-reflective responses. Existing mitigations rely on better data curation and multi-stage training, yet leakage persists because both modes share the same feed-forward network (FFN) parameters. In a new paper, Shouren Wang and colleagues from several institutions propose Path-Lock Expert (PLE), an architectural redesign that cleanly separates reasoning modes at the layer level.

PLE replaces the single MLP in each decoder layer with two dedicated experts: one for think and one for no-think. A deterministic control-token router selects exactly one expert path for the entire sequence, preserving the dense model's per-token computation pattern while ensuring each expert receives mode-pure updates during supervised fine-tuning. On Qwen3-4B, PLE reduced no-think reflective tokens on AIME24 from 2.54 to 0.39 and improved no-think accuracy from 20.67% to 40.00%, all without degrading think-mode performance. The authors argue that controllable hybrid thinking is fundamentally an architectural problem, and that separating mode-specific feed-forward pathways is a simple, effective solution that eliminates the need for complex data engineering.
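The routing described above can be illustrated with a minimal sketch. This is not the authors' implementation: the control-token names, the toy "experts," and all identifiers below are assumptions made purely to show the shape of the idea, that a control token deterministically selects exactly one FFN expert for the entire sequence, so per-token compute matches a dense model.

```python
# Illustrative sketch of Path-Lock Expert routing (assumed names, not the paper's code).
THINK, NO_THINK = "<think>", "<no_think>"  # hypothetical control tokens

def make_expert(scale):
    # Stand-in for a dense FFN expert: a trivial per-element transform,
    # just to make the two paths observably different.
    return lambda hidden: [scale * x for x in hidden]

class PathLockLayer:
    """One decoder layer's FFN, replaced by two mode-dedicated experts."""
    def __init__(self):
        self.experts = {"think": make_expert(2.0), "no_think": make_expert(0.5)}

    def forward(self, hidden, mode):
        # Deterministic routing: a single expert processes every token in the
        # sequence (no top-k mixing), preserving dense per-token computation.
        return self.experts[mode](hidden)

def route_mode(tokens):
    # Control-token router: inspect the prompt once, lock the path for
    # the whole sequence.
    return "think" if THINK in tokens else "no_think"

tokens = [NO_THINK, "Solve", "2+2"]
hidden = [1.0, 1.0, 1.0]
layer = PathLockLayer()
out = layer.forward(hidden, route_mode(tokens))
```

Because the router is deterministic and sequence-level, each expert sees only mode-pure gradients during fine-tuning, which is the property the authors credit for eliminating reasoning leakage.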

Key Points
  • PLE replaces each decoder layer's MLP with two separate experts (think/no-think) to prevent reasoning leakage at the architecture level.
  • On Qwen3-4B, no-think reflective tokens on AIME24 dropped from 2.54 to 0.39, while accuracy rose from 20.67% to 40.00%.
  • The deterministic control-token router keeps inference computation patterns identical to dense models and requires no special data curation.

Why It Matters

Architecture-level mode separation enables more accurate, concise LLM responses without costly data engineering or multi-stage training.