How Transformers Learn to Plan via Multi-Token Prediction
A novel training objective outperforms standard methods on path-finding and logic puzzles by enabling reverse reasoning.
A team of researchers, including Jianhao Huang, Zhanpeng Zhou, and Baharan Mirzasoleiman, has published a paper analyzing the 'Multi-Token Prediction' (MTP) training objective for Transformer models, in which the model is trained to predict several future tokens at once rather than only the immediate next one. While the standard 'Next-Token Prediction' (NTP) objective struggles on tasks that require global structure and planning, the study shows that MTP lets models excel at them: empirically, MTP consistently outperformed NTP on synthetic graph path-finding tasks and on realistic reasoning benchmarks such as Countdown and Boolean satisfiability (SAT).
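To make the distinction concrete, here is a minimal PyTorch sketch of the two objectives, assuming a common MTP formulation in which separate output heads predict the tokens at offsets 1 through k from each position. The paper's exact parameterization may differ, and names like `heads` and `mtp_loss` are illustrative, not taken from the paper.

```python
import torch.nn.functional as F

def ntp_loss(logits, tokens):
    # Standard next-token prediction: each position t is trained to
    # predict only token t+1.
    # logits: (batch, seq_len, vocab); tokens: (batch, seq_len)
    return F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),  # predictions at positions 0..T-2
        tokens[:, 1:].reshape(-1),                    # targets at positions 1..T-1
    )

def mtp_loss(hidden, heads, tokens):
    # Multi-token prediction: from the shared hidden state at position t,
    # k separate heads predict tokens t+1, ..., t+k, and the losses are averaged.
    # hidden: (batch, seq_len, d_model); heads: list of k nn.Linear(d_model, vocab) layers.
    total, seq_len = 0.0, tokens.size(1)
    for offset, head in enumerate(heads, start=1):
        logits = head(hidden[:, : seq_len - offset])  # position t predicts token t+offset
        total = total + F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            tokens[:, offset:].reshape(-1),
        )
    return total / len(heads)
```

Because each future offset contributes its own loss term, the shared trunk receives training signal about tokens several steps ahead, not just the immediate next token.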
The core theoretical contribution explains *how* MTP works. By analyzing a simplified two-layer Transformer on a star-graph path-finding task, the researchers prove that MTP induces a distinct two-stage reverse reasoning process: the model first attends to the goal node, then reconstructs the solution path by tracing intermediate steps backward from the goal. This behavior stems from a 'gradient decoupling' property inherent to MTP, which provides a cleaner, more direct training signal than the entangled gradients of NTP. The broader claim is that multi-token objectives bias optimization toward more robust, interpretable, and plan-capable reasoning circuits.
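The star-graph setting makes it easy to see why the backward strategy wins: from the center, every arm looks locally identical, so a model reasoning forward must guess the first step, while tracing back from the goal is deterministic. That first-step ambiguity is the kind of failure the paper attributes to NTP. Below is a small Python sketch of the task and of reverse reasoning as plain graph traversal; the function names and task encoding are illustrative, not the paper's code.

```python
def make_star(num_arms, arm_len):
    # Node 0 is the center; each arm is a chain of arm_len nodes hanging off it.
    # We store child -> parent pointers, which point back toward the center.
    parent, leaves, node = {}, [], 1
    for _ in range(num_arms):
        prev = 0
        for _ in range(arm_len):
            parent[node] = prev
            prev, node = node, node + 1
        leaves.append(prev)
    return parent, leaves

def reverse_reason(parent, start, goal):
    # Stage 1: lock onto the goal node. Stage 2: walk parent pointers
    # backward until the start is reached, then emit the path forward.
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]

parent, leaves = make_star(num_arms=3, arm_len=4)
print(reverse_reason(parent, start=0, goal=leaves[1]))  # [0, 5, 6, 7, 8]
```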
- MTP outperformed standard NTP on path-finding and logic benchmarks like Countdown.
- The objective provably induces a two-stage 'reverse reasoning' process that starts from the goal.
- A 'gradient decoupling' property provides a cleaner training signal for building planning circuits.
Why It Matters
This research could lead to more capable AI agents that can plan complex multi-step tasks, from coding to scientific discovery.