How Transformers Learn to Plan via Multi-Token Prediction
A novel training objective outperforms standard methods on path-finding and logic puzzles by enabling reverse reasoning.
A team of researchers, including Jianhao Huang, Zhanpeng Zhou, and Baharan Mirzasoleiman, has published a paper analyzing the 'Multi-Token Prediction' (MTP) training objective for Transformer models, in which the model is trained to predict several future tokens at once rather than only the immediate next one. While the standard 'Next-Token Prediction' (NTP) objective struggles on tasks that require global structure and planning, the study shows that MTP lets models excel at them: empirically, MTP consistently outperformed NTP on synthetic graph path-finding tasks and on realistic reasoning benchmarks such as Countdown and Boolean satisfiability (SAT).
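To make the distinction concrete, here is a minimal PyTorch sketch of the two objectives, assuming a common MTP formulation in which separate output heads predict the tokens at offsets 1 through k from each position. The paper's exact parameterization may differ, and names like `heads` and `mtp_loss` are illustrative, not taken from the paper.

```python
import torch.nn.functional as F

def ntp_loss(logits, tokens):
    # Standard next-token prediction: each position t is trained to
    # predict only token t+1.
    # logits: (batch, seq_len, vocab); tokens: (batch, seq_len)
    return F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),  # predictions at positions 0..T-2
        tokens[:, 1:].reshape(-1),                    # targets at positions 1..T-1
    )

def mtp_loss(hidden, heads, tokens):
    # Multi-token prediction: from the shared hidden state at position t,
    # k separate heads predict tokens t+1, ..., t+k, and the losses are averaged.
    # hidden: (batch, seq_len, d_model); heads: list of k nn.Linear(d_model, vocab) layers.
    total, seq_len = 0.0, tokens.size(1)
    for offset, head in enumerate(heads, start=1):
        logits = head(hidden[:, : seq_len - offset])  # position t predicts token t+offset
        total = total + F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            tokens[:, offset:].reshape(-1),
        )
    return total / len(heads)
```

Because each future offset contributes its own loss term, the shared trunk receives training signal about tokens several steps ahead, not just the immediate next token.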
The core theoretical contribution explains *how* MTP works. By analyzing a simplified two-layer Transformer on a star-graph path-finding task, the researchers prove that MTP induces a distinct two-stage reverse reasoning process: the model first attends to the goal node, then reconstructs the solution path by tracing intermediate steps backward from the goal. This behavior stems from a 'gradient decoupling' property inherent to MTP, which provides a cleaner, more direct training signal than the entangled gradients of NTP. The broader claim is that multi-token objectives bias optimization toward more robust, interpretable, and plan-capable reasoning circuits.
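The star-graph setting makes it easy to see why the backward strategy wins: from the center, every arm looks locally identical, so a model reasoning forward must guess the first step, while tracing back from the goal is deterministic. That first-step ambiguity is the kind of failure the paper attributes to NTP. Below is a small Python sketch of the task and of reverse reasoning as plain graph traversal; the function names and task encoding are illustrative, not the paper's code.

```python
def make_star(num_arms, arm_len):
    # Node 0 is the center; each arm is a chain of arm_len nodes hanging off it.
    # We store child -> parent pointers, which point back toward the center.
    parent, leaves, node = {}, [], 1
    for _ in range(num_arms):
        prev = 0
        for _ in range(arm_len):
            parent[node] = prev
            prev, node = node, node + 1
        leaves.append(prev)
    return parent, leaves

def reverse_reason(parent, start, goal):
    # Stage 1: lock onto the goal node. Stage 2: walk parent pointers
    # backward until the start is reached, then emit the path forward.
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]

parent, leaves = make_star(num_arms=3, arm_len=4)
print(reverse_reason(parent, start=0, goal=leaves[1]))  # [0, 5, 6, 7, 8]
```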
- MTP outperformed standard NTP on path-finding and logic benchmarks like Countdown.
- The objective provably induces a two-stage 'reverse reasoning' process that starts from the goal.
- A 'gradient decoupling' property provides a cleaner training signal for building planning circuits.
Why It Matters
This research could lead to more capable AI agents that can plan complex multi-step tasks, from coding to scientific discovery.