CorrectionPlanner: Self-Correction Planner with Reinforcement Learning in Autonomous Driving
A new reinforcement learning planner uses a 'propose, evaluate, correct' loop to fix unsafe actions in real time.
A team of researchers has introduced CorrectionPlanner, a new AI planning system for autonomous vehicles that addresses a critical flaw in current learning-based models: the lack of explicit self-correction. Unlike standard planners that simply propose an action, CorrectionPlanner operates within a continuous 'propose, evaluate, and correct' loop. At each step, it generates a motion token (a proposed action), and a learned collision critic immediately predicts whether it will cause a collision within a short horizon. If the token is unsafe, the system retains it as part of a 'self-correction trace'—a record of rejected actions—and generates a new proposal conditioned on that trace. This process repeats until a safe action is found, mimicking the reasoning traces used in large language models.
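The loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the motion vocabulary, the `propose` policy, and the `collision_critic` are all hypothetical stand-ins (here, even tokens are arbitrarily treated as "unsafe").

```python
import random

# Hypothetical discrete motion vocabulary (stand-in for the paper's motion tokens).
MOTION_TOKENS = list(range(16))

def propose(state, correction_trace):
    """Stand-in policy: propose a motion token conditioned on the state
    and the trace of previously rejected tokens (which are excluded)."""
    candidates = [t for t in MOTION_TOKENS if t not in correction_trace]
    return random.choice(candidates)

def collision_critic(state, token):
    """Stand-in learned critic: predict whether this token leads to a
    collision within a short horizon. Toy rule: even tokens are unsafe."""
    return token % 2 == 0  # True means a collision is predicted

def plan_step(state, max_corrections=16):
    """Repeat propose -> evaluate until the critic accepts a token,
    accumulating rejected tokens in a self-correction trace."""
    trace = []
    for _ in range(max_corrections):
        token = propose(state, trace)
        if not collision_critic(state, token):
            return token, trace  # safe action found
        trace.append(token)      # remember the vetoed proposal
    return None, trace           # fall back (e.g., emergency stop)

token, trace = plan_step(state=0)
```

In the real system the trace conditions the next proposal through the model itself, much like a reasoning trace in a language model, rather than by simple exclusion as in this toy version.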
The system is trained in two phases: first with imitation learning from expert data, then refined with model-based reinforcement learning using a pre-trained world model that simulates realistic, reactive behaviors from other agents. This dual training approach allows the planner to learn both from ideal scenarios and from the consequences of its own decisions in a simulated environment. The results are significant: in closed-loop evaluations, CorrectionPlanner reduced collision rates by over 20% on the Waymax simulation benchmark and achieved top-tier planning scores on the challenging nuPlan benchmark. This represents a major step toward more reliable and introspective AI drivers that can proactively avoid dangerous situations rather than just reacting to them.
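The two-phase training recipe can be illustrated with a toy numeric sketch. Everything here is a stand-in assumption: the action set, the hand-coded `world_model_reward` standing in for the learned world model, and the tabular preference updates standing in for neural-network training.

```python
import random

random.seed(0)

ACTIONS = ["keep_lane", "swerve"]  # hypothetical two-action world

# Phase 1: imitation learning - fit action preferences to expert data.
expert_demos = ["keep_lane"] * 8 + ["swerve"] * 2
prefs = {a: expert_demos.count(a) / len(expert_demos) for a in ACTIONS}

def world_model_reward(action):
    """Stand-in world model with reactive agents: swerving near traffic
    is assumed to risk a collision half the time (toy numbers)."""
    if action == "swerve" and random.random() < 0.5:
        return -1.0  # collision penalty
    return 1.0 if action == "keep_lane" else 0.5

# Phase 2: model-based RL - refine preferences from simulated rollouts,
# so the planner learns from the consequences of its own decisions.
lr = 0.01
for _ in range(500):
    # Noisy greedy selection gives a little exploration.
    action = max(prefs, key=lambda a: prefs[a] + random.gauss(0, 0.1))
    prefs[action] += lr * world_model_reward(action)

best = max(prefs, key=prefs.get)
```

The point of the sketch is the ordering: imitation provides a sensible starting policy from ideal demonstrations, and simulated rollouts then adjust it using outcomes (including collisions) the expert data never shows.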
- Uses a 'propose, evaluate, correct' loop with a collision critic to veto unsafe motion tokens in real time.
- Reduces collision rates by over 20% in Waymax simulations and sets new state-of-the-art scores on the nuPlan benchmark.
- Trained with imitation learning followed by model-based reinforcement learning using a realistic world model for agent behavior.
Why It Matters
Makes self-driving AI significantly safer by enabling real-time correction of dangerous maneuvers before they are executed.