Robotics

HAD: Combining Hierarchical Diffusion with Metric-Decoupled RL for End-to-End Driving

The end-to-end planning framework sets new state-of-the-art results, improving on prior models by +2.3 EPDMS on NAVSIM and +4.9 points of Route Completion on HUGSIM.

Deep Dive

A research team led by Wenhao Yao has introduced HAD, an end-to-end planning framework for autonomous driving that addresses two key limitations of current diffusion-based and reinforcement learning (RL) approaches: unrealistic trajectory generation in diffusion models, and the ineffective optimization that results from a single, coupled reward in RL systems. HAD's core innovation is its two-pronged design: a Hierarchical Diffusion Policy that breaks planning into coarse-to-fine stages, and a new training method called Metric-Decoupled Policy Optimization (MDPO).
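As a rough illustration of the coarse-to-fine idea, here is a minimal NumPy sketch with hand-rolled stand-ins for the learned denoisers (every name here, such as `make_coarse_denoiser`, is hypothetical and not HAD's actual API): a first reverse-diffusion pass refines a handful of sparse anchor waypoints toward the goal, and a second pass refines a dense trajectory conditioned on those anchors.

```python
import numpy as np

rng = np.random.default_rng(0)

def reverse_diffusion(x, steps, denoiser):
    """Minimal DDPM-style reverse loop; `denoiser` stands in for a
    learned network that returns a cleaner sample at each step."""
    for t in reversed(range(steps)):
        x = denoiser(x, t)
    return x

def make_coarse_denoiser(goal):
    # Hypothetical coarse-stage denoiser: pulls 4 sparse anchor
    # waypoints toward a straight line from the ego vehicle to the goal.
    target = np.linspace(0.0, 1.0, 4)[:, None] * goal
    return lambda x, t: x + 0.3 * (target - x)

def make_fine_denoiser(anchors):
    # Hypothetical fine-stage denoiser: refines a dense 16-point
    # trajectory toward an interpolation of the coarse anchors,
    # i.e. the fine stage is *conditioned* on the coarse output.
    dense = np.stack(
        [np.interp(np.linspace(0, 3, 16), np.arange(4), anchors[:, d])
         for d in range(2)], axis=1)
    return lambda x, t: x + 0.3 * (dense - x)

goal = np.array([20.0, 4.0])  # ego-frame goal position in meters
coarse = reverse_diffusion(rng.normal(size=(4, 2)), 10, make_coarse_denoiser(goal))
fine = reverse_diffusion(rng.normal(size=(16, 2)), 10, make_fine_denoiser(coarse))
print(fine[-1])  # final dense waypoint, close to the goal
```

The point of the hierarchy is that the coarse stage only has to commit to a rough route, while the fine stage spends its denoising budget on local detail around an already-plausible skeleton.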

To generate better driving trajectories, HAD employs a technique called Structure-Preserved Trajectory Expansion, which produces candidate paths that respect vehicle kinematics and so avoids the physically implausible options that standard Gaussian noise injects into diffusion models. For training, MDPO decouples the complex task of driving into multiple structured objectives (such as safety, comfort, and route completion), allowing more targeted policy optimization than a single, coupled reward signal, as sketched below.
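A rough sketch of both ideas, under stated assumptions: the kinematic bicycle model, the noise scales, the obstacle position, and the three placeholder metrics in `rewards` are illustrative choices, not the paper's exact formulations. Candidates are expanded by perturbing control inputs and rolling them through the vehicle model, so every path is feasible by construction, and each candidate is then scored against separate metrics rather than one blended scalar.

```python
import numpy as np

rng = np.random.default_rng(0)

def bicycle_rollout(steer, accel, wheelbase=2.8, dt=0.1, v0=8.0):
    """Roll a control sequence through a kinematic bicycle model,
    so every resulting trajectory obeys vehicle kinematics."""
    x = y = yaw = 0.0
    v = v0
    pts = []
    for d, a in zip(steer, accel):
        x += v * np.cos(yaw) * dt
        y += v * np.sin(yaw) * dt
        yaw += v / wheelbase * np.tan(d) * dt
        v = max(v + a * dt, 0.0)
        pts.append((x, y))
    return np.array(pts)

# Expand candidates by perturbing *controls* rather than waypoint
# positions, so the noise cannot create kinematically impossible paths.
base_steer = np.zeros(20)
base_accel = np.zeros(20)
candidates = [
    bicycle_rollout(base_steer + rng.normal(0, 0.05, 20),
                    base_accel + rng.normal(0, 0.5, 20))
    for _ in range(8)
]

def rewards(traj, obstacle=np.array([15.0, 0.5])):
    """Decoupled per-metric scores in the spirit of MDPO: each objective
    is reported separately instead of being folded into one scalar."""
    safety = float(np.min(np.linalg.norm(traj - obstacle, axis=1)) > 2.0)
    comfort = -float(np.mean(np.abs(np.diff(traj, n=2, axis=0))))  # low jerk
    progress = float(traj[-1, 0])  # longitudinal distance covered
    return {"safety": safety, "comfort": comfort, "progress": progress}

for i, traj in enumerate(candidates[:3]):
    print(i, rewards(traj))
```

Keeping the objectives separate means the optimizer can see *which* metric a candidate fails, instead of receiving one coupled number in which a safety violation and a comfort penalty are indistinguishable.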

The results are substantial. On the NAVSIM and HUGSIM benchmarks, HAD set new state-of-the-art records, outperforming the previous leading models by +2.3 on the EPDMS metric (NAVSIM) and +4.9 points on Route Completion (HUGSIM). This marks a clear step forward in the system's ability to navigate complex environments reliably and efficiently, moving closer to robust real-world deployment.

Key Points
  • Uses a Hierarchical Diffusion Policy for coarse-to-fine trajectory planning, improving over standard diffusion models.
  • Introduces Metric-Decoupled Policy Optimization (MDPO) to train on multiple structured driving objectives instead of a single reward.
  • Achieves state-of-the-art results: +2.3 EPDMS on NAVSIM and +4.9 Route Completion on HUGSIM, outperforming prior models.

Why It Matters

This represents a significant step towards more reliable and scalable AI for real-world autonomous vehicles, improving both planning realism and training efficiency.