Robotics

Beyond Scalar Rewards: Distributional Reinforcement Learning with Preordered Objectives for Safe and Reliable Autonomous Driving

A new AI framework for self-driving cars prioritizes safety over speed, cutting collisions in CARLA simulations.

Deep Dive

A research team from institutions including the Karlsruhe Institute of Technology has published a paper proposing a fundamental shift in how reinforcement learning (RL) agents for autonomous driving are trained. The core problem they address is the standard practice of combining multiple objectives—like safety, efficiency, and comfort—into a single scalar reward via weighted summation. This often leads to policies that sacrifice critical safety constraints to optimize for a higher overall score. Their solution is the Preordered Multi-Objective Markov Decision Process (Pr-MOMDP), which explicitly defines a hierarchy, or preorder, among reward components, ensuring safety is prioritized over other goals.
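
In code, such a preorder amounts to comparing actions objective by objective in priority order, so a meaningful safety difference always decides before efficiency or comfort are even consulted. The minimal Python sketch below illustrates this on expected returns; the objective names, their ordering, and the tolerance are illustrative assumptions, and the paper's actual method operates on full return distributions rather than point estimates, as described next.

    from dataclasses import dataclass

    # Hypothetical priority order and indifference margin; the paper's
    # concrete objectives and thresholds may differ.
    PRIORITY = ["safety", "efficiency", "comfort"]
    TOLERANCE = 1e-3

    @dataclass
    class Returns:
        safety: float
        efficiency: float
        comfort: float

    def preorder_better(a: Returns, b: Returns) -> bool:
        """True if return vector a is preferred to b: walk the objectives
        in priority order and decide at the first meaningful difference."""
        for obj in PRIORITY:
            va, vb = getattr(a, obj), getattr(b, obj)
            if abs(va - vb) > TOLERANCE:
                return va > vb
        return False  # equivalent on all objectives; neither is preferred

    # A slightly faster but less safe action loses to the safer one:
    assert preorder_better(Returns(0.9, 0.5, 0.5), Returns(0.7, 0.9, 0.9))

Under this scheme, no amount of efficiency or comfort can compensate for a genuine safety deficit, which is exactly the failure mode of weighted scalar sums.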

To make this preorder actionable, the researchers developed a novel algorithm based on distributional RL. Instead of comparing the expected value of actions, their method uses a new pairwise comparison metric called Quantile Dominance (QD) to evaluate the full distribution of potential returns for each objective. This allows the AI to identify a subset of optimal actions that are not dominated by others across all prioritized goals. The framework was implemented using Implicit Quantile Networks (IQN) and tested in the high-fidelity CARLA driving simulator.
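
The summary does not give QD's exact definition, so the sketch below encodes one plausible reading as an explicit assumption: action A quantile-dominates action B on an objective when A's estimated quantile function lies weakly above B's at every sampled quantile level, and an action is kept when no rival dominates it on every objective with a strict improvement somewhere. The array shapes, the eps margin, and the filtering rule are illustrative, not taken from the paper.

    import numpy as np

    def quantile_dominates(qa: np.ndarray, qb: np.ndarray, eps: float = 0.0) -> bool:
        """Assumed Quantile Dominance test for one objective: qa and qb are
        IQN-style quantile estimates of the return, shape (n_quantiles,),
        evaluated at the same quantile levels for both actions. A dominates
        B if it is weakly better at every level (up to a small margin eps)."""
        return bool(np.all(qa >= qb - eps))

    def non_dominated_actions(q: np.ndarray) -> list[int]:
        """Keep the actions that no rival dominates on ALL objectives.
        q has shape (n_actions, n_objectives, n_quantiles)."""
        n_actions, n_objectives, _ = q.shape
        keep = []
        for a in range(n_actions):
            dominated = any(
                b != a
                and all(quantile_dominates(q[b, o], q[a, o]) for o in range(n_objectives))
                and any(np.any(q[b, o] > q[a, o]) for o in range(n_objectives))
                for b in range(n_actions)
            )
            if not dominated:
                keep.append(a)
        return keep

In this reading, the surviving action indices form the candidate set from which the preorder over objectives then selects the final action, safety first.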

The results demonstrated a tangible improvement over existing methods. Compared to standard IQN and ensemble-IQN baselines, policies trained with the Pr-MOMDP framework achieved higher task success rates alongside statistically significant reductions in critical failures such as collisions and off-road events. By ensuring the AI's decision-making respects a predefined priority of objectives, this work represents a concrete step toward more robust and trustworthy autonomous driving systems, in which safety is never traded away for marginal gains in speed or comfort.

Key Points
  • Introduces Pr-MOMDP, a framework that uses a preorder (hierarchy) over objectives like safety and efficiency, moving beyond scalar reward sums.
  • Proposes a novel Quantile Dominance (QD) metric within distributional RL to compare action outcomes without reducing them to a single statistic.
  • Tested in CARLA, the method delivered policies with higher success rates and fewer collisions than state-of-the-art IQN baselines.

Why It Matters

Provides a formal AI framework to hard-code safety as the top priority in self-driving systems, making them more reliable.