Robotics

Uber's new system turns language into driving scenarios for AV testing

Describe a traffic scene in words, and the system turns it into a simulation test.

Deep Dive

Autonomous vehicle testing requires millions of simulated driving scenarios, but scripting them by hand is slow, and learned statistical models struggle to produce precise out-of-distribution scenarios. Researchers from Uber (now part of Aurora) propose a different approach: cast scenario orchestration as a constraint satisfaction problem. Their system takes a natural language description (e.g., "a car cuts in aggressively from the left lane while a pedestrian jaywalks"), uses a foundation model to parse it into a set of constraints, then uses off-the-shelf solvers to generate actor behaviors that satisfy those constraints in closed-loop simulation. The paper, accepted at ICRA 2026, reports that the method achieves a much higher orchestration success rate than baselines across diverse, carefully crafted scenarios.
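To make the constraint-satisfaction framing concrete, here is a minimal Python sketch. It is not the paper's implementation: the `CutIn` parameters, the numeric thresholds standing in for "aggressively", and the brute-force search (used here in place of an off-the-shelf solver) are all assumptions chosen for illustration. The list `CONSTRAINTS` plays the role of what a foundation model might emit after parsing "a car cuts in aggressively from the left lane".

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterator, Optional


@dataclass(frozen=True)
class CutIn:
    """Hypothetical parameters describing one cut-in maneuver."""
    start_lane: str      # lane the actor starts in
    start_time_s: float  # when the lane change begins
    duration_s: float    # how long the lane change takes
    gap_m: float         # gap ahead of the ego vehicle at the merge


Constraint = Callable[[CutIn], bool]

# Assumed parser output; every threshold below is illustrative.
CONSTRAINTS: list[Constraint] = [
    lambda c: c.start_lane == "left",                # "from the left lane"
    lambda c: c.duration_s <= 2.0,                   # "aggressively": fast lane change
    lambda c: 3.0 <= c.gap_m <= 10.0,                # close to the ego, but not overlapping
    lambda c: c.start_time_s + c.duration_s <= 8.0,  # finishes within an 8 s episode
]


def candidates() -> Iterator[CutIn]:
    """Enumerate a small discrete grid of candidate maneuvers."""
    lanes = ["left", "right"]
    start_times = [1.0, 3.0, 5.0, 7.0]
    durations = [1.5, 3.0, 4.5]
    gaps = [2.0, 6.0, 20.0]
    for lane, start, duration, gap in product(lanes, start_times, durations, gaps):
        yield CutIn(lane, start, duration, gap)


def solve(constraints: list[Constraint]) -> Optional[CutIn]:
    """Return the first candidate satisfying every constraint, or None."""
    for candidate in candidates():
        if all(check(candidate) for check in constraints):
            return candidate
    return None


solution = solve(CONSTRAINTS)
print(solution)
```

The point of the split is that the language model only produces the predicates. Whether a candidate behavior is accepted is decided by checking those predicates, so an unsatisfiable description yields `None` rather than a scenario that looks plausible but is wrong.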

The key innovation lies in combining the flexibility of language with the deterministic control of constraint solving. Unlike end-to-end generative models, which can hallucinate or miss precise specifications, this approach produces scenarios that satisfy the constraints parsed from the tester's description. That matters most for ego-reactive scenarios, where the AV's own policy must interact realistically with other agents. The system uses foundation model reasoning only to translate language into formal constraints, not to generate behaviors, which keeps the simulation trustworthy. This hybrid approach could substantially accelerate AV validation by letting engineers describe edge cases in plain English and run them in simulation directly.
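A toy example shows why ego-reactive scenarios are demanding. The sketch below is an illustration under stated assumptions, not the method from the paper: a one-dimensional simulation where the constraint is "the actor is 6 m ahead of the ego at the 4 s merge point". The ego policy, time step, and target gap are all hypothetical. An actor plan computed once in advance misses the target when the ego accelerates, while an actor that recomputes its speed every tick from the observed ego state meets it.

```python
from typing import Callable

DT = 0.5            # simulation step in seconds (assumed)
HORIZON_S = 4.0     # time at which the merge should happen (assumed)
TARGET_GAP_M = 6.0  # desired actor-minus-ego gap at the merge (assumed)

# An actor controller maps (time, current gap, ego speed) to an actor speed.
ActorController = Callable[[float, float, float], float]


def ego_policy(t: float) -> float:
    """Toy stand-in for the AV under test: accelerates at 1 m/s^2 for 2 s."""
    return 1.0 if t < 2.0 else 0.0


def run(actor_speed: ActorController) -> float:
    """Simulate both vehicles and return the final gap in meters."""
    ego_x, ego_v, actor_x = 0.0, 10.0, -5.0
    for k in range(int(HORIZON_S / DT)):
        t = k * DT
        gap = actor_x - ego_x
        v_actor = actor_speed(t, gap, ego_v)  # actor decides from observed state
        ego_v += ego_policy(t) * DT           # ego reacts according to its own policy
        ego_x += ego_v * DT
        actor_x += v_actor * DT
    return actor_x - ego_x


def open_loop(t: float, gap: float, ego_v: float) -> float:
    """Fixed plan: assumes the ego holds 10 m/s, so close 11 m in 4 s."""
    return 10.0 + 11.0 / HORIZON_S


def closed_loop(t: float, gap: float, ego_v: float) -> float:
    """Re-solve each tick: match ego speed plus what is needed to hit the gap."""
    remaining = HORIZON_S - t
    return ego_v + (TARGET_GAP_M - gap) / remaining


open_gap = run(open_loop)
closed_gap = run(closed_loop)
print(f"open-loop gap: {open_gap:.2f} m, closed-loop gap: {closed_gap:.2f} m")
```

In this toy setup the fixed plan ends with the actor behind the ego, while the per-tick controller lands on the target gap because the ego's speed is steady over the final steps.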

Key Points
  • Converts natural language driving descriptions into formal constraints using foundation model reasoning.
  • Casts scenario orchestration as a constraint satisfaction problem solved by off-the-shelf solvers.
  • Outperforms baselines in orchestration success rate, particularly for ego-reactive closed-loop testing.

Why It Matters

Lets engineers create complex driving scenarios from plain-language descriptions, supporting safer and more scalable AV testing.