Developer Tools

An LLM-driven Scenario Generation Pipeline Using an Extended Scenic DSL for Autonomous Driving Safety Validation

A new AI system uses GPT-4o mini to turn real crash reports into executable simulations, extracting scenario details with 97-100% accuracy.

Deep Dive

A team of researchers has developed an AI pipeline that automatically transforms real-world crash reports into executable simulation scenarios for validating autonomous driving systems (ADS). The system, detailed in a new arXiv paper, addresses a critical bottleneck in safety testing: the difficulty of scaling scenario generation from multimodal data such as police reports and crash sketches. The pipeline uses GPT-4o mini for semantic understanding and a probabilistic Extended Scenic Domain-Specific Language (DSL) as an intermediate layer, which separates high-level intent from low-level rendering. According to the authors, this separation reduces errors and captures real-world variability better than previous methods such as ScenicNL or LCTGen.
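The idea of a probabilistic intermediate layer can be illustrated with a minimal Python sketch. This is not the paper's Extended Scenic DSL or its syntax: the `Range`, `ScenarioSpec`, and `sample_variations` names, the field names, and the numeric ranges are all hypothetical, chosen only to show how one extracted description with uncertain parameters can expand into many concrete scenarios.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """A closed interval from which a concrete value is drawn uniformly."""
    low: float
    high: float

    def sample(self, rng: random.Random) -> float:
        return rng.uniform(self.low, self.high)


@dataclass(frozen=True)
class ScenarioSpec:
    """Hypothetical intermediate representation for one crash report.

    Fixed facts (weather, road type, violation) are stored as plain values;
    uncertain quantities (speeds) are stored as ranges, not single numbers.
    """
    weather: str
    road_type: str
    ego_speed_mps: Range
    other_speed_mps: Range
    violation: str


def sample_variations(spec: ScenarioSpec, n: int, seed: int = 0) -> list[dict]:
    """Draw n concrete scenarios from one probabilistic spec.

    A seeded generator makes the set of variations reproducible.
    """
    rng = random.Random(seed)
    return [
        {
            "weather": spec.weather,
            "road_type": spec.road_type,
            "ego_speed_mps": spec.ego_speed_mps.sample(rng),
            "other_speed_mps": spec.other_speed_mps.sample(rng),
            "violation": spec.violation,
        }
        for _ in range(n)
    ]


# Illustrative values only; they are not taken from the paper.
spec = ScenarioSpec(
    weather="rain",
    road_type="intersection",
    ego_speed_mps=Range(8.0, 14.0),
    other_speed_mps=Range(10.0, 18.0),
    violation="red_light_running",
)
variations = sample_variations(spec, n=5, seed=42)
for v in variations:
    print(v)
```

Because the spec is inspected before anything is rendered, a reviewer could check the extracted facts and ranges once, and every sampled variation inherits that check. A real pipeline would pass each concrete scenario to a simulator instead of printing it.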

In evaluation against human-derived ground truth from the NHTSA CIREN database, the pipeline extracted environmental and road network attributes with 100% accuracy and actor trajectories with 97-98% accuracy. The generated scenarios were executed in the CARLA simulator using the Autoware driving stack, where they consistently triggered the intended traffic-rule violations, such as opposite-lane crossing and red-light running, across 2,000 scenario variations. This work provides a legally grounded, scalable, and verifiable approach to ADS safety validation, moving beyond deterministic rule-based systems and enabling the systematic testing of edge cases derived from actual accident data.
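One plausible way to arrive at an extraction-accuracy figure like those above is a field-by-field comparison against the human-labeled ground truth. The summary does not state how the paper defines its metric, so the sketch below is an assumption: the `attribute_accuracy` function, the attribute names, and the sample records are hypothetical.

```python
def attribute_accuracy(extracted: list[dict], ground_truth: list[dict]) -> float:
    """Fraction of ground-truth attribute values that the extraction matched.

    Each list holds one dict per crash report, in the same order.
    A missing attribute in the extraction counts as a mismatch.
    """
    if len(extracted) != len(ground_truth):
        raise ValueError("extracted and ground_truth must cover the same reports")

    total = 0
    correct = 0
    for ex, gt in zip(extracted, ground_truth):
        for key, expected in gt.items():
            total += 1
            if ex.get(key) == expected:
                correct += 1
    return correct / total if total else 0.0


# Hypothetical records: two reports, two attributes each, one mismatch.
ground_truth = [
    {"weather": "clear", "lanes": 2},
    {"weather": "rain", "lanes": 4},
]
extracted = [
    {"weather": "clear", "lanes": 2},
    {"weather": "rain", "lanes": 3},
]
print(attribute_accuracy(extracted, ground_truth))  # prints 0.75
```

Exact matching suits categorical attributes such as weather or road type. Trajectories would likely need a tolerance-based or segment-wise comparison, which this sketch does not attempt.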

Key Points
  • Uses GPT-4o mini to parse crash reports into a probabilistic Extended Scenic DSL, achieving 97-100% extraction accuracy.
  • Generates 2,000 executable scenario variations in the CARLA simulator that reliably trigger specific traffic violations.
  • Introduces a verifiable intermediate representation layer, improving on direct text-to-scenario methods like ScenicNL.

Why It Matters

Enables scalable, data-driven safety validation for self-driving cars by automating test creation from real accident reports.