Robotics

TaCarla: A comprehensive benchmarking dataset for end-to-end autonomous driving

Researchers release a 2.85M-frame simulation dataset aimed at autonomous driving's 'long-tail' problem.

Deep Dive

A research team led by Tuğrul Görgülü has introduced TaCarla, a new dataset for developing end-to-end autonomous driving models. Published on arXiv, the dataset targets a gap in existing autonomous vehicle training data: perception datasets often lack planning data, and planning datasets often lack perception labels. TaCarla is built on the CARLA simulation platform and collected specifically for the CARLA Leaderboard 2.0 challenge, whose diverse scenarios are designed to address the 'long-tail' problem: rare but critical edge-case driving situations that existing models struggle with. The team argues that current datasets are either too narrow in sensor configuration or lack the behavioral diversity needed for robust real-world performance.

The dataset comprises over 2.85 million frames and supports not just planning but also dynamic object detection, lane and centerline detection, traffic light recognition, prediction tasks, and vision-language-action models. It also includes numerical 'rarity scores' that indicate how unusual a given driving state is within the dataset, which lets researchers analyze model performance on edge cases separately from common situations. By providing a single resource for both perception and planning within a closed-loop evaluation framework, TaCarla aims to become a standard benchmark that enables more direct comparison between different approaches to driving in complex, unpredictable scenarios.
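To make the idea of a rarity score concrete, here is a minimal Python sketch. It is not TaCarla's method, since this summary does not say how the scores are computed. It assumes each frame has already been reduced to a discrete state label and scores a frame by the negative log of that state's relative frequency. The function name `rarity_scores` and the state labels are hypothetical.

```python
import math
from collections import Counter


def rarity_scores(states):
    """Score each frame by -log(relative frequency of its state).

    Assumption for illustration only: the score is a simple
    inverse-frequency measure over discrete state labels. Common
    states score near 0; rare states score higher.
    """
    counts = Counter(states)
    total = len(states)
    return [-math.log(counts[s] / total) for s in states]


# Hypothetical per-frame state labels: nine common frames, one rare one.
frames = ["lane_follow"] * 8 + ["yield_to_emergency_vehicle"] + ["lane_follow"]
scores = rarity_scores(frames)

# The single rare frame (index 8) receives the highest score.
rarest_index = max(range(len(scores)), key=scores.__getitem__)
```

With scores like these, evaluation results can be split by rarity, for example by reporting a metric separately for the highest-scoring frames.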

Key Points
  • Contains over 2.85 million frames from the CARLA Leaderboard 2.0 simulation environment
  • Unifies perception (detection, recognition) and planning data in a single closed-loop evaluation dataset
  • Introduces 'rarity scores' to quantify how unusual specific driving states are, targeting the long-tail problem

Why It Matters

Provides a unified, large-scale benchmark to train and test more robust, end-to-end autonomous driving AI systems.