Research & Papers

OccSim: Multi-kilometer Simulation with Long-horizon Occupancy World Models

The new simulator generates more than 3,000 continuous frames without HD maps, a >80x improvement in stable generation length over prior models.

Deep Dive

A research team including Tianran Liu, Shengwen Zhao, and Nicholas Rhinehart has introduced OccSim, a data-driven simulator for autonomous driving. Unlike previous approaches, which were constrained by pre-recorded driving logs or HD maps, OccSim uses an occupancy world model to generate large, diverse simulation streams from a single initial frame and a sequence of future ego-actions. This enables continuous construction of large-scale 3D occupancy maps spanning more than 4 kilometers, a >80x improvement in stable generation length over previous state-of-the-art models.
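To make the "single initial frame plus a sequence of future ego-actions" setup concrete, here is a minimal sketch of an autoregressive rollout loop. It is not OccSim's code: the `Frame`, `Action`, and `Pose` types, the `rollout` and `integrate` helpers, and the `dummy_model` stand-in are all hypothetical names chosen for illustration, and the toy model only counts steps instead of predicting occupancy.

```python
import math
from typing import Callable, List, Tuple

Pose = Tuple[float, float, float]   # x (m), y (m), yaw (rad); hypothetical layout
Action = Tuple[float, float]        # forward distance (m), yaw change (rad)
Frame = dict                        # placeholder for one occupancy frame


def integrate(pose: Pose, action: Action) -> Pose:
    """Apply one ego-action to a 2D pose: turn first, then drive forward."""
    x, y, yaw = pose
    dist, dyaw = action
    yaw += dyaw
    return (x + dist * math.cos(yaw), y + dist * math.sin(yaw), yaw)


def rollout(
    first: Frame,
    actions: List[Action],
    predict_next: Callable[[Frame, Action], Frame],
) -> Tuple[List[Frame], List[Pose]]:
    """Autoregressive rollout: each predicted frame becomes the next input."""
    frames, poses = [first], [(0.0, 0.0, 0.0)]
    for action in actions:
        frames.append(predict_next(frames[-1], action))
        poses.append(integrate(poses[-1], action))
    return frames, poses


def dummy_model(frame: Frame, action: Action) -> Frame:
    """Stand-in for a learned world model (assumption): it only counts steps."""
    return {"step": frame["step"] + 1}


# Five straight-ahead actions of 1 m each, starting from one initial frame.
frames, poses = rollout({"step": 0}, [(1.0, 0.0)] * 5, dummy_model)
```

In this sketch the only inputs are the first frame and the action list; every later frame is produced from the previous prediction, which is why stability over thousands of steps is the hard part of long-horizon generation.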

The architecture is built on two key modules: a W-DiT-based static occupancy world model and a Layout Generator. The W-DiT handles ultra-long-horizon generation of static environments by explicitly incorporating known rigid transformations, while the Layout Generator populates dynamic foregrounds with reactive agents based on the synthesized road topology. This design lets OccSim synthesize realistic, scalable simulation data for downstream tasks such as pre-training occupancy forecasting models.
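The summary does not describe how W-DiT uses rigid transformations internally, so the following is only an illustration of what a known rigid transformation means for an occupancy grid: when the ego vehicle moves by a known amount, static occupied cells can be re-expressed in the new ego frame by geometry alone. The function name `warp_cells`, the 2D bird's-eye-view grid, and the 1 m cell size are assumptions made for this sketch.

```python
import math
from typing import Set, Tuple

Cell = Tuple[int, int]  # (ix, iy) index in an ego-centred bird's-eye-view grid


def warp_cells(
    occupied: Set[Cell],
    forward_m: float,
    left_m: float,
    yaw_rad: float,
    cell_m: float = 1.0,
) -> Set[Cell]:
    """Re-express occupied cells in the ego frame after a known rigid motion.

    The new ego frame sits at (forward_m, left_m) in the old frame and is
    rotated by yaw_rad, so each point p maps to R(-yaw) applied to (p - t).
    """
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    warped = set()
    for ix, iy in occupied:
        # Translate into the new origin, then rotate by -yaw.
        px, py = ix * cell_m - forward_m, iy * cell_m - left_m
        qx, qy = c * px + s * py, -s * px + c * py
        warped.add((round(qx / cell_m), round(qy / cell_m)))
    return warped


# A short wall 10 m ahead of the vehicle (x is forward, y is left).
wall = {(10, -1), (10, 0), (10, 1)}

# After driving 4 m straight ahead, the wall is 6 m ahead.
moved = warp_cells(wall, forward_m=4.0, left_m=0.0, yaw_rad=0.0)

# After turning 90 degrees left in place, a cell straight ahead ends up on the right.
turned = warp_cells({(10, 0)}, forward_m=0.0, left_m=0.0, yaw_rad=math.pi / 2)
```

Because the static scene does not change between frames, this part of the next frame is determined by the ego motion; a model that is given the transformation does not have to relearn it from data at every step.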

Extensive experiments demonstrate OccSim's practical utility. Data collected directly from the simulator can pre-train 4D semantic occupancy forecasting models to achieve up to 67% zero-shot performance on unseen real-world data, outperforming previous asset-based simulators by 11%. When scaling the OccSim-generated dataset to 5x its original size, zero-shot performance increases to about 74%, while the improvement over traditional asset-based simulators expands to 22.1%. This represents a significant leap toward scalable, data-efficient training for autonomous systems.

Key Points
  • Generates 3,000+ continuous frames and 4km+ 3D occupancy maps from a single initial frame and a sequence of ego-actions, a >80x improvement in stable generation length.
  • Uses a novel W-DiT architecture for static environments and a Layout Generator for dynamic agents, eliminating dependency on HD maps.
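  • Pre-training on OccSim data yields up to 67% zero-shot performance for 4D semantic occupancy forecasting models, rising to about 74% when the generated dataset is scaled to 5x, where the margin over asset-based simulators reaches 22.1%.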

Why It Matters

OccSim could enable scalable, cost-effective training of autonomous driving systems with less reliance on expensive real-world data collection and manual HD map creation.