Image & Video

OccTrack360: 4D Panoptic Occupancy Tracking from Surround-View Fisheye Cameras

Researchers unveil a new benchmark and AI model to track 3D objects in distorted fisheye video for autonomous driving.

Deep Dive

A research team led by Yongzhi Lin, Kai Luo, and Kailun Yang has introduced OccTrack360, a new benchmark for 4D panoptic occupancy tracking, a task central to autonomous vehicles and robotics because it requires understanding dynamic 3D environments in a spatially continuous and temporally consistent way. The benchmark fills a major gap by supporting surround-view fisheye sensing, long temporal sequences (174 to 2,234 frames), and instance-level voxel tracking. It also includes principled annotations, such as an all-direction occlusion mask and a fisheye field-of-view mask, providing a much-needed dataset for training and evaluating AI models in real-world, wide-angle scenarios.
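The fisheye field-of-view mask can be illustrated with a short sketch: mark each voxel whose center projects inside the fisheye image under a Unified Projection Model camera. All parameter values here (focal lengths, image size, the mirror parameter `xi`) are hypothetical, and this is a minimal illustration of the idea, not the paper's annotation pipeline.

```python
import numpy as np

def fisheye_fov_mask(voxel_centers, fx, fy, cx, cy, xi, width, height):
    """Mark voxels whose centers project inside a fisheye image.

    Uses the Unified Projection Model: normalize each center onto the
    unit sphere, then apply a pinhole projection offset by xi along the
    optical axis. Assumes xi <= 1, so only points in front of the
    shifted projection point are visible.
    """
    s = voxel_centers / np.linalg.norm(voxel_centers, axis=-1, keepdims=True)
    denom = s[:, 2] + xi
    in_front = denom > 1e-6
    safe = np.where(in_front, denom, 1.0)  # avoid division by zero behind camera
    u = fx * s[:, 0] / safe + cx
    v = fy * s[:, 1] / safe + cy
    return in_front & (u >= 0) & (u < width) & (v >= 0) & (v < height)

# Hypothetical intrinsics for a 640x640 fisheye image.
centers = np.array([[0.0, 0.0, 5.0],    # straight ahead: visible
                    [0.0, 0.0, -5.0]])  # behind the camera: not visible
mask = fisheye_fov_mask(centers, 300.0, 300.0, 320.0, 320.0, 0.9, 640, 640)
```

The point is simply that visibility is a geometric property of the camera model, which is why the benchmark can annotate it per voxel.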

To establish a strong baseline for this new benchmark, the team also developed FoSOcc (Focus on Sphere Occ), a novel AI framework designed to tackle the core challenges of fisheye occupancy tracking. Fisheye cameras create distorted spherical projections that confuse standard computer vision models. FoSOcc combats this with two key components: a Center Focusing Module (CFM) that enhances instance-aware spatial localization through supervised focus guidance, and a Spherical Lift Module (SLM) that extends perspective lifting techniques to work under the Unified Projection Model for fisheye images.
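To make the Unified Projection Model concrete, here is a minimal sketch of its forward projection and the spherical back-projection that a lift module such as SLM builds on. The formulas are the standard UPM equations; the intrinsic values are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def upm_project(point, fx, fy, cx, cy, xi):
    """Unified Projection Model: 3D point -> fisheye pixel.

    Normalize the point onto the unit sphere, then project it through
    a pinhole shifted by xi along the optical axis.
    """
    s = point / np.linalg.norm(point, axis=-1, keepdims=True)
    denom = s[..., 2] + xi
    u = fx * s[..., 0] / denom + cx
    v = fy * s[..., 1] / denom + cy
    return u, v

def upm_unproject(u, v, fx, fy, cx, cy, xi):
    """Inverse UPM: fisheye pixel -> unit viewing ray on the sphere.

    This spherical back-projection is the geometric core of lifting
    2D fisheye features into 3D space.
    """
    mx = (u - cx) / fx
    my = (v - cy) / fy
    r2 = mx ** 2 + my ** 2
    eta = (xi + np.sqrt(1.0 + (1.0 - xi ** 2) * r2)) / (1.0 + r2)
    return np.stack([eta * mx, eta * my, eta - xi], axis=-1)

# Round trip with hypothetical intrinsics: project a point, then recover
# its viewing direction from the pixel. The recovered ray equals p / ||p||.
p = np.array([1.0, 2.0, 3.0])
u, v = upm_project(p, 300.0, 300.0, 320.0, 320.0, 0.9)
ray = upm_unproject(u, v, 300.0, 300.0, 320.0, 320.0, 0.9)
```

Because rays live on a sphere rather than an image plane, standard perspective lifting breaks down near the edges of a fisheye view, which is the gap SLM is designed to close.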

Extensive testing on both the new OccTrack360 benchmark and the existing Occ3D-Waymo dataset shows that the FoSOcc method improves occupancy tracking quality, with notable performance gains on geometrically regular objects. By publicly releasing the benchmark dataset and source code, the researchers are providing the tools needed to advance the state of the art in 4D perception for autonomous systems that rely on cost-effective, wide-field-of-view fisheye cameras.

Key Points
  • New OccTrack360 benchmark provides long sequences (up to 2,234 frames) with voxel visibility annotations for fisheye cameras.
  • FoSOcc framework introduces a Center Focusing Module and Spherical Lift Module to handle fisheye distortion and improve 3D localization.
  • The public release of code and data aims to accelerate research in 4D perception for autonomous driving and robotics.

Why It Matters

Enables more accurate and affordable 3D environment perception for autonomous vehicles using wide-angle fisheye cameras instead of expensive LiDAR.