M²-Occ: Resilient 3D Semantic Occupancy Prediction for Autonomous Driving with Incomplete Camera Inputs
New AI model maintains 3D scene understanding even when 5 out of 6 car cameras fail simultaneously.
A research team led by Kaixin Lin and Kailun Yang has introduced M²-Occ, a breakthrough framework for 3D semantic occupancy prediction that maintains robust performance even when autonomous vehicle cameras fail. Unlike existing systems that assume perfect 360-degree camera coverage, M²-Occ addresses the real-world problem of incomplete inputs caused by occlusion, hardware malfunctions, or communication failures. The framework introduces two innovative components: a Multi-view Masked Reconstruction (MMR) module that leverages spatial overlap between neighboring cameras to reconstruct missing views directly in feature space, and a Feature Memory Module (FMM) that stores class-level semantic prototypes to refine ambiguous voxel features.
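The Feature Memory Module idea can be pictured as a small table of per-class prototype vectors that is updated over time and then used to pull ambiguous voxel features toward the prototype of their most similar class. The sketch below is a minimal, hypothetical illustration of that mechanism, not the paper's implementation: the class name, the EMA momentum, and the blend weight are all assumptions made for clarity.

```python
import numpy as np


class FeatureMemory:
    """Illustrative sketch of a class-level feature memory.

    Keeps one prototype vector per semantic class, updated with an
    exponential moving average (EMA), and refines voxel features by
    blending them with their nearest prototype. Hyperparameters
    (momentum, blend) are hypothetical, not taken from the paper.
    """

    def __init__(self, num_classes: int, dim: int,
                 momentum: float = 0.9, blend: float = 0.5):
        self.prototypes = np.zeros((num_classes, dim))
        self.momentum = momentum
        self.blend = blend

    def update(self, features: np.ndarray, labels: np.ndarray) -> None:
        # EMA-update each class prototype with the mean feature of the
        # voxels currently assigned to that class.
        for c in np.unique(labels):
            mean_feat = features[labels == c].mean(axis=0)
            self.prototypes[c] = (self.momentum * self.prototypes[c]
                                  + (1.0 - self.momentum) * mean_feat)

    def refine(self, features: np.ndarray) -> np.ndarray:
        # Find each voxel's most similar class prototype by dot-product
        # similarity, then blend the voxel feature toward it.
        sims = features @ self.prototypes.T          # (N, num_classes)
        nearest = sims.argmax(axis=1)                # (N,)
        return ((1.0 - self.blend) * features
                + self.blend * self.prototypes[nearest])
```

In this reading, the memory acts as a learned semantic prior: when a camera dropout leaves a voxel's feature weakly supported by image evidence, blending toward a stored class prototype nudges it back toward a plausible semantic state.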
Tested on the nuScenes-based SurroundOcc benchmark under a systematic missing-view evaluation protocol, M²-Occ demonstrated significant improvements in safety-critical scenarios. When the crucial back-view camera was missing, the system improved Intersection-over-Union (IoU) by 4.93%. The robustness advantage grew as more cameras failed: with five of the six cameras missing, M²-Occ boosted IoU by 5.01%. These gains were achieved without sacrificing performance in full-view scenarios, making the system practical for real-world deployment where camera reliability cannot be guaranteed.
The research represents a crucial step toward more resilient autonomous driving systems, addressing a fundamental weakness in current perception stacks. By enabling vehicles to maintain accurate 3D semantic understanding even with partial sensor input, M²-Occ could significantly improve safety in edge cases where traditional systems might fail. The team plans to release the source code publicly, potentially accelerating adoption across the autonomous vehicle industry.
- Improves IoU by 4.93% in critical missing back-view scenarios and 5.01% with five missing cameras
- Uses Multi-view Masked Reconstruction to recover missing views and Feature Memory Module for semantic consistency
- Maintains full-view performance while adding robustness to camera failures from occlusion or hardware issues
Why It Matters
Makes autonomous vehicles safer in real-world conditions where camera failures are inevitable, addressing a critical reliability gap.