CoMo3R-SLAM enables multi-robot 3D mapping with just RGB cameras
No depth sensors needed: CoMo3R-SLAM maps outdoor scenes at 8 FPS from monocular video only.
Collaborative dense SLAM is crucial for multi-robot teams to perceive large outdoor environments, but existing systems typically rely on depth sensors that add payload, power, and calibration costs. CoMo3R-SLAM, introduced by Zhihao Cao and colleagues, eliminates these requirements by using only monocular RGB cameras. The system leverages learned feed-forward 3D reconstruction priors to handle scale ambiguity and unreliable inter-agent data association common in outdoor scenes with low overlap and repetitive structures. Each agent runs a prior-guided front-end for real-time tracking and local dense fusion, while a central coordinator performs dense pointmap matching for cross-agent verification, closed-form Sim(3) gauge synchronization, and GPU-accelerated global bundle adjustment with segment-level depth optimization. This design produces globally consistent metric maps from RGB alone, without needing depth sensors or even parametric camera intrinsics.
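To make the idea of a closed-form Sim(3) alignment concrete, here is a minimal sketch using the classic Umeyama least-squares solution. It recovers the scale, rotation, and translation that map one set of 3D points onto another. It is not taken from CoMo3R-SLAM: the function name `umeyama_sim3` is hypothetical, and the sketch assumes point correspondences between two agents' maps are already known and outlier-free.

```python
import numpy as np

def umeyama_sim3(src, dst):
    """Closed-form least-squares (s, R, t) with dst ~= s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3D points (assumed known
    and outlier-free; a real system would verify matches first).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]

    # Center both point sets.
    mu_src = src.mean(axis=0)
    mu_dst = dst.mean(axis=0)
    src_c = src - mu_src
    dst_c = dst - mu_dst

    # 3x3 cross-covariance between the centered sets.
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n          # total variance of src
    s = np.trace(np.diag(D) @ S) / var_src    # optimal scale
    t = mu_dst - s * R @ mu_src               # optimal translation
    return s, R, t

if __name__ == "__main__":
    # Synthetic check: apply a known similarity transform and recover it.
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(100, 3))
    c, si = np.cos(0.3), np.sin(0.3)
    R_true = np.array([[c, -si, 0.0], [si, c, 0.0], [0.0, 0.0, 1.0]])
    moved = 2.5 * pts @ R_true.T + np.array([1.0, -2.0, 0.5])
    s, R, t = umeyama_sim3(pts, moved)
    print("scale:", s)
    print("translation:", t)
```

The scale term is what distinguishes Sim(3) from a rigid SE(3) alignment. It matters for monocular agents because each one may reconstruct its local map at a different scale.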
On the Tanks and Temples benchmark, CoMo3R-SLAM achieves the best Absolute Trajectory Error (ATE) on three of the four scenes, and on Waymo outdoor sequences its accuracy is competitive. Across both, it matches or exceeds state-of-the-art RGB-D methods. The system runs online at 8 frames per second, making it practical for real-world deployment. By removing the reliance on depth hardware, CoMo3R-SLAM reduces the cost and complexity of equipping robot swarms or drone teams with dense 3D perception, which opens the way to scalable outdoor multi-agent mapping, search and rescue, and autonomous navigation.
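For readers unfamiliar with the metric, ATE is commonly reported as the root-mean-square of per-frame position errors between the estimated and ground-truth trajectories. The sketch below shows that computation under stated assumptions: the two trajectories are already time-associated and aligned to a common frame (for monocular systems this usually means a Sim(3) alignment first). The function name `ate_rmse` is hypothetical, and the summary does not state which exact ATE protocol the authors used.

```python
import numpy as np

def ate_rmse(est_xyz, gt_xyz):
    """RMSE of per-frame translation error, in the units of the inputs.

    est_xyz, gt_xyz: (N, 3) camera positions, assumed to be matched
    frame-by-frame and already expressed in the same coordinate frame.
    """
    est = np.asarray(est_xyz, dtype=float)
    gt = np.asarray(gt_xyz, dtype=float)
    if est.shape != gt.shape:
        raise ValueError("trajectories must have the same shape")

    # Euclidean distance between estimated and true position per frame.
    per_frame_err = np.linalg.norm(est - gt, axis=1)
    return float(np.sqrt(np.mean(per_frame_err ** 2)))

if __name__ == "__main__":
    # Toy trajectories (made-up values, for illustration only).
    gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
    est = gt + np.array([[0.0, 0.1, 0.0], [0.0, -0.1, 0.0], [0.1, 0.0, 0.0]])
    print("ATE RMSE:", ate_rmse(est, gt))
```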
- First collaborative monocular dense SLAM system using learned 3D reconstruction priors for outdoor multi-agent mapping
- No depth sensors or parametric intrinsics required; runs online at 8 FPS on standard hardware
- Matches or exceeds state-of-the-art RGB-D methods on Tanks and Temples (best ATE on 3 of 4 scenes) and Waymo datasets
Why It Matters
Enables lightweight, scalable 3D perception for robot swarms and drones without bulky depth sensors.