Robotics

CoMo3R-SLAM enables multi-robot 3D mapping with just RGB cameras

arXiv cs.RO June 01, 2026

⚡ No depth sensors needed: CoMo3R-SLAM maps outdoor scenes at 8 FPS from monocular video only.

Deep Dive

Collaborative dense SLAM is crucial for multi-robot teams to perceive large outdoor environments, but existing systems typically rely on depth sensors that add payload, power, and calibration costs. CoMo3R-SLAM, introduced by Zhihao Cao and colleagues, eliminates these requirements by using only monocular RGB cameras. The system leverages learned feed-forward 3D reconstruction priors to handle scale ambiguity and unreliable inter-agent data association common in outdoor scenes with low overlap and repetitive structures. Each agent runs a prior-guided front-end for real-time tracking and local dense fusion, while a central coordinator performs dense pointmap matching for cross-agent verification, closed-form Sim(3) gauge synchronization, and GPU-accelerated global bundle adjustment with segment-level depth optimization. This design produces globally consistent metric maps from RGB alone, without needing depth sensors or even parametric camera intrinsics.
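To make the "closed-form Sim(3)" step more concrete, here is a minimal sketch of one standard closed-form solver, Umeyama alignment, which recovers the scale, rotation, and translation between two sets of matched 3D points. This is an illustration only: the article does not say which solver CoMo3R-SLAM uses, and the function name `umeyama_sim3` and the synthetic data are assumptions made for the example.

```python
import numpy as np


def umeyama_sim3(src, dst):
    """Least-squares Sim(3) (scale, R, t) such that dst ≈ scale * R @ src + t.

    src, dst: (N, 3) arrays of matched 3D points (hypothetical inputs, e.g.
    corresponding points seen by two agents in their own map frames).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]

    # Center both point sets.
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # Cross-covariance between the centered sets and its SVD.
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Guard against a reflection so that R is a proper rotation.
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n
    scale = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - scale * R @ mu_src
    return scale, R, t


# Usage on synthetic data: agent B's map is agent A's map scaled, rotated, shifted.
rng = np.random.default_rng(0)
points_a = rng.normal(size=(50, 3))
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
points_b = 2.5 * points_a @ R_true.T + np.array([1.0, -2.0, 0.5])

scale, R, t = umeyama_sim3(points_a, points_b)
aligned = scale * points_a @ R.T + t  # points_a expressed in agent B's frame
```

Estimating scale alongside rotation and translation matters in the monocular setting because each agent's map may carry its own arbitrary scale, which a rigid SE(3) alignment could not absorb.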

On Tanks and Temples, CoMo3R-SLAM achieves the best Absolute Trajectory Error (ATE) on three of the four scenes, and it reaches competitive accuracy on Waymo outdoor sequences, matching or exceeding state-of-the-art RGB-D methods. The system runs online at 8 frames per second, making it practical for real-world deployment. By removing the reliance on depth hardware, CoMo3R-SLAM reduces the cost and complexity of equipping robot swarms or drone teams with dense 3D perception, which could make outdoor multi-agent mapping, search and rescue, and autonomous navigation easier to scale.
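For readers unfamiliar with the metric, ATE is commonly reported as the root-mean-square distance between estimated and ground-truth camera positions at matching timestamps. The sketch below assumes the two trajectories are already associated frame by frame and expressed in the same coordinate frame (monocular results are usually aligned with a similarity transform first). The exact evaluation protocol behind the reported numbers is not described here, and `ate_rmse` is a hypothetical helper name.

```python
import numpy as np


def ate_rmse(estimated, ground_truth):
    """Root-mean-square translational error between two aligned trajectories.

    estimated, ground_truth: (N, 3) arrays of camera positions, assumed to be
    matched frame by frame and already in a common frame.
    """
    estimated = np.asarray(estimated, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    if estimated.shape != ground_truth.shape:
        raise ValueError("trajectories must have the same shape")

    # Per-frame Euclidean distance, then RMSE over all frames.
    errors = np.linalg.norm(estimated - ground_truth, axis=1)
    return float(np.sqrt(np.mean(errors ** 2)))


# Toy example: every estimated position is off by the vector (3, 4, 0).
gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
est = gt + np.array([3.0, 4.0, 0.0])
rmse = ate_rmse(est, gt)
```

Lower values indicate that the estimated trajectory stays closer to the reference.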

Key Points

First collaborative monocular dense SLAM system using learned 3D reconstruction priors for outdoor multi-agent mapping
No depth sensors or parametric intrinsics required; runs online at 8 FPS on standard hardware
Matches or exceeds state-of-the-art RGB-D methods on Tanks and Temples (best ATE on 3 of 4 scenes) and Waymo datasets

Why It Matters

Enables lightweight, scalable 3D perception for robot swarms and drones without bulky depth sensors.
