GARD: New diffusion framework makes multi-view 3D reconstruction robust to real-world degradation
A team of 11 researchers introduces diffusion-based denoising directly in 3D feature space to handle noisy inputs.
Multi-view 3D reconstruction models have made remarkable strides under ideal conditions, but real-world inputs are often degraded by noise, blur, compression, or lighting variations, and reconstruction quality breaks down as a result. A team of 11 authors from Korea has introduced Geometry-Aware Representation Denoising (GARD), a diffusion-based framework that operates directly on the feature representations of an existing feed-forward 3D reconstructor. Prior methods either clean the input images before reconstruction, which loses 3D context, or rely on post-processing. GARD instead uses diffusion modeling to denoise the model's internal geometry-aware features, preserving spatial relationships and recovering accurate scene geometry.
The framework consists of a diffusion denoiser attached to the feature encoder of a pre-trained multi-view reconstruction model. It trains a noise predictor that reverses feature corruption step by step, conditioned on multi-view consistency. GARD also includes an auxiliary RGB decoder that reconstructs clean images from the refined features, so it restores 3D geometry and high-quality 2D imagery at the same time. Experiments on the Depth Anything 3 (DA3) benchmark demonstrate GARD's ability to maintain reconstruction accuracy under severe synthetic and real degradations, outperforming both baseline models and image-level denoising pipelines. The work opens a new path for deploying feed-forward 3D models in uncontrolled environments like robotics, autonomous driving, and AR/VR.
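To make the idea of step-by-step denoising in feature space concrete, here is a minimal sketch of a generic diffusion loop applied to a feature vector rather than to pixels. It is not GARD's implementation: the linear noise schedule, the deterministic DDIM-style update, and every name in it (`make_alpha_bars`, `add_noise`, `denoise`, `make_oracle_predictor`) are assumptions chosen for illustration. In particular, the "oracle" predictor stands in for the trained, multi-view-conditioned noise predictor described above, and it cheats by looking at the clean features.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(features, alpha_bar, rng):
    """Forward process: x_t = sqrt(abar) * x_0 + sqrt(1 - abar) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in features]
    noisy = [math.sqrt(alpha_bar) * f + math.sqrt(1.0 - alpha_bar) * e
             for f, e in zip(features, eps)]
    return noisy, eps


def denoise(noisy, alpha_bars, predict_noise):
    """Deterministic reverse loop (DDIM-style, eta = 0) over all timesteps."""
    x = list(noisy)
    for t in range(len(alpha_bars) - 1, -1, -1):
        abar_t = alpha_bars[t]
        abar_prev = alpha_bars[t - 1] if t > 0 else 1.0
        eps_hat = predict_noise(x, t)
        # Estimate the clean features implied by the predicted noise.
        x0_hat = [(xi - math.sqrt(1.0 - abar_t) * e) / math.sqrt(abar_t)
                  for xi, e in zip(x, eps_hat)]
        # Move to the previous (less noisy) timestep.
        x = [math.sqrt(abar_prev) * x0 + math.sqrt(1.0 - abar_prev) * e
             for x0, e in zip(x0_hat, eps_hat)]
    return x


def make_oracle_predictor(clean, alpha_bars):
    """Hypothetical stand-in for a trained network: it knows the clean features."""
    def predict(x, t):
        a = alpha_bars[t]
        return [(xi - math.sqrt(a) * c) / math.sqrt(1.0 - a)
                for xi, c in zip(x, clean)]
    return predict


# Usage: corrupt a toy feature vector, then walk the reverse process.
rng = random.Random(0)
clean_features = [0.5, -1.0, 2.0, 0.0]
alpha_bars = make_alpha_bars(50)
noisy_features, _ = add_noise(clean_features, alpha_bars[-1], rng)
restored = denoise(noisy_features, alpha_bars,
                   make_oracle_predictor(clean_features, alpha_bars))
```

With a perfect noise predictor the loop recovers the clean features up to floating-point error. In a real system the predictor is a learned network, so the quality of the restored features depends on how well it was trained and conditioned.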
- GARD performs diffusion-based denoising inside the feature space of a feed-forward 3D reconstructor, not on input images, preserving geometry-awareness.
- The framework simultaneously recovers 3D scene geometry and high-quality RGB images via an additional image decoder.
- Tested on the Depth Anything 3 (DA3) benchmark, GARD shows robust performance against noise, blur, and compression artifacts common in real-world deployment.
Why It Matters
GARD helps bridge the gap between idealized training conditions and real-world deployment of 3D reconstruction in robotics, VR, and autonomous driving.