Fisheye3R: Adapting Unified 3D Feed-Forward Foundation Models to Fisheye Lenses
A new framework overcomes fisheye distortion without any labeled fisheye training data, relying only on standard perspective images.
A research team from the University of Southern California and Meta has developed Fisheye3R, a breakthrough framework that adapts existing 3D reconstruction foundation models to work seamlessly with fisheye lenses. These models, trained on vast datasets of standard perspective images, typically fail when faced with the extreme radial distortion of fisheye cameras: the distortion shifts pixel positions non-linearly, breaking the assumptions baked into conventional 3D vision algorithms. Fisheye3R addresses this by teaching the models the fisheye projection itself, enabling accurate inference on wide field-of-view imagery.
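To get a feel for the scale of that mismatch, compare a pinhole camera, where image radius grows with the tangent of the ray angle (r = f·tan θ), against the equidistant model, one common fisheye projection, where radius grows linearly (r = f·θ). The NumPy sketch below is purely illustrative; the equidistant model and the focal length are assumptions for the example, not details taken from the paper.

```python
import numpy as np

def pinhole_radius(theta: float, f: float) -> float:
    """Radial image distance of a ray at angle theta (radians) from the
    optical axis under the standard pinhole (perspective) model."""
    return f * np.tan(theta)

def equidistant_fisheye_radius(theta: float, f: float) -> float:
    """Same ray under the equidistant fisheye model: radius grows
    linearly with angle, keeping wide rays on the sensor but displacing
    pixels non-linearly relative to the perspective model."""
    return f * theta

f = 500.0  # illustrative focal length in pixels
for deg in (10, 45, 80):
    theta = np.deg2rad(deg)
    print(f"{deg:3d} deg   pinhole: {pinhole_radius(theta, f):8.1f} px"
          f"   fisheye: {equidistant_fisheye_radius(theta, f):7.1f} px")
```

At 10 degrees the two models nearly agree, but at 80 degrees the pinhole radius explodes to roughly 2836 px while the fisheye radius stays near 698 px, which is exactly the kind of non-linear displacement that invalidates perspective-trained assumptions.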
The core challenge was the severe scarcity of labeled fisheye imagery needed for traditional fine-tuning. Fisheye3R's novel contribution is a set of flexible learning schemes that sidestep this scarcity: a self-supervised mode that uses only unlabeled perspective images, and a supervised mode that likewise requires no fisheye training data. Either way, the foundation models generalize to the fisheye domain using the knowledge they already possess.
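One natural ingredient for such a scheme is synthesizing fisheye views from ordinary perspective images, since the fisheye projection is known analytically and needs no labels. The sketch below shows one way such a resampling could look under an assumed equidistant model; it is a plausible illustration, not the paper's actual pipeline, and the function name and parameters are hypothetical.

```python
import numpy as np

def perspective_to_fisheye(img: np.ndarray, f_persp: float,
                           f_fish: float, out_size: int) -> np.ndarray:
    """Resample an H x W x C perspective image into a synthetic
    equidistant-fisheye view by inverse mapping (nearest-neighbour
    sampling for brevity).

    For each fisheye pixel at radius r from the centre, the incoming
    ray angle is theta = r / f_fish; the same ray lands at radius
    f_persp * tan(theta) in the perspective image.
    """
    h, w = img.shape[:2]
    cx_p, cy_p = w / 2.0, h / 2.0
    c = out_size / 2.0
    out = np.zeros((out_size, out_size, img.shape[2]), dtype=img.dtype)

    ys, xs = np.mgrid[0:out_size, 0:out_size]
    dx, dy = xs - c, ys - c
    r = np.hypot(dx, dy)
    theta = r / f_fish                 # equidistant model: r = f * theta
    valid = theta < np.pi / 2          # drop rays at/behind 90 degrees

    scale = np.zeros_like(r)
    nz = valid & (r > 0)
    scale[nz] = f_persp * np.tan(theta[nz]) / r[nz]

    src_x = np.round(cx_p + dx * scale).astype(int)
    src_y = np.round(cy_p + dy * scale).astype(int)
    valid &= (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out[valid] = img[src_y[valid], src_x[valid]]
    return out

# Hypothetical usage:
# fisheye = perspective_to_fisheye(img, f_persp=300.0, f_fish=120.0,
#                                  out_size=480)
```

In a self-supervised loop, the adapted model's predictions on such synthetic fisheye views could then be matched against a frozen copy's predictions on the source perspective image, though the paper's exact losses are not reproduced here.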
Extensive testing across three major foundation models (VGGT, π³, and MapAnything) demonstrates Fisheye3R's effectiveness. The framework consistently improved key metrics on fisheye inputs, including camera pose estimation, depth prediction, 3D point-map generation, and field-of-view estimation. Crucially, the adaptation does not degrade performance on the original perspective images, avoiding the common problem of catastrophic forgetting. The work represents a significant step toward making powerful, general-purpose 3D vision models applicable across all camera types.
- Adapts 3D foundation models (VGGT, π³, MapAnything) to handle fisheye lens distortion without performance loss on standard images.
- Uses novel learning schemes requiring no labeled fisheye data, only unlabeled perspective images for self-supervised adaptation.
- Improves camera pose, depth, and field-of-view estimation for fisheye inputs, solving a major hurdle for robotics and AR/VR.
Why It Matters
Enables robots, drones, and AR/VR systems with wide-angle cameras to use state-of-the-art 3D vision models without costly retraining on scarce data.