Research & Papers

3D Multi-View Stylization with Pose-Free Correspondences Matching for Robust 3D Geometry Preservation

A novel AI technique applies artistic styles to multi-view 3D scenes without breaking SLAM or reconstruction pipelines.

Deep Dive

Researcher Shirsha Bose has introduced a novel AI method for applying artistic styles to multi-view 3D scenes without disrupting the geometric consistency required for critical downstream computer vision tasks. Traditional per-view stylization often causes texture drift, warped edges, and inconsistent shading, which degrades the performance of systems like SLAM (Simultaneous Localization and Mapping), depth prediction, and 3D reconstruction. This new approach is "pose-free," meaning it doesn't require known camera poses or an explicit 3D model during training, making it more flexible and practical for real-world applications.

The core innovation is a feed-forward stylization network trained with a composite objective that couples artistic style transfer with geometry preservation. Style is enforced with an AdaIN-inspired loss computed on features from a frozen VGG-19 encoder. To maintain structural consistency across viewpoints, the method employs a correspondence-based consistency loss built on the established SuperPoint and SuperGlue models, constraining descriptors from a stylized 'anchor' view to stay consistent with matched descriptors from the original multi-view images. Additionally, a depth-preservation loss based on MiDaS/DPT predictions and a global color-alignment term are applied to reduce domain shift.
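
To make the structure of such a composite objective concrete, here is a minimal sketch of how the three loss terms could be combined. This is not the authors' code: all function names, tensor shapes, and loss weights are assumptions for illustration. The style term matches channel-wise statistics of frozen VGG-19 features, the correspondence term keeps descriptors at matched keypoints (e.g. from SuperPoint/SuperGlue) close to those of the original views, and the depth term penalizes drift against a frozen monocular estimator such as MiDaS/DPT.

```python
# Hedged sketch of the composite objective described above (illustrative only).
import torch
import torch.nn.functional as F

def adain_style_loss(stylized_feats, style_feats):
    """AdaIN-style loss: match per-channel mean/std of VGG-19 feature maps."""
    loss = 0.0
    for fs, ft in zip(stylized_feats, style_feats):
        # fs, ft: (B, C, H, W) feature maps from a frozen VGG-19 encoder
        loss = loss + F.mse_loss(fs.mean(dim=(2, 3)), ft.mean(dim=(2, 3)))
        loss = loss + F.mse_loss(fs.std(dim=(2, 3)), ft.std(dim=(2, 3)))
    return loss

def correspondence_consistency_loss(desc_stylized, desc_original, matches):
    """Keep descriptors at matched keypoints close to the originals.

    desc_stylized, desc_original: (N, D) keypoint descriptors
    matches: (M, 2) long tensor of index pairs from a matcher (e.g. SuperGlue)
    """
    i, j = matches[:, 0], matches[:, 1]
    return F.mse_loss(desc_stylized[i], desc_original[j])

def depth_preservation_loss(depth_stylized, depth_original):
    """Penalize depth drift between stylized and original frames
    (depths would come from a frozen monocular estimator such as MiDaS/DPT)."""
    return F.l1_loss(depth_stylized, depth_original)

def composite_loss(style_terms, corr_terms, depth_terms,
                   w_style=1.0, w_corr=1.0, w_depth=1.0):
    # Weights are illustrative; the paper's actual weighting is not given here.
    return (w_style * adain_style_loss(*style_terms)
            + w_corr * correspondence_consistency_loss(*corr_terms)
            + w_depth * depth_preservation_loss(*depth_terms))
```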

Evaluated on standard datasets such as Tanks and Temples and Mip-NeRF 360, the method was assessed with metrics for style adherence (Color Histogram Distance) and structure retention (Structure Distance). For 3D consistency, monocular DROID-SLAM trajectories and the symmetric Chamfer distance between point clouds were used. The results show that the proposed correspondence and depth regularization significantly reduce structural distortion, improve SLAM stability, and yield stronger point-cloud consistency than baselines such as MuVieCAST, while maintaining competitive stylization quality.
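
For reference, the symmetric Chamfer distance used for point-cloud consistency is the sum of mean nearest-neighbor distances in both directions between two clouds. The naive O(N·M) sketch below is purely illustrative (it assumes both clouds fit in memory) and is not a reproduction of the paper's evaluation code.

```python
# Hedged sketch of the symmetric Chamfer distance between two point clouds.
import torch

def symmetric_chamfer_distance(points_a, points_b):
    """points_a: (N, 3), points_b: (M, 3) point clouds."""
    d = torch.cdist(points_a, points_b)   # (N, M) pairwise Euclidean distances
    a_to_b = d.min(dim=1).values.mean()   # nearest neighbor in B for each point in A
    b_to_a = d.min(dim=0).values.mean()   # nearest neighbor in A for each point in B
    return a_to_b + b_to_a
```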

Key Points
  • Pose-free method applies artistic styles to 3D scenes without needing camera poses or explicit 3D models for training.
  • Uses SuperPoint/SuperGlue for correspondence consistency and MiDaS/DPT for depth preservation to maintain geometry for SLAM and reconstruction.
  • Shows improved trajectory and point-cloud consistency on the Tanks and Temples and Mip-NeRF 360 datasets while maintaining style quality.

Why It Matters

Enables creative stylization of real-world 3D environments for AR/VR and film without breaking essential geometric computer vision pipelines.