Image & Video

[Open Source] UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models (Powered by Wan2.2 & VGGT)

UniGeo lets you chain camera movements like 'pan left 15 degrees' for precise editing.

Deep Dive

UniGeo is a new open-source framework that combines a video model (Wan2.2) with unified geometric guidance for precise, camera-controllable image editing. The pipeline starts with a source image and a natural language command, which the system parses into explicit physical camera parameters (e.g., 'Camera pans left by 15 degrees; Camera moves left by 0.27'). Users can chain multiple movements to build complex trajectories.
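To make the parsing step concrete, here is a minimal sketch of how a chained command string could be turned into structured camera parameters. It is not UniGeo's parser: the `CameraStep` type, the `parse_commands` function, the regex grammar, and the sign conventions (pan left as positive yaw, move left as negative x) are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraStep:
    """One camera movement (hypothetical structure, not UniGeo's own)."""
    kind: str      # "rotate" or "translate"
    axis: str      # "yaw"/"pitch" for rotations, "x"/"y"/"z" for translations
    amount: float  # signed value; degrees for rotations, scene units otherwise


# Assumed mapping from (verb, direction) to (kind, axis, sign).
_MOVES = {
    ("pans", "left"): ("rotate", "yaw", 1.0),
    ("pans", "right"): ("rotate", "yaw", -1.0),
    ("tilts", "up"): ("rotate", "pitch", 1.0),
    ("tilts", "down"): ("rotate", "pitch", -1.0),
    ("moves", "left"): ("translate", "x", -1.0),
    ("moves", "right"): ("translate", "x", 1.0),
    ("moves", "up"): ("translate", "y", 1.0),
    ("moves", "down"): ("translate", "y", -1.0),
    ("moves", "forward"): ("translate", "z", 1.0),
    ("moves", "backward"): ("translate", "z", -1.0),
}

# Matches clauses such as "Camera pans left by 15 degrees".
_CLAUSE = re.compile(
    r"camera\s+(?P<verb>[a-z]+)\s+(?P<dir>[a-z]+)\s+by\s+"
    r"(?P<value>\d+(?:\.\d+)?)(?:\s*degrees?)?",
    re.IGNORECASE,
)


def parse_commands(text: str) -> list[CameraStep]:
    """Split a ';'-separated command string into ordered camera steps."""
    steps = []
    for clause in text.split(";"):
        clause = clause.strip()
        if not clause:
            continue  # tolerate a trailing semicolon
        match = _CLAUSE.fullmatch(clause)
        if match is None:
            raise ValueError(f"unrecognized command: {clause!r}")
        key = (match["verb"].lower(), match["dir"].lower())
        if key not in _MOVES:
            raise ValueError(f"unsupported movement: {clause!r}")
        kind, axis, sign = _MOVES[key]
        steps.append(CameraStep(kind, axis, sign * float(match["value"])))
    return steps


steps = parse_commands("Camera pans left by 15 degrees; Camera moves left by 0.27")
for step in steps:
    print(step)
```

Keeping the steps in an ordered list preserves the chaining described above: each movement is applied after the one before it.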

Next, VGGT generates a guiding point cloud as a preview, so users can iterate on the camera parameters before committing to the compute-heavy rendering step. Once they are satisfied, the point cloud is fed into a fine-tuned Wan2.2-5B model along with the source image to render the final sequence. Unlike methods that switch between fixed viewpoints, UniGeo produces continuous, physically fluid camera trajectories, maintaining strict spatial consistency on 'in-the-wild' images and eliminating background distortion.
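The idea of a cheap geometric preview along a continuous trajectory can be illustrated with a toy example. The sketch below does not call VGGT or Wan2.2: it assumes a simple pinhole camera, a single hand-written 3D point standing in for a point cloud, and linear interpolation between the source view and the target view. The function names `interpolate_poses` and `project` are hypothetical.

```python
import math


def interpolate_poses(yaw_deg, shift_x, num_frames):
    """Return per-frame (yaw_deg, cam_x) pairs, linearly spaced from the
    source view (0, 0) to the target view (yaw_deg, shift_x)."""
    if num_frames < 2:
        raise ValueError("need at least two frames")
    poses = []
    for i in range(num_frames):
        t = i / (num_frames - 1)  # 0.0 at the first frame, 1.0 at the last
        poses.append((yaw_deg * t, shift_x * t))
    return poses


def project(point, yaw_deg, cam_x, focal=1.0):
    """Pinhole projection of a world point for a camera at (cam_x, 0, 0).

    Assumed convention: the camera looks along +z, x points right,
    and a positive yaw pans the camera to the left.
    Returns None when the point lies behind the camera.
    """
    x, y, z = point
    x -= cam_x  # express the point relative to the camera position
    a = math.radians(yaw_deg)
    xc = math.cos(a) * x + math.sin(a) * z   # component along camera right
    zc = -math.sin(a) * x + math.cos(a) * z  # component along camera forward
    if zc <= 0:
        return None
    return (focal * xc / zc, focal * y / zc)


# Preview where one point lands in each frame of a 'pan left 15 degrees,
# move left 0.27' trajectory before any expensive rendering happens.
point = (0.0, 0.0, 2.0)
for yaw, cam_x in interpolate_poses(15.0, -0.27, num_frames=5):
    print(round(yaw, 2), round(cam_x, 3), project(point, yaw, cam_x))
```

Because every intermediate pose is computed explicitly, the previewed point drifts smoothly across the image rather than jumping between fixed viewpoints, which is the property the paragraph above attributes to UniGeo's trajectories.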

Key Points
  • Converts natural language commands into explicit physical camera parameters for precise control
  • Uses VGGT to generate a point cloud preview, enabling iterative tweaking before rendering
  • Renders continuous, fluid camera trajectories with a fine-tuned Wan2.2-5B model, rather than jumping between discrete angles

Why It Matters

UniGeo makes professional-grade camera control in image editing widely accessible, delivering precise spatial consistency without expensive hardware.