GenFusion: Feed-forward Human Performance Capture via Progressive Canonical Space Updates
New AI model builds complete 3D human models from monocular video, even reconstructing unseen body parts.
A research team from Stanford University and KAIST has introduced GenFusion, a breakthrough AI system that creates complete 3D human avatars from ordinary single-camera video feeds. Unlike traditional methods requiring multiple cameras or depth sensors, GenFusion uses a novel 'progressive canonical space' approach that continuously updates a 3D memory bank as the subject moves. This allows the system to accumulate appearance information from different angles over time, effectively reconstructing body parts that were never directly visible in any single frame.
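The accumulation idea can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the class name, the running-mean update rule, and the per-point visibility counts are all assumptions standing in for GenFusion's actual canonical-space memory.

```python
import numpy as np

# Hypothetical sketch of a progressive canonical-space memory bank.
# Each canonical surface point keeps an accumulated appearance feature
# and a count of how often it has been observed; as the subject moves,
# newly visible points fold their frame features into the memory.
class CanonicalMemory:
    def __init__(self, num_points, feat_dim):
        self.features = np.zeros((num_points, feat_dim))  # accumulated appearance
        self.counts = np.zeros(num_points)                # observations per point

    def update(self, visible_idx, frame_features):
        """Fold features observed in the current frame into canonical space."""
        c = self.counts[visible_idx]
        # Running mean: old memory and the new observation weighted by count.
        self.features[visible_idx] = (
            self.features[visible_idx] * c[:, None] + frame_features
        ) / (c[:, None] + 1.0)
        self.counts[visible_idx] += 1

    def coverage(self):
        """Fraction of canonical points seen at least once."""
        return float((self.counts > 0).mean())

mem = CanonicalMemory(num_points=6, feat_dim=3)
mem.update(np.array([0, 1, 2]), np.ones((3, 3)))       # front of body visible
mem.update(np.array([2, 3, 4]), 2 * np.ones((3, 3)))   # subject turns; new parts appear
print(round(mem.coverage(), 2))
```

As the subject rotates, coverage grows toward 1.0, which is what lets the system eventually describe parts never visible in any single frame.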
The key innovation is a probabilistic regression framework that resolves conflicts between past observations stored in memory and data from the current frame. This produces significantly sharper reconstructions than previous deterministic approaches, while enabling plausible synthesis even in regions with zero prior observations. The system demonstrated strong performance on both specialized datasets like 4D-Dress and more general benchmarks like MVHumanNet, showing robust generalization without dataset-specific retraining.
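The paper's exact regression head is not reproduced in this summary, but the core probabilistic-fusion idea can be illustrated with standard precision-weighted Gaussian fusion: each source (memory and current frame) contributes a mean and a variance, and the more confident source dominates the fused estimate. Function and parameter names here are illustrative assumptions.

```python
# Illustrative sketch of probabilistic fusion between stored memory and
# the current frame (not GenFusion's actual formulation). Each source
# predicts a mean and a variance; fusing by precision (inverse variance)
# lets confident observations override stale or uncertain memory.
def fuse(mu_mem, var_mem, mu_obs, var_obs):
    """Fuse two Gaussian estimates of the same quantity."""
    w_mem = 1.0 / var_mem   # precision of the memory estimate
    w_obs = 1.0 / var_obs   # precision of the current observation
    mu = (w_mem * mu_mem + w_obs * mu_obs) / (w_mem + w_obs)
    var = 1.0 / (w_mem + w_obs)  # fused estimate is more certain than either
    return mu, var

# Memory is uncertain (high variance); the current frame is sharp:
mu, var = fuse(mu_mem=0.2, var_mem=0.5, mu_obs=1.0, var_obs=0.05)
print(round(mu, 3))  # pulled strongly toward the confident observation
```

A deterministic average would blur the two sources equally; weighting by confidence is one way such a framework can stay sharp where the current frame is reliable while still falling back on memory for occluded regions.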
GenFusion represents a major step toward practical, accessible 3D human capture technology. By eliminating the need for multi-camera rigs or specialized hardware, it opens up new possibilities for content creation, virtual try-ons, and telepresence applications. The feed-forward architecture also enables real-time processing, making it suitable for live applications where traditional reconstruction methods would be too computationally expensive.
- Creates complete 3D human models from single-camera video using progressive canonical space updates
- Uses probabilistic regression to resolve conflicts between past and current observations, producing sharper results
- Demonstrated effectiveness on both specialized (4D-Dress) and general (MVHumanNet) datasets without retraining
Why It Matters
Enables affordable 3D avatar creation for virtual try-ons, content production, and telepresence without specialized hardware.