ByteDance's Bernini unifies video generation and editing with semantic planning
A single diffusion model that generates and edits videos using latent semantic planning.
Deep Dive
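Bernini is a video diffusion model introduced by ByteDance that handles both video generation and video editing in a single model, using latent semantic planning.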
Key Points
- Unified model for both video generation and editing, built on the Wan-2.2 backbone
- Novel latent semantic planning is used to keep frames temporally consistent
- Open-source release includes full model weights on Hugging Face and a detailed paper
Why It Matters
A single model that both generates and edits video could streamline creative workflows and reduce compute costs for professionals.