Image & Video

ByteDance's Bernini unifies video generation and editing with semantic planning

A single diffusion model that generates and edits videos using latent semantic planning.

Deep Dive

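Bernini is a video diffusion model introduced by ByteDance. It handles both video generation and video editing in a single model built on the Wan-2.2 backbone, using latent semantic planning to guide its output.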

Key Points
  • Unified model for both video generation and editing on the Wan-2.2 backbone
  • Novel latent semantic planning targets temporal consistency across frames
  • Open-source release includes full model weights on Hugging Face and a detailed paper

Why It Matters

A single model that both generates and edits video could streamline creative workflows and reduce compute costs for professionals, who would no longer need separate models for each task.