Image & Video

r/StableDiffusion June 01, 2026

⚡A single diffusion model that generates and edits videos using latent semantic planning.

Deep Dive

Bernini is a video diffusion model introduced

Key Points

Unified model for both video generation and editing on the Wan-2.2 backbone
Novel latent semantic planning ensures temporal consistency across frames
Open-source release includes full model weights on Hugging Face and a detailed paper

One model to generate and edit video—streamlining creative workflows and reducing compute costs for professionals.