Last week in Generative Image & Video
Motif-Video 2B hits 83.76% on VBench, beating larger models with 7x fewer parameters.
The open-source generative AI landscape saw significant advances in video and 3D generation last week. The standout is Motif-Video 2B, a community-developed 2-billion-parameter Diffusion Transformer (DiT) that generates 121 frames of 720p video from a single checkpoint handling both text-to-video (T2V) and image-to-video (I2V) tasks. It scored 83.76% on the comprehensive VBench evaluation, the highest among open-source models, notably outperforming the much larger Wan2.1-14B despite using 7x fewer parameters. Wan2.1-14B does, however, retain an edge in temporal stability and fine human anatomy details in blind evaluations.
In 3D generation, Tencent's HY-World 2.0 emerged as the first open-source 3D world model to output production-ready, editable assets: meshes, 3D Gaussian Splatting (3DGS) representations, and point clouds, all of which can be imported directly into Unity, Unreal Engine, and Blender. Its WorldMirror 2.0 component requires 12-24 GB of VRAM and accepts diverse inputs, including text, single images, multi-view images, or video. NVIDIA also released Lyra 2.0, which builds on Wan2.1-14B to create persistent, explorable 3D worlds from a single image, outputting both 3DGS and meshes. Meanwhile, ByteDance's OmniShow unifies human-object interaction video generation across text, image, audio, and pose inputs, a capability known as the full RAP2V setting.
- Motif-Video 2B scores 83.76% on VBench, beating Wan2.1 despite using 7x fewer parameters (2B vs. 14B).
- Tencent's HY-World 2.0 is the first open-source model to output editable 3D meshes and assets directly usable in Unity, Unreal, and Blender.
- ByteDance's OmniShow is the only model handling the full 'RAP2V' setting for human-object interaction videos, syncing audio, pose, and reference images.
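The headline efficiency claim is straightforward arithmetic over the reported figures. A minimal sketch, using only the parameter counts and VBench score stated above (no other numbers are assumed):

```python
# Figures as reported: Motif-Video 2B vs. Wan2.1-14B.
MOTIF_PARAMS = 2e9      # Motif-Video 2B parameter count
WAN_PARAMS = 14e9       # Wan2.1-14B parameter count
MOTIF_VBENCH = 83.76    # Motif-Video 2B VBench total score (%)

# Ratio behind the "7x fewer parameters" claim.
ratio = WAN_PARAMS / MOTIF_PARAMS
print(f"Motif-Video 2B uses {ratio:.0f}x fewer parameters than Wan2.1-14B "
      f"while scoring {MOTIF_VBENCH}% on VBench.")
# → Motif-Video 2B uses 7x fewer parameters than Wan2.1-14B while scoring 83.76% on VBench.
```

Note that VBench aggregates many sub-dimensions (e.g., temporal consistency, aesthetics), so a higher total score does not guarantee a win on every axis, consistent with Wan2.1-14B's edge in temporal stability noted above.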
Why It Matters
These releases democratize high-quality video and 3D asset creation, lowering the barrier for developers and creators with efficient, open-source models.