Last week in Generative Image & Video
Motif-Video 2B hits 83.76% on VBench, beating larger models with 7x fewer parameters.
The open-source generative AI landscape saw significant advances in video and 3D generation last week. The standout is Motif-Video 2B, a community-developed 2-billion-parameter Diffusion Transformer (DiT) that generates 121 frames of 720p video from a single checkpoint handling both text-to-video (T2V) and image-to-video (I2V) tasks. It scored 83.76% on the comprehensive VBench evaluation, the highest among open-source models, notably outperforming the much larger Wan2.1-14B despite using 7x fewer parameters. Wan2.1-14B does, however, retain an edge in temporal stability and fine human anatomy details in blind evaluations.
In 3D generation, Tencent's HY-World 2.0 emerged as the first open-source 3D world model to output production-ready, editable assets: meshes, 3D Gaussian Splatting (3DGS) representations, and point clouds, all of which can be imported directly into Unity, Unreal Engine, and Blender. Its WorldMirror 2.0 component requires 12-24 GB of VRAM and accepts diverse inputs, including text, single images, multi-view images, or video. NVIDIA also released Lyra 2.0, which builds on Wan2.1-14B to create persistent, explorable 3D worlds from a single image, outputting both 3DGS and meshes. Meanwhile, ByteDance's OmniShow unifies human-object interaction video generation across text, image, audio, and pose inputs, a capability known as the full RAP2V setting.
- Motif-Video 2B scores 83.76% on VBench, beating Wan2.1 despite using 7x fewer parameters (2B vs. 14B).
- Tencent's HY-World 2.0 is the first open-source model to output editable 3D meshes and assets directly usable in Unity, Unreal, and Blender.
- ByteDance's OmniShow is the only model handling the full 'RAP2V' setting for human-object interaction videos, syncing audio, pose, and reference images.
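The headline efficiency claim is straightforward arithmetic over the reported figures. A minimal sketch, using only the parameter counts and VBench score stated above (no other numbers are assumed):

```python
# Figures as reported: Motif-Video 2B vs. Wan2.1-14B.
MOTIF_PARAMS = 2e9      # Motif-Video 2B parameter count
WAN_PARAMS = 14e9       # Wan2.1-14B parameter count
MOTIF_VBENCH = 83.76    # Motif-Video 2B VBench total score (%)

# Ratio behind the "7x fewer parameters" claim.
ratio = WAN_PARAMS / MOTIF_PARAMS
print(f"Motif-Video 2B uses {ratio:.0f}x fewer parameters than Wan2.1-14B "
      f"while scoring {MOTIF_VBENCH}% on VBench.")
# → Motif-Video 2B uses 7x fewer parameters than Wan2.1-14B while scoring 83.76% on VBench.
```

Note that VBench aggregates many sub-dimensions (e.g., temporal consistency, aesthetics), so a higher total score does not guarantee a win on every axis, consistent with Wan2.1-14B's edge in temporal stability noted above.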
Why It Matters
These releases democratize high-quality video and 3D asset creation, lowering the barrier for developers and creators with efficient, open-source models.