Last week in Generative Image & Video
A 15B model jointly generates video+audio, beating commercial rivals, while new tools automate design and predict brain activity.
The open-source generative AI community delivered a powerhouse week of multimodal releases, headlined by DaVinci-MagiHuman. This 15-billion-parameter, single-stream Transformer jointly generates synchronized video and audio and is released under the permissive Apache 2.0 license. The model, which supports seven languages, achieved a dominant 80% win rate against the commercial model Ovi 1.1 and a 60.9% win rate against LTX 2.3 in human evaluations. Its release as a full stack—model, demo, and code—signals a significant leap in accessible, high-quality video synthesis.
Beyond video, the week brought tools that automate and refine creative workflows. PSDesigner is an open-source system that replicates a human-like creative process for automated graphic design. For researchers, Meta released TRIBE v2, a foundation model that predicts brain activity in response to video, audio, and text stimuli. Practical tools for creators also advanced: the ComfyUI VACE Video Joiner v2.5 for seamless loops, PixelSmile LoRAs for fine-grained facial expression control in images, and LongCat-AudioDiT for diffusion-based text-to-speech, complete with ready-to-use ComfyUI nodes.
- DaVinci-MagiHuman: 15B open-source model for joint video+audio gen with 80% win rate vs. Ovi 1.1.
- PSDesigner & Meta TRIBE v2: Tools for automated graphic design and predicting brain response to media.
- Creator Tools: New releases include ComfyUI nodes for video joining, facial expression LoRAs, and diffusion-based TTS.
Why It Matters
These open-source releases democratize high-end video generation and creative automation, putting powerful tools directly in developers' hands.