daVinci MagiHuman could be the future of open-source video generation
Early open-source model for text-to-video and audio-to-video generation shows potential but needs 24GB VRAM.
The AI research group GAIR has released daVinci MagiHuman, an early-stage open-source model that generates video from both text and audio inputs. The release is currently in a state reminiscent of early Stable Diffusion XL: the core technology shows promise but needs significant community effort for optimization and accessibility. The model demands substantial computational resources, with a realistic minimum of 24GB VRAM (an RTX 3090, for example) for usable base generation, and it performs best at 448x448 resolution; pushing higher tends to produce glitchy output. Early testers are already sharing modified workflows and custom ComfyUI nodes on GitHub to help others run the model more efficiently on consumer hardware.
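Before attempting a run, it is worth confirming the local GPU actually clears that floor. A minimal sketch in PyTorch; the 24GB threshold reflects early-tester reports rather than an official specification:

```python
# Sketch: check whether the local GPU meets the ~24GB VRAM floor that
# early testers report as the practical minimum for usable base
# generation. The threshold is an assumption, not an official spec.
import torch

MIN_VRAM_GB = 24  # reported practical minimum (e.g. an RTX 3090)

if not torch.cuda.is_available():
    raise SystemExit("No CUDA device found; the model needs a large GPU.")

props = torch.cuda.get_device_properties(0)
total_gb = props.total_memory / 1024**3

print(f"{props.name}: {total_gb:.1f} GB VRAM")
if total_gb < MIN_VRAM_GB:
    print("Below the reported 24GB floor; expect OOM at base settings.")
```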
A key practical workaround is to bypass the model's built-in super-resolution (SR) path, which is currently too heavy for most GPUs, and instead pair the official base model with a standard post-upscaler. This approach, along with community-shared quantized alternatives such as a t5gemma-9b Q6_K GGUF text encoder (a quick metadata check for such files is sketched after the list below), makes experimentation far more feasible. The project's success hinges on developer engagement: better optimizations, easier installation, and more accessible quantized versions could pave the way for a serious open-source alternative in a text-to-video space dominated by closed models like Sora.
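The SR-bypass approach amounts to generating at the stable base resolution and handing the frames to any standard upscaler afterwards. A minimal sketch of that second step, using bicubic interpolation as a stand-in for whichever dedicated upscaler (ESRGAN-family or similar) a real workflow would plug in:

```python
# Sketch: post-upscale frames generated at the stable 448x448 base
# resolution, instead of running the model's heavy built-in SR path.
# Bicubic interpolation is a placeholder for a dedicated upscale model,
# which will generally produce sharper results.
import torch
import torch.nn.functional as F

def upscale_video(frames: torch.Tensor, scale: int = 2) -> torch.Tensor:
    """frames: (T, C, H, W) float tensor in [0, 1], e.g. T x 3 x 448 x 448."""
    return F.interpolate(
        frames, scale_factor=scale, mode="bicubic", align_corners=False
    ).clamp(0.0, 1.0)

# Example: a dummy 16-frame clip at the 448x448 sweet spot -> 896x896.
clip = torch.rand(16, 3, 448, 448)
upscaled = upscale_video(clip)
print(upscaled.shape)  # torch.Size([16, 3, 896, 896])
```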
- Generates video from text and audio prompts, requiring 24GB VRAM for usable results on hardware like an RTX 3090.
- Community is sharing optimized ComfyUI workflows on GitHub to bypass the heavy built-in super-resolution for better performance.
- Early-stage model needs optimization to reach full potential, similar to the early community-driven development of Stable Diffusion XL.
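For anyone swapping in the community's quantized text encoder mentioned above, a quick sanity check of the downloaded GGUF file's metadata can confirm the quantization type before wiring it into a workflow. This sketch uses the `gguf` Python package (`pip install gguf`); the filename is a placeholder, not an official release name:

```python
# Sketch: inspect a GGUF file's metadata and tensor quantization types
# before using it as a drop-in text encoder. The path below is a
# placeholder for whatever file a community mirror actually provides.
from gguf import GGUFReader

reader = GGUFReader("t5gemma-9b-Q6_K.gguf")  # placeholder path

# General metadata fields (architecture, name, and so on).
for field in reader.fields.values():
    print(field.name)

# First few tensors with their quantization types (expect Q6_K here).
for tensor in reader.tensors[:5]:
    print(tensor.name, tensor.tensor_type, tensor.shape)
```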
Why It Matters
Represents a significant open-source push into AI video generation, offering an alternative to closed models and empowering developer-driven innovation.