Last week in Image & Video Generation
FlashMotion enables controllable video generation 50x faster, while ViFeEdit achieves 91.5% color accuracy for professional edits.
Last week's open-source AI developments delivered significant leaps in video generation, editing, and image composition, pushing the boundaries of what's possible without proprietary models. The standout is FlashMotion, a community project that achieves controllable video generation 50x faster than previous methods. It works on top of text-to-video models like Wan 2.2 and offers precise multi-object control through bounding boxes and masks, plus camera motion guidance. Its weights are already available on Hugging Face, putting state-of-the-art video synthesis within easy reach.
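To make the box/mask guidance concrete, here is a minimal sketch of how per-frame bounding boxes can be encoded as a spatial mask tensor, the kind of conditioning signal a box-guided video model could consume alongside a text prompt. This is an illustrative assumption about the general technique, not FlashMotion's actual API; the function name and tensor layout are hypothetical.

```python
# Hypothetical sketch: turn per-frame bounding boxes into a [T, H, W] binary
# mask video, a common way to encode spatial guidance for video generation.
# Not FlashMotion's real interface.
import numpy as np

def boxes_to_mask_video(boxes, num_frames, height, width):
    """boxes: dict mapping frame index -> list of (x0, y0, x1, y1) pixel boxes.
    Returns a float32 array of shape [T, H, W], 1.0 inside any box on that frame."""
    masks = np.zeros((num_frames, height, width), dtype=np.float32)
    for t, frame_boxes in boxes.items():
        for x0, y0, x1, y1 in frame_boxes:
            masks[t, y0:y1, x0:x1] = 1.0  # slicing clips boxes at the frame edge
    return masks

# Two objects over 8 frames: one static box, one moving left to right.
boxes = {t: [(10, 10, 30, 30), (t * 8, 40, t * 8 + 16, 56)] for t in range(8)}
cond = boxes_to_mask_video(boxes, num_frames=8, height=64, width=64)
print(cond.shape)     # (8, 64, 64)
print(cond[0].sum())  # 656.0 -- 20x20 static box + 16x16 moving box on frame 0
```

A model would typically downsample such a mask to its latent resolution and concatenate it with the noise channels, so the generator knows where each object should appear on each frame.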
Alongside speed, professional-grade editing saw major upgrades. ViFeEdit enables complex video edits (a reported 100% success rate on object addition and 91.5% color accuracy) while training on image pairs alone, eliminating the need for extensive video datasets. For image generation, GlyphPrinter solves a persistent problem by rendering glyph-accurate, multilingual text within images. Other notable releases include MatAnyone 2, a high-quality video object matting model trained on millions of real frames, and a training-free refinement method that adds camera control and super-resolution to models like CogVideoX without touching their weights.
- FlashMotion achieves 50x faster controllable video generation with box/mask guidance for Wan 2.2-T2V models.
- ViFeEdit enables professional video editing from image pairs with 91.5% color accuracy and 100% object addition success.
- GlyphPrinter delivers glyph-accurate multilingual text rendering in generated images, solving a key text-to-image weakness.
Why It Matters
Democratizes professional-grade video production and editing, enabling creators and developers to build with cutting-edge AI without API costs.