Last week in Image & Video Generation
Three new open-source models target long-standing flaws in AI image, 3D, and video generation: garbled text, data-hungry segmentation, and expensive video upscaling.
The open-source AI community delivered significant advances in image and video generation last week, directly addressing persistent quality issues. GlyphPrinter, a new model for accurate text rendering, tackles the notorious problem of garbled text in AI-generated images. It uses a new training method called Region-Grouped Direct Preference Optimization (RG-DPO) to balance artistic style with precise spelling, especially for localized text such as signs or logos. The model's weights are openly available, so developers can integrate reliable text rendering into their own pipelines.
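For readers unfamiliar with preference optimization, the sketch below shows the standard DPO loss for a single preferred/rejected pair, followed by a purely hypothetical way of averaging it over text regions. It is not GlyphPrinter's RG-DPO: the function names, the `beta` default, the per-region averaging, and the use of plain log-probabilities (image models typically substitute denoising errors) are all assumptions made for illustration.

```python
import math


def dpo_loss(policy_logp_win, policy_logp_lose,
             ref_logp_win, ref_logp_lose, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair.

    Inputs are log-probabilities of each sample under the model being
    trained (policy) and under a frozen reference model.
    """
    # How much more the policy favors the preferred sample than the
    # reference model does, relative to the rejected sample.
    margin = ((policy_logp_win - ref_logp_win)
              - (policy_logp_lose - ref_logp_lose))
    x = beta * margin
    # -log(sigmoid(x)) written as a numerically stable softplus(-x).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def region_grouped_loss(region_pairs, beta=0.1):
    """HYPOTHETICAL: average the DPO loss over per-region preference pairs.

    Each entry is a tuple
    (policy_logp_win, policy_logp_lose, ref_logp_win, ref_logp_lose)
    for one text region, e.g. one sign or logo in the image.
    """
    losses = [dpo_loss(*pair, beta=beta) for pair in region_pairs]
    return sum(losses) / len(losses)
```

When the policy and reference agree exactly, the loss equals log 2; it falls as the policy learns to prefer the correctly spelled sample more strongly than the reference does.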
Alongside the text-rendering fix, two other tools target 3D and video processing. SegviGen repurposes existing 3D generative models to perform precise object segmentation (identifying and isolating objects in a scene) using a colorization approach that requires less than 1% of the training data needed by prior methods. For video enhancement, SparkVSR introduces an interactive super-resolution workflow. Built on CogVideoX, it lets users upscale a few selected keyframes in a video, and the model then propagates those high-quality details across the entire sequence, making high-resolution video processing more efficient. Both projects offer open code and demos.
The roundup also highlighted practical workflow tools for creators, including a comprehensive NVIDIA guide to generating 4K video from Blender 3D scenes within ComfyUI, and new nodes for advanced filmmaking techniques such as shot sequencing. In addition, community optimizations for the LTX 2.3 model now make it possible to generate a 20-second video on a consumer-grade RTX 3070 GPU in 21 minutes, bringing high-quality video generation within reach of mid-range hardware.
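To put the LTX 2.3 figure in perspective, the short calculation below converts the reported numbers (a 20-second clip in 21 minutes) into compute time per second of output. The function name is illustrative, and nothing here assumes a frame rate or resolution, since the roundup does not state them.

```python
def compute_seconds_per_video_second(clip_seconds, wall_clock_minutes):
    """Seconds of GPU wall-clock time spent per second of generated video."""
    return wall_clock_minutes * 60 / clip_seconds


# Reported figures: 20-second clip, 21 minutes on an RTX 3070.
ratio = compute_seconds_per_video_second(20, 21)
print(f"{ratio:.0f} seconds of compute per second of video")
# prints "63 seconds of compute per second of video"
```

In other words, generation runs at roughly 63 times slower than real time on that card.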
- GlyphPrinter addresses garbled text in AI-generated images using Region-Grouped DPO, delivering accurate spelling while preserving artistic style, with open weights.
- SegviGen performs 3D object segmentation via colorization, achieving high precision with less than 1% of the training data required by previous state-of-the-art models.
- SparkVSR enables interactive video super-resolution by upscaling keyframes and propagating details, offering a practical path to high-quality video upscaling for creators.
Why It Matters
These free, open tools address core quality problems in AI-generated visuals, giving professionals and creators direct control over text accuracy, object segmentation, and video resolution.