Media & Culture

Meituan open-sources LongCat-Image-Edit-Turbo, a distilled image-editing model that hits open-source SOTA in only 8 inference steps

A distilled 6B-parameter model achieves high-quality image editing roughly 10x faster, using only about 18GB of VRAM.

Deep Dive

Meituan's LongCat team has open-sourced LongCat-Image-Edit-Turbo, a distilled version of their instruction-based image editing model. The key breakthrough is its efficiency: it matches the quality of its larger predecessor in just 8 NFEs (number of function evaluations), a roughly 10x speedup. This performance comes from a compact 6B-parameter diffusion core that requires only about 18GB of VRAM with CPU offloading. The model is fully integrated into the Hugging Face Diffusers library and is Apache 2.0 licensed, making it immediately accessible to developers and researchers.

On standard benchmarks, LongCat-Image-Edit-Turbo sets a new open-source SOTA. It scores 4.50 on ImgEdit-Bench and 7.60/7.64 on GEdit-Bench for Chinese and English instructions, respectively, outperforming competitors like FLUX.1 Kontext and Qwen-Image-Edit. Its editing capabilities are extensive, covering global and local edits, object replacement, style transfer, text manipulation, and outpainting. A standout feature is its consistency preservation, which maintains the layout, texture, and identity of non-edited regions, a critical factor for professional multi-turn editing workflows.

The model's efficiency stems from rigorous data curation and distillation techniques, continuing a trend where smart training methods outperform brute-force parameter scaling. It natively supports both Chinese and English instructions and includes a clever character-level encoding trick for accurate text rendering. With ComfyUI support and training code also released, Meituan's release demonstrates how well-trained, mid-sized models can deliver top-tier performance at a fraction of the computational cost, pushing the boundaries of what's possible in open-source AI.

Key Points
  • Achieves open-source SOTA on ImgEdit-Bench (4.50) and GEdit-Bench with only 8 inference steps, a roughly 10x speedup.
  • Runs on approximately 18GB VRAM, powered by a compact 6B-parameter diffusion core that outperforms larger models.
  • Apache 2.0 licensed with full HuggingFace Diffusers and ComfyUI integration, featuring strong multi-turn editing consistency.

Why It Matters

Delivers professional-grade, fast image editing in an accessible open-source package, lowering the barrier for developers and creators.