We may have a new SOTA open-source model: ERNIE-Image Comparisons
New open-source model rivals closed-source image quality, its Turbo variant generates an image in 2 seconds, and early tests show a bias toward Asian faces.
Baidu has released ERNIE-Image, a new open-source image generation model that is being hailed as a potential state-of-the-art (SOTA) contender. Early comparisons show its base model producing images with exceptional aesthetic quality, cinematic lighting, and professional color grading that can compete with leading closed-source models. However, initial user testing points to a significant skew, likely rooted in its training data: the model renders Asian faces with much higher fidelity than other faces and does especially well with anime and illustration styles, while its performance on other aesthetics is less consistent.
A key technical highlight is the model's speed. In the reported test, the base 'ERNIE-Image' model took 29 seconds to generate an image on an NVIDIA RTX PRO 6000 Blackwell GPU, while the 'ERNIE-Image Turbo' variant completed the same task in just 2 seconds, a speedup of roughly 14x (29 / 2 = 14.5). The models are available for download on Hugging Face under the Comfy-Org organization and are designed to integrate directly into ComfyUI, the popular node-based interface for Stable Diffusion-style workflows, making them immediately accessible to developers and creators.
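The speed comparison above is simple arithmetic, and a minimal sketch makes it easy to redo with your own timings. The helper name `speedup` and the variable names are illustrative assumptions, not part of any ERNIE-Image or ComfyUI tooling; the two timings are the ones reported in the test above.

```python
def speedup(base_seconds: float, fast_seconds: float) -> float:
    """Return how many times faster the fast run is than the base run."""
    if base_seconds <= 0 or fast_seconds <= 0:
        raise ValueError("timings must be positive")
    return base_seconds / fast_seconds


# Timings reported for a single image on the same GPU.
base_s = 29.0   # base ERNIE-Image
turbo_s = 2.0   # ERNIE-Image Turbo

ratio = speedup(base_s, turbo_s)
time_saved_pct = (1 - turbo_s / base_s) * 100

print(f"speedup: {ratio:.1f}x")
print(f"time saved per image: {time_saved_pct:.1f}%")
print(f"images per minute: base {60 / base_s:.1f}, turbo {60 / turbo_s:.1f}")
```

With the reported numbers the ratio comes out to 14.5, which is where the "14x" figure comes from. Swapping in timings measured on your own hardware gives a like-for-like comparison, since absolute generation times depend on the GPU, resolution, and step count.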
- The base ERNIE-Image model generates cinematic-quality images that rival top closed-source models.
- The model shows a strong bias toward Asian faces and excels at anime/illustration styles, per user tests.
- A 'Turbo' version offers a roughly 14x speed boost, generating an image in 2 seconds versus the base model's 29 seconds.
Why It Matters
ERNIE-Image provides a high-quality, fast open-source alternative for image generation, though its demographic bias requires careful application.