I tested Ernie Image Turbo (fp8, nvfp4, fp16 and INT8) with Nano Banana Pro 2 Prompts so you won't have to

In one ComfyUI user's workflow test, Ernie Image Turbo beat Z-Image Turbo and Klein 9b on text rendering, realism, and speed.

Deep Dive

A detailed technical test of Baidu's Ernie Image Turbo model has gone viral, showing surprising strength at generating coherent text and accurate anatomy within images. ComfyUI user Winnougan ran the test with a custom workflow that uses specialized nodes for INT8 and GGUF loading, pitting Ernie against popular models such as Z-Image Turbo and Klein 9b on the challenging 'Nano Banana Pro 2' prompts. The results were striking: Ernie Image Turbo consistently produced correct text (when explicitly prompted), handled complex concepts like comics and cosplay, and avoided common AI art pitfalls such as mangled hands and unnatural 'plastic' skin textures. Its lighting also drew praise as particularly volumetric and cinematic.

Winnougan's workflow, available via the ComfyUI Manager, includes speed optimizations such as a small Flux 2 VAE encoder and nodes with SageAttention and Triton support. The key finding is that the model is "effing fast" and accurate but requires precise text instructions: asking for 'random text' yields gibberish, while specifying the exact words works reliably. That makes Ernie a powerful tool for creators who use LLMs to craft detailed prompts, and in this test it delivered a significant quality leap over the alternatives it was compared against. Anticipation is now building for the upcoming 'Ernie Image Edit' feature, which is hinted to be even more capable.
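The prompting finding above can be turned into a small habit: never hand the model a vague request for text, and always quote the exact words. The sketch below is a hypothetical Python helper, not part of Winnougan's workflow or any ComfyUI API; the function name `build_text_prompt`, the list of vague phrases, and the 'reads "..."' phrasing are all assumptions made for illustration.

```python
# Hypothetical helper: names and phrasing are illustrative assumptions,
# not taken from Winnougan's workflow or from ComfyUI.
VAGUE_REQUESTS = {"random text", "some text", "any text"}


def build_text_prompt(scene: str, texts: dict[str, str]) -> str:
    """Compose a prompt that spells out every piece of on-image text verbatim.

    `texts` maps a placement (e.g. "The sign above the door") to the exact
    words that should appear there.
    """
    parts = [scene.strip().rstrip(".")]
    for placement, words in texts.items():
        cleaned = words.strip()
        # Reject empty or vague requests, which the test found produce gibberish.
        if not cleaned or cleaned.lower() in VAGUE_REQUESTS:
            raise ValueError(f"Spell out the exact words for {placement!r}")
        parts.append(f'{placement} reads "{cleaned}"')
    return ". ".join(parts) + "."


prompt = build_text_prompt(
    "A neon diner at night",
    {"The sign above the door": "OPEN 24 HOURS"},
)
print(prompt)
```

The resulting string can be pasted into whatever text-prompt field the workflow exposes. The exact wording that works best for Ernie Image Turbo is not specified in the test, so treat the template as a starting point.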

Key Points
  • Beats rivals on text & anatomy: Accurately renders specified text and realistic human forms, outperforming Z-Image Turbo and Klein 9b.
  • Requires precise prompting: Generates gibberish for 'random text' but excels with detailed, LLM-assisted prompts for concepts and lighting.
  • Custom workflow available: Winnougan's ComfyUI workflow, with nodes for INT8/GGUF loading and SageAttention and Triton support, is tuned for speed and available via the ComfyUI Manager.

Why It Matters

If these results hold up beyond a single user's test, Ernie Image Turbo marks a major leap in AI image quality for professionals, making detailed, text-accurate concept art and assets faster to produce.