Image & Video

Ideogram open-sources Ideogram 4.0, a 9.3B model with precise text layout control

Open-source AI now generates posters with 0.97 OCR accuracy and exact hex color codes

Deep Dive

Ideogram has open-sourced Ideogram 4.0, a 9.3-billion-parameter text-to-image model purpose-built for high-precision graphic design. The model scores 0.97 on the X-Omni English OCR benchmark and ranks #2 overall (#1 among open-weight models) on the designer preference ELO leaderboard, outperforming models such as FLUX 2 [dev]. Its architecture is a 34-layer single-stream Diffusion Transformer (DiT) that uses Qwen3-VL-8B-Instruct as its text encoder, drawing hidden states from 13 of the encoder's intermediate layers.

Key innovations include asymmetric classifier-free guidance (CFG), in which the unconditional pass drops the text tokens to speed up sampling, and native resolution flexibility: a single set of weights handles everything from ultra-wide banners to phone wallpapers without extra LoRAs. The model was trained exclusively on structured JSON captions, so it can be conditioned on exact hex color codes, bounding-box coordinates, and multi-line, multi-font text elements.
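To make the asymmetric CFG idea concrete, here is a minimal sketch. The article says only that the unconditional pass drops text tokens; this sketch assumes that, in a single-stream DiT, the text tokens are simply omitted from the unconditional sequence rather than replaced by an empty-prompt embedding. The guidance formula is the standard CFG combination. `toy_denoise`, the token names, and the 4096/512 token counts are hypothetical stand-ins, not Ideogram's code.

```python
from typing import Callable, List, Sequence


def cfg_combine(cond: Sequence[float], uncond: Sequence[float], scale: float) -> List[float]:
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def asymmetric_cfg_step(
    denoise: Callable[[List[str]], List[float]],
    image_tokens: List[str],
    text_tokens: List[str],
    scale: float,
) -> List[float]:
    # Conditional pass: the single stream holds text tokens followed by image tokens.
    cond = denoise(text_tokens + image_tokens)
    # Unconditional pass (assumed mechanism): text tokens are dropped entirely,
    # so the sequence is shorter and the forward pass is cheaper.
    uncond = denoise(image_tokens)
    return cfg_combine(cond, uncond, scale)


def attention_cost_ratio(n_image: int, n_text: int) -> float:
    """Rough self-attention cost (O(n^2)) of the unconditional pass relative to the conditional one."""
    return n_image ** 2 / (n_image + n_text) ** 2


def toy_denoise(tokens: List[str]) -> List[float]:
    """Hypothetical stand-in for the DiT: one output per image token,
    shifted by how many text tokens the model can see."""
    n_text = sum(tok.startswith("txt") for tok in tokens)
    n_image = sum(tok.startswith("img") for tok in tokens)
    return [0.1 * n_text] * n_image


if __name__ == "__main__":
    image = [f"img{i}" for i in range(4)]
    text = ["txt0", "txt1"]
    print(asymmetric_cfg_step(toy_denoise, image, text, scale=3.0))
    print(f"{attention_cost_ratio(4096, 512):.2f}")  # 0.79
```

With 4096 image tokens and 512 text tokens, the attention portion of the unconditional pass costs about 79% of the conditional pass. The MLP layers scale linearly with sequence length, so the real saving depends on the model's actual token counts.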

For professionals, this means generating posters, advertisements, and UI mockups locally with reliably legible text, a persistent weakness of earlier open models. The repository includes fp8 and nf4 checkpoints; the nf4 variant runs on a single 24 GB GPU. Native ComfyUI support, a prompting guide, and sampler presets are also available. By open-sourcing a model that competes with closed-weight systems on text rendering, Ideogram democratizes design AI for independent creators, agencies, and developers who need privacy, freedom from API costs, and complete control over layout.
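A back-of-the-envelope estimate suggests why the nf4 checkpoint fits on a 24 GB card. This sketch counts weight storage only, treating nf4 and fp8 as exactly 4 and 8 bits per parameter. It assumes the text encoder has about 8B parameters, taken at face value from the name "Qwen3-VL-8B", and it does not know which components the published checkpoints actually quantize. Activations, the VAE, and framework overhead are ignored.

```python
# Bits per parameter for each storage format. Real nf4 checkpoints also store
# per-block scales; the optional overhead_bits argument can approximate that.
BITS_PER_PARAM = {"bf16": 16.0, "fp8": 8.0, "nf4": 4.0}

DIT_PARAMS_B = 9.3           # from the article: 9.3B-parameter DiT
TEXT_ENCODER_PARAMS_B = 8.0  # assumption: "8B" taken from the model name


def weight_memory_gb(params_billion: float, fmt: str, overhead_bits: float = 0.0) -> float:
    """Approximate weight storage in decimal GB (1 GB = 1e9 bytes)."""
    bits = BITS_PER_PARAM[fmt] + overhead_bits
    # (params_billion * 1e9 params) * bits / 8 bytes, divided by 1e9 bytes per GB
    return params_billion * bits / 8.0


if __name__ == "__main__":
    combos = [("bf16", "bf16"), ("fp8", "fp8"), ("nf4", "nf4"), ("nf4", "bf16")]
    for dit_fmt, enc_fmt in combos:
        total = weight_memory_gb(DIT_PARAMS_B, dit_fmt) + weight_memory_gb(
            TEXT_ENCODER_PARAMS_B, enc_fmt
        )
        print(f"DiT {dit_fmt:>4} + encoder {enc_fmt:>4}: ~{total:5.2f} GB of weights")
```

Under these assumptions, DiT plus encoder in bf16 comes to about 34.6 GB of weights, which already exceeds 24 GB. An all-nf4 setup needs about 8.65 GB, leaving room for activations and the VAE.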

Key Points
  • 9.3B-parameter DiT with a Qwen3-VL-8B text encoder scores 0.97 on X-Omni English OCR and ranks #1 among open-weight models on the designer preference ELO leaderboard.
  • nf4 quantization enables inference on a single 24 GB GPU, with native ComfyUI support.
  • Structured JSON prompting allows exact hex color codes, bounding-box layouts, and multi-line text generation.
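The structured-JSON conditioning described above can be illustrated with a small caption builder that validates hex colors and bounding boxes before serializing. Ideogram's actual caption schema is defined in its prompting guide, which is not reproduced here. Every field name below (`description`, `width`, `height`, `background_color`, `elements`, `font`, `bbox`) is hypothetical, as is the normalized top-left bounding-box convention. Treat this as a sketch of the idea, not the model's real input format.

```python
import json
import re
from dataclasses import asdict, dataclass, field
from typing import List, Tuple

# Exact 6-digit hex color such as "#F2C14E".
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def check_hex(color: str) -> None:
    if not HEX_COLOR.fullmatch(color):
        raise ValueError(f"expected a 6-digit hex color like '#1A2B3C', got {color!r}")


@dataclass
class TextElement:
    # Hypothetical schema: field names are illustrative, not Ideogram's.
    content: str                             # "\n" separates lines of multi-line text
    font: str                                # free-form font description
    color: str                               # exact hex color code
    bbox: Tuple[float, float, float, float]  # normalized (x0, y0, x1, y1), origin top-left

    def validate(self) -> None:
        check_hex(self.color)
        x0, y0, x1, y1 = self.bbox
        if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
            raise ValueError(f"bbox must satisfy 0 <= x0 < x1 <= 1 and 0 <= y0 < y1 <= 1: {self.bbox}")


@dataclass
class PosterCaption:
    description: str
    width: int   # pixel size; any aspect ratio, e.g. an ultra-wide banner
    height: int
    background_color: str
    elements: List[TextElement] = field(default_factory=list)

    def to_json(self) -> str:
        """Validate every color and box, then serialize (tuples become JSON arrays)."""
        check_hex(self.background_color)
        for element in self.elements:
            element.validate()
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    caption = PosterCaption(
        description="Minimalist jazz festival poster, flat vector style",
        width=1024,
        height=1536,
        background_color="#0B1D3A",
        elements=[
            TextElement("NIGHT JAZZ", "bold condensed sans-serif", "#F2C14E", (0.08, 0.06, 0.92, 0.22)),
            TextElement("June 14\nRiverside Park", "light serif", "#FFFFFF", (0.20, 0.80, 0.80, 0.92)),
        ],
    )
    print(caption.to_json())
```

Validating on the client side catches malformed colors such as `"red"` or inverted boxes before they are sent to the model. This matters when captions are generated programmatically, for example from a brand style sheet.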

Why It Matters

Graphic designers can now create print-ready posters locally with highly accurate text rendering and no API fees.