Image & Video

HiDream-O1-Image Internal Prompt

The model's hidden system prompt turns vague requests into production-ready image descriptions.

Deep Dive

HiDream-O1-Image's internal prompt has surfaced from the prompt.py file in the model's repo. It acts as a prompt engineering engine that analyzes raw user requests and rewrites them into detailed English prompts using the SCALIST framework, which covers Subject, Composition, Action, Location, Image style, Specs, and Text rendering. The prompt requires explicit knowledge resolution, spatial anchoring, and precise typography so that the image generation model receives direct visual descriptions rather than references it has to interpret.
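To make the seven SCALIST components concrete, here is a minimal Python sketch of how such a structured prompt could be represented and flattened into one description. It is an illustration only, not the contents of prompt.py: the class name ScalistPrompt, its field names, and the to_prompt method are all hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class ScalistPrompt:
    """Hypothetical container for the seven SCALIST components."""
    subject: str
    composition: str = ""
    action: str = ""
    location: str = ""
    image_style: str = ""
    specs: str = ""
    text: str = ""

    def to_prompt(self) -> str:
        # Collect the components in SCALIST order and skip the empty ones.
        parts = [getattr(self, f.name).strip() for f in fields(self)]
        return ", ".join(p for p in parts if p)


example = ScalistPrompt(
    subject="A ginger cat with green eyes",
    composition="centered in the frame, shot at eye level",
    action="stretching on a windowsill",
    location="inside a sunlit kitchen",
    image_style="soft watercolor illustration",
    specs="square aspect ratio, warm morning light",
)
prompt = example.to_prompt()
print(prompt)
```

Keeping each component in its own field makes it easy to see which parts of a vague request are still unspecified before anything is sent to the image model.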

Key Points
  • HiDream-O1-Image uses an internal SCALIST framework (Subject, Composition, Action, Location, Image style, Specs, Text) to expand user requests into detailed prompts.
  • The prompt explicitly bans vague language and requires knowledge resolution (e.g., expanding 'Mona Lisa' into specific visual details) and spatial anchoring (e.g., 'top left corner').
  • Text to be rendered must be preserved character for character, with font, style, color, and position specified. This covers multilingual content such as Chinese poems, as well as formulas.
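
The text-rendering rule in the last point can be sketched in the same way. The snippet below is a hedged illustration, not the model's actual logic: TextSpec and describe are hypothetical names, and the output wording is invented for the example. It shows the two constraints the summary mentions, namely that the content is quoted verbatim and that a position is mandatory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextSpec:
    """Hypothetical description of one piece of text to render in the image."""
    content: str   # exact characters to render; never translated or normalized
    font: str
    color: str
    position: str  # spatial anchor such as "top left corner"

    def describe(self) -> str:
        # Refuse to build a description without an explicit spatial anchor.
        if not self.position.strip():
            raise ValueError("text needs an explicit position")
        # Quote the content verbatim so every character survives unchanged.
        return (f'the text "{self.content}" in {self.color} {self.font}, '
                f"placed in the {self.position}")


poem_line = TextSpec(
    content="床前明月光",
    font="brush calligraphy",
    color="black",
    position="top left corner",
)
description = poem_line.describe()
print(description)
```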

Why It Matters

For AI professionals, this shows how a structured prompt-rewriting layer can make image generation more consistent: by resolving references, positions, and text details up front, it leaves less for the image model to guess.