Open Source

Qwen's Q-Judger evaluates AI images across 5 dimensions

New vision-language model scores image quality across 5 top-level dimensions

Deep Dive

Alibaba's Qwen team has introduced Q-Judger, a specialized vision-language model designed to automate the evaluation of text-to-image generation outputs. Built on the 27B-parameter Qwen3.6-27B base model, Q-Judger takes a text prompt and a generated image as input and produces a structured JSON score across five top-level dimensions: Quality, Alignment, Real-world Fidelity, Creative Generation, and Visual Storytelling. Each dimension is broken into sub-dimensions. For example, Quality covers Realism, Detail, Resolution, Aesthetics, and Lighting, while Alignment checks Attributes, Actions, Layout, Relations, and Scene. The model also assesses fairness (Social Bias, Cultural Fairness) and safety compliance.
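To make the structured output concrete, here is a minimal Python sketch of how a consumer might read such a result. The nesting and key names are assumptions made for illustration, built from the dimension and sub-dimension names listed above. They are not Q-Judger's documented schema, and `flatten_scores` is a hypothetical helper.

```python
import json

# Hypothetical output: the key names and nesting are assumptions for
# illustration, not Q-Judger's documented schema.
raw_output = """
{
  "Quality": {"Realism": 2, "Detail": 1, "Resolution": 1, "Aesthetics": 2, "Lighting": 1},
  "Alignment": {"Attributes": 1, "Actions": "N/A", "Layout": 1, "Relations": 0, "Scene": 2}
}
"""


def flatten_scores(raw: str) -> dict:
    """Turn {dimension: {sub_dimension: score}} into {"Dimension/Sub": score}."""
    nested = json.loads(raw)
    return {
        f"{dimension}/{sub}": score
        for dimension, subs in nested.items()
        for sub, score in subs.items()
    }


flat = flatten_scores(raw_output)
for name, score in flat.items():
    print(f"{name}: {score}")
```

A flat mapping like this is easy to log or load into a table when comparing many generated images against the same prompt.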

What sets Q-Judger apart is that it runs with thinking mode enabled: the model works through chain-of-thought reasoning before outputting the final scores. Each sub-dimension is rated 0 (Fail), 1 (Pass), or 2 (Excel), or marked N/A. This structured evaluation covers over 40 distinct criteria, ranging from physical logic and material texture to graphic design and cinematic style. By providing a consistent, automated scoring framework, Q-Judger aims to replace or augment human evaluation in image generation workflows, enabling faster iteration and more objective benchmarking of models like Stable Diffusion or DALL-E.
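The 0/1/2/N/A scale lends itself to simple roll-ups. The sketch below shows one possible way to summarize a dimension's sub-scores while leaving N/A entries out of the average. The averaging rule and the `summarize` helper are assumptions for illustration; the source does not say how, or whether, Q-Judger aggregates sub-dimension scores.

```python
from typing import Dict, Union

Score = Union[int, str]

# Rating labels as described in the article.
LABELS = {0: "Fail", 1: "Pass", 2: "Excel"}


def summarize(sub_scores: Dict[str, Score]) -> dict:
    """Summarize one dimension; N/A entries are skipped, not counted as 0."""
    rated = {name: s for name, s in sub_scores.items() if s != "N/A"}
    for name, s in rated.items():
        if s not in LABELS:
            raise ValueError(f"unexpected score for {name}: {s!r}")
    # Mean over rated sub-dimensions only (an illustrative choice).
    mean = sum(rated.values()) / len(rated) if rated else None
    return {
        "mean": mean,
        "failed": sorted(name for name, s in rated.items() if s == 0),
        "skipped": len(sub_scores) - len(rated),
    }


# Hypothetical sub-scores for the Alignment dimension.
alignment = {"Attributes": 1, "Actions": "N/A", "Layout": 1, "Relations": 0, "Scene": 2}
summary = summarize(alignment)
print(summary)
```

Treating N/A as "not rated" rather than as a zero keeps an image from being penalized for criteria that do not apply to its prompt.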

Key Points
  • Q-Judger evaluates images across 5 top-level dimensions, including Quality, Alignment, and Real-world Fidelity, broken down into 40+ sub-criteria.
  • Outputs structured JSON scores (0=Fail, 1=Pass, 2=Excel, N/A) for each sub-dimension after chain-of-thought reasoning.
  • Built on Qwen3.6-27B, the model also checks for social bias, cultural fairness, and safety compliance.

Why It Matters

Automated, granular image evaluation could enable faster iteration and more objective benchmarking in generative AI workflows.