Media & Culture

Comparison of hallucinations by the top image-editing models on Arena when asked to colorize a picture (a cropped, zoomed-in section of a Solvay Conference photo)

A viral benchmark reveals which AI image editors actually work, and which fail spectacularly.

Deep Dive

A viral comparison on the AI Arena platform tested top models on a simple task: colorizing a historical photo. The results varied dramatically. OpenAI's GPT-4o Image, currently the top-ranked model, produced outputs "completely different" from the original. The clear winners were Nano Banana Pro and SeeDream 4.5, which introduced the fewest hallucinations. Grok and Hunyuan also performed poorly; Hunyuan appeared to heavily downscale the input image and then upscale it again, degrading the result.

Why It Matters

This exposes a critical gap between leaderboard rankings and real-world usability for creative professionals: the top-ranked model failed a basic fidelity task that lower-ranked models handled well.