Media & Culture

Showed 4 AI models some abstract Kandinsky-style Pokémon art with no hints, the results are kind of insane

Opus 4.7 nailed all 4 abstract Pokémon; Gemini 3.1 thought they were Sailor Moon.

Deep Dive

A viral Reddit post from user normal_TFguy reveals a fascinating AI benchmark: recognizing abstract Pokémon art. The user fed 4 AI models geometric, Kandinsky-style Pokémon drawings from Instagram artist '8th Project' with the prompt 'Elite Ball pattern recognition required.' Opus 4.7 (no thinking mode) identified all 4 Pokémon instantly. GPT-5.5 (no thinking) got 3 correct. Claude Sonnet 4.6 (extended thinking) managed only 2. The worst performer was Gemini 3.1 Pro (high thinking), which spent 4.5 minutes analyzing the images, considered Squidward and Aladdin, and concluded they were Sailor Moon characters—never once landing on Pokémon. Even after being told the franchise, Gemini only got 1 right at temperature 0.
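The informal benchmark boils down to scoring each model's guesses against the four ground-truth Pokémon. A minimal Python sketch of that scoring (the per-item guess lists below are hypothetical placeholders — the post only reports each model's total, not which Pokémon it missed):

```python
# Score each model's guesses against the 4 ground-truth Pokemon.
# Guess lists are hypothetical placeholders; the Reddit post only
# reports per-model totals, not per-image answers.

TRUTH = ["pokemon_1", "pokemon_2", "pokemon_3", "pokemon_4"]

guesses = {
    "Opus 4.7 (no thinking)":          ["pokemon_1", "pokemon_2", "pokemon_3", "pokemon_4"],  # 4/4
    "GPT-5.5 (no thinking)":           ["pokemon_1", "pokemon_2", "pokemon_3", "wrong"],      # 3/4
    "Sonnet 4.6 (extended thinking)":  ["pokemon_1", "pokemon_2", "wrong", "wrong"],          # 2/4
    "Gemini 3.1 Pro (high thinking)":  ["sailor moon"] * 4,                                   # 0/4
}

def score(answers, truth):
    """Count position-wise exact matches between guesses and ground truth."""
    return sum(a == t for a, t in zip(answers, truth))

for model, answers in guesses.items():
    print(f"{model}: {score(answers, TRUTH)}/{len(TRUTH)}")
```

The exact-match scoring mirrors how the post tallies results: a guess only counts if the model names the specific Pokémon, not just the franchise.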

The results highlight stark differences in multimodal reasoning. Opus 4.7's zero-shot success suggests strong pattern recognition without explicit reasoning steps. GPT-5.5's near-perfect score shows solid visual understanding. Sonnet 4.6's extended thinking actually hurt performance, overcomplicating simple patterns. Gemini's failure is the most striking: despite extensive search and self-verification (including writing 'I'm satisfied' before continuing), it never considered Pokémon. This suggests Gemini's multimodal training may lack robust representations for abstracted versions of familiar objects, or that its thinking process introduces hallucinations. The test underscores that more compute doesn't always mean better reasoning—sometimes simplicity wins.

Key Points
  • Opus 4.7 (no thinking) identified all 4 abstract Pokémon instantly, outperforming all competitors.
  • Gemini 3.1 Pro spent 4.5 minutes thinking, considered Squidward and Aladdin, and concluded they were Sailor Moon characters.
  • Even after being told the correct franchise, Gemini only got 1 Pokémon right at temperature 0.

Why It Matters

Shows AI visual reasoning varies wildly—more thinking doesn't guarantee better recognition of abstract patterns.