Models & Releases

GPT-5.2-Codex versus GPT-5.3-Codex on MineBench

GPT-5.3-Codex completed its MineBench run for roughly 90% less than Claude Opus 4.6 and added realistic shading to smoke effects.

Deep Dive

A new comparison of OpenAI's latest coding-focused models on the MineBench AI benchmark shows notable performance gains and cost savings. GPT-5.3-Codex, the newest model in OpenAI's Codex series for programming tasks, substantially outperformed its predecessor, GPT-5.2-Codex, at generating detailed Minecraft structures from text descriptions. Although the Codex models are not specifically trained for this kind of creative benchmark, GPT-5.3-Codex produced builds with sophisticated details such as shaded smoke effects and furnished interiors, suggesting that its spatial and creative generation abilities extend beyond pure code.

The technical breakdown shows that GPT-5.3-Codex completed 15 benchmark builds for under $5, roughly 90% less than Claude Opus 4.6's run, which cost more than $60 and included failed JSON attempts. The model also showed new visual sophistication, adding realistic darkened sections to smoke columns (a detail previously seen only in Gemini 3.1 Pro) and even furnishing building interiors. This unexpected performance on a benchmark outside its specialty suggests GPT-5.3-Codex may have broader creative applications than anticipated, potentially competing with more specialized creative models while keeping the low cost that matters to developers building AI-powered applications.
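As a rough check on the headline saving, this sketch plugs the article's cost bounds into the standard percentage-reduction formula, 1 - new/old. It treats "under $5" and "$60+" as exactly $5 and $60, and it assumes the two totals can be compared directly even though the Opus figure includes failed JSON attempts. The function name cost_reduction is hypothetical, chosen for illustration.

```python
def cost_reduction(new_cost: float, old_cost: float) -> float:
    """Fractional saving of new_cost relative to old_cost (1 - new/old)."""
    return 1.0 - new_cost / old_cost

# Bounds reported in the article (assumption: treated as exact values).
codex_upper = 5.0   # GPT-5.3-Codex: "under $5" -> at most $5
opus_lower = 60.0   # Claude Opus 4.6: "$60+"   -> at least $60
builds = 15         # number of MineBench builds in the GPT-5.3-Codex run

# Using Codex's upper bound and Opus's lower bound gives the smallest
# saving consistent with the reported figures.
saving = cost_reduction(codex_upper, opus_lower)
per_build = codex_upper / builds  # upper bound on GPT-5.3-Codex cost per build

print(f"minimum saving: {saving:.1%}")                 # minimum saving: 91.7%
print(f"max Codex cost per build: ${per_build:.2f}")   # max Codex cost per build: $0.33
```

Because the Codex figure is an upper bound and the Opus figure a lower bound, the actual saving is at least about 91.7%, which is consistent with the rounded "90%" claim.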

Key Points
  • GPT-5.3-Codex ran 15 MineBench builds for under $5, 90% cheaper than Claude Opus 4.6's $60+ run
  • The model added realistic shaded smoke effects, a visual detail previously only achieved by Gemini 3.1 Pro
  • Demonstrated unexpected creative capabilities by furnishing building interiors despite being a code-focused model

Why It Matters

Suggests that AI models are becoming both more capable and more cost-effective at creative tasks, extending their practical applications beyond their core training.