GPT-5.3-Codex ran 15 MineBench builds for under $5, 90% cheaper than Claude Opus 4.6's $60+ run?

GPT-5.3-Codex ran 15 MineBench builds for under $5, 90% cheaper than Claude Opus 4.6's $60+ run

The model added realistic shaded smoke effects, a visual detail previously only achieved by Gemini 3.1 Pro?

The model added realistic shaded smoke effects, a visual detail previously only achieved by Gemini 3.1 Pro

Demonstrated unexpected creative capabilities by furnishing building interiors despite being a code-focused model?

Demonstrated unexpected creative capabilities by furnishing building interiors despite being a code-focused model

Models & Releases

OpenAI's GPT-5.3-Codex surprises with 10x cheaper, more detailed MineBench builds

r/OpenAI February 25, 2026

⚡The new model costs 90% less than Claude Opus 4.6 and adds realistic shading to smoke effects.

Deep Dive

A new comparison of OpenAI's latest coding-focused models on the MineBench AI benchmark reveals surprising performance gains and cost efficiencies. GPT-5.3-Codex, the newest iteration in OpenAI's Codex series designed for programming tasks, dramatically outperformed its predecessor GPT-5.2-Codex in generating detailed Minecraft structures from text descriptions. While the Codex models aren't specifically trained for this type of creative benchmark, GPT-5.3-Codex produced builds with sophisticated details like shaded smoke effects and furnished interiors, suggesting improved multimodal understanding and creative generation capabilities beyond pure code.

The technical breakdown shows GPT-5.3-Codex completed 15 benchmark builds for under $5—a 90% cost reduction compared to Claude Opus 4.6's $60+ run that included failed JSON attempts. The model demonstrated novel visual sophistication, adding realistic darkened sections to smoke columns (previously only seen in Gemini 3.1 Pro) and even furnishing building interiors. This unexpected performance on a non-specialized benchmark indicates GPT-5.3-Codex may have broader creative applications than anticipated, potentially challenging specialized creative AI models while maintaining the cost efficiency crucial for developers building AI-powered applications.

Key Points

GPT-5.3-Codex ran 15 MineBench builds for under $5, 90% cheaper than Claude Opus 4.6's $60+ run
The model added realistic shaded smoke effects, a visual detail previously only achieved by Gemini 3.1 Pro
Demonstrated unexpected creative capabilities by furnishing building interiors despite being a code-focused model

Why It Matters

Shows AI models are becoming both more capable and cost-effective for creative tasks, expanding practical applications beyond their core training.

Read Original Article

OpenAI's GPT-5.3-Codex surprises with 10x cheaper, more detailed MineBench builds

Why It Matters

Related Articles

🚀 Stay Ahead in AI