Differences Between GPT-5.4 and GPT-5.4-Pro on MineBench
An independent benchmark finds the high-end model cost $435 for 15 builds, with little visible improvement in quality.
MineBench, an independent benchmark created by developer Ammaar Alam, has put OpenAI's latest models through a 3D creativity test. The benchmark asks models such as GPT-5.4 and GPT-5.4-Pro to generate complex 3D structures, such as a fighter jet, by outputting JSON coordinates for virtual blocks. Running the high-end GPT-5.4-Pro proved expensive: $435 for just 15 API calls, an average of $29 per generated build. Each build was also slow, averaging 56 minutes, with the longest taking 76 minutes.
Despite the cost and compute time, the benchmark's author noted, as a subjective assessment, that many of GPT-5.4-Pro's builds did not represent a "huge jump" in quality or detail over those from the standard GPT-5.4. This raises the question of whether the system prompts fail to make use of the Pro model's advanced reasoning capabilities, or whether the performance ceiling for this specific creative task has already been reached. The findings are particularly relevant for developers and researchers on limited budgets: the test was funded partly by $140 in donations and partly by personal funds, which underscores the financial barrier to testing state-of-the-art models. The benchmark is a community-driven way to evaluate models not only on standard metrics but also on practical, creative tasks where value for money is a key concern.
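The figures above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical, the total run time assumes the reported 56-minute average covers all 15 builds, and the donation share assumes the full $140 went toward this run.

```python
# Figures reported in the article; names are illustrative, not from MineBench.
TOTAL_COST_USD = 435
NUM_BUILDS = 15
AVG_MINUTES_PER_BUILD = 56
DONATIONS_USD = 140


def summarize_run(total_cost, num_builds, avg_minutes, donations):
    """Derive per-build cost, approximate total run time, and donation coverage."""
    return {
        "cost_per_build_usd": total_cost / num_builds,
        # Assumption: the reported average applies to every build in the run.
        "total_hours": avg_minutes * num_builds / 60,
        # Assumption: all donations were spent on this run.
        "donation_share": donations / total_cost,
        "remainder_usd": total_cost - donations,
    }


summary = summarize_run(
    TOTAL_COST_USD, NUM_BUILDS, AVG_MINUTES_PER_BUILD, DONATIONS_USD
)
print(f"Cost per build: ${summary['cost_per_build_usd']:.2f}")
print(f"Approximate total run time: {summary['total_hours']:.1f} hours")
print(f"Share covered by donations: {summary['donation_share']:.0%}")
print(f"Remainder not covered by donations: ${summary['remainder_usd']}")
```

Under those assumptions, the run works out to roughly 14 hours of generation time, with donations covering about a third of the bill.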
- GPT-5.4-Pro cost $435 for 15 API calls, averaging $29 per 3D build generation.
- Build creation times were long, averaging 56 minutes with a maximum of 76 minutes.
- The author's subjective comparison found no "huge jump" in quality over GPT-5.4, which calls the Pro version's value for this task into question.
Why It Matters
MineBench provides real-world, cost-aware benchmarking for AI developers, and its results suggest diminishing returns from expensive flagship models on creative tasks.