Claude Opus 4.8 on MineBench: cheaper, faster, and better builds than 4.7
New Claude model cuts inference time and cost while improving output quality over Opus 4.7
Anthropic's latest Claude model, Opus 4.8, was evaluated on MineBench, a public benchmark that tests a model's ability to generate 3D structures in a Minecraft-like environment. The model receives a palette of virtual blocks and a prompt (e.g., “fighter jet”) and must output JSON specifying the coordinates of each block. Compared with Opus 4.7, Opus 4.8 was both faster and cheaper: average inference time dropped to 24.8 minutes (1,487 seconds), and the total cost for 15 builds was $41.52, even though API pricing is identical. The chain-of-thought (CoT) process has been streamlined, similar to recent OpenAI updates, which reduces token usage without sacrificing output quality.
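As a quick sanity check on the reported figures, the arithmetic can be sketched as below. The variable names are illustrative, and the sketch assumes that the $41.52 total covers exactly the 15 builds and that 1,487 seconds is the mean time per build; the source does not say whether retried attempts are included in either number.

```python
# Figures as reported in the article; how retries are counted is an assumption.
AVG_INFERENCE_SECONDS = 1487   # reported average inference time
TOTAL_COST_USD = 41.52         # reported total cost
NUM_BUILDS = 15                # reported number of builds

# Convert the average time to minutes to compare with the quoted 24.8 min.
avg_minutes = AVG_INFERENCE_SECONDS / 60

# Average cost per build, assuming the total is spread over exactly 15 builds.
cost_per_build = TOTAL_COST_USD / NUM_BUILDS

print(f"Average time per build: {avg_minutes:.1f} min")
print(f"Average cost per build: ${cost_per_build:.2f}")
```

Under those assumptions, the seconds figure agrees with the quoted 24.8 minutes, and the average works out to a little under three dollars per build.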
Opus 4.8’s builds are described as comparable in quality to GPT 5.5 but less consistent: 5 of the 15 builds required retries, either because the model hallucinated blocks that were not in the palette or because it produced malformed JSON. The adaptive thinking mechanism worked better than in earlier Claude versions, which could exhaust their output tokens on CoT before finishing the JSON. Overall, the benchmark author considers Opus 4.8 a genuinely impressive release, perhaps what Opus 4.7 was intended to be. For professionals, this signals that Anthropic is making tangible efficiency gains while maintaining or improving reasoning capability, a key trend for cost-sensitive AI deployments.
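The two failure modes described above (blocks outside the palette and malformed JSON) are the kind of thing a harness can check before deciding to retry. The following is a minimal sketch only: the JSON schema (a top-level "blocks" list of objects with "x", "y", "z", and "type" keys) and the function name validate_build are hypothetical and are not taken from MineBench itself.

```python
import json


def validate_build(raw: str, palette: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the build passed.

    The schema checked here is an assumption made for illustration,
    not MineBench's actual output format.
    """
    try:
        build = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Covers truncated output, e.g. when tokens run out mid-JSON.
        return [f"malformed JSON: {exc.msg}"]

    blocks = build.get("blocks") if isinstance(build, dict) else None
    if not isinstance(blocks, list):
        return ["missing 'blocks' list"]

    problems = []
    for i, block in enumerate(blocks):
        if not isinstance(block, dict):
            problems.append(f"block {i}: not an object")
            continue
        block_type = block.get("type")
        # A block type outside the palette counts as a hallucinated block.
        if not isinstance(block_type, str) or block_type not in palette:
            problems.append(f"block {i}: unavailable block type {block_type!r}")
        coords = [block.get(axis) for axis in ("x", "y", "z")]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not all(isinstance(c, int) and not isinstance(c, bool) for c in coords):
            problems.append(f"block {i}: coordinates must be integers")
    return problems


palette = {"stone", "glass"}
good = '{"blocks": [{"x": 0, "y": 1, "z": 2, "type": "stone"}]}'
hallucinated = '{"blocks": [{"x": 0, "y": 1, "z": 2, "type": "diamond"}]}'
truncated = '{"blocks": [{"x": 0'

for label, raw in [("good", good), ("hallucinated", hallucinated), ("truncated", truncated)]:
    print(label, validate_build(raw, palette))
```

A harness built along these lines would retry whenever the returned list is non-empty, which matches the two retry reasons the article reports.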
- Claude Opus 4.8 completed 15 MineBench builds in an average of 24.8 minutes at a total cost of $41.52, 30% cheaper than Opus 4.7
- Build quality matches GPT 5.5 but is less consistent; 5 builds needed retries due to block hallucinations or malformed JSON
- Streamlined CoT reduces token waste, letting the model allocate more output to actual JSON structure generation
Why It Matters
Anthropic's Opus 4.8 uses tokens more efficiently, making inference cheaper without sacrificing quality on generative tasks.