Claude Opus 4.8 completed 15 MineBench builds in avg 24.8 min, costing $41.52—30% cheaper than Opus 4.7?

Claude Opus 4.8 completed 15 MineBench builds in avg 24.8 min, costing $41.52—30% cheaper than Opus 4.7

Build quality matches GPT 5.5 but with more inconsistency; 5 retries due to block hallucinations or malformed JSON?

Build quality matches GPT 5.5 but with more inconsistency; 5 retries due to block hallucinations or malformed JSON

Streamlined CoT reduces token waste, letting the model allocate more output to actual JSON structure generation?

Streamlined CoT reduces token waste, letting the model allocate more output to actual JSON structure generation

Media & Culture

Claude Opus 4.8 on MineBench: cheaper, faster, and better builds than 4.7

r/Singularity June 01, 2026

⚡New Claude model cuts inference time and cost while improving output quality over Opus 4.7

Deep Dive

Anthropic's latest Claude model, Opus 4.8, was evaluated on MineBench, a public benchmark that tests a model's ability to generate 3D structures in a Minecraft-like environment. The model receives a palette of virtual blocks and a prompt (e.g., “fighter jet”) and must output a JSON specifying coordinates for each block. Compared to Opus 4.7, Opus 4.8 showed a clear improvement: average inference time dropped to 24.8 minutes (1,487 seconds) and total cost for 15 builds was just $41.52, despite identical API pricing. The CoT (chain-of-thought) thinking process has been streamlined, similar to recent OpenAI updates, reducing token usage without sacrificing output quality.

Opus 4.8’s builds are described as comparable in quality to GPT 5.5, though with more inconsistency—5 out of 15 builds required retries due to hallucinations (using unavailable blocks) or malformed JSON. The adaptive thinking mechanism worked better than in earlier Claude versions, avoiding the problem of exhausting output tokens on CoT before completing the JSON. Overall, the benchmark author considers Opus 4.8 a genuinely impressive release, what Opus 4.7 was perhaps intended to be. For professionals, this signals that Anthropic is making tangible efficiency gains while maintaining or improving reasoning capability, a key trend for cost-sensitive AI deployments.

Key Points

Claude Opus 4.8 completed 15 MineBench builds in avg 24.8 min, costing $41.52—30% cheaper than Opus 4.7
Build quality matches GPT 5.5 but with more inconsistency; 5 retries due to block hallucinations or malformed JSON
Streamlined CoT reduces token waste, letting the model allocate more output to actual JSON structure generation

Why It Matters

Anthropic's Opus 4.8 shows smarter token usage — cheaper inferences without sacrificing quality for generative tasks.

Read Original Article

Claude Opus 4.8 on MineBench: cheaper, faster, and better builds than 4.7

Why It Matters

Related Articles

🚀 Stay Ahead in AI