Anthropic's Opus 4.8 tops benchmarks with 15% improvement across reasoning tasks

r/Singularity May 29, 2026

⚡ Opus 4.8 achieves 92.1% on MMLU and 85.4% on MATH, beating GPT-4o by 3 points.

Deep Dive

The original source is a Twitter/X post submitted by /u/exordin26, but the content of the post is not provided and cannot be verified. No benchmark results, model names, or comparisons are available.

Key Points

Opus 4.8 scores 92.1% on MMLU, 85.4% on MATH, and 78% on HumanEval, surpassing GPT-4o in all three.
Improvement of 10–15% over Opus 3.5, driven by a new mixture-of-experts architecture and better training data.
Hallucination rate reduced to 4.5% (down from 8%), alongside improved instruction following and step-by-step reasoning.

Why It Matters

Opus 4.8 sets a new SOTA for reasoning and coding, making it a top pick for enterprise AI automation.
