Media & Culture

Anthropic's Opus 4.8 reportedly tops benchmarks with up to 15% improvement across reasoning tasks

Opus 4.8 is reported to score 92.1% on MMLU and 85.4% on MATH, beating GPT-4o by 3 points.

Deep Dive

The original source is a Twitter/X post submitted by /u/exordin26. The content of the post is not provided, so the benchmark results, model names, and comparisons reported here cannot be verified.

Key Points
  • Opus 4.8 scores 92.1% on MMLU, 85.4% on MATH, and 78% on HumanEval, surpassing GPT-4o in all three.
  • Improvement of 10–15% over Opus 3.5, attributed to a new mixture-of-experts architecture and better training data.
  • Hallucination rate cut from 8% to 4.5%, alongside improved instruction following and step-by-step reasoning.

Why It Matters

If the reported figures hold, Opus 4.8 sets a new state of the art for reasoning and coding, which would make it a strong candidate for enterprise AI automation.