Benchmarks in 2024
Claude 3.5, GPT-4o, and Gemini 1.5 fight for top spots on MMLU, HumanEval, and more.
Deep Dive
This roundup is based on a Reddit post by user RetiredApostle comparing the year's headline benchmark results.
Key Points
- Claude 3.5 Opus leads MMLU with 90.7%, up from GPT-4's 86.4% in late 2023
- GPT-4o tops multimodal benchmark MMMU at 87.2%, besting Claude 3.5 Opus (84.7%)
- Gemini 1.5 Pro leads long-context understanding (95.3% on RULER for 1M+ token documents)
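Coding benchmarks like HumanEval are scored with the pass@k metric: the probability that at least one of k sampled completions passes the tests. A minimal sketch of the standard unbiased estimator, given n samples per problem of which c pass (function name is illustrative):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k completions,
    drawn without replacement from n samples with c correct, passes."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k must
        # include at least one correct completion.
        return 1.0
    # 1 minus the probability that all k drawn samples are incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 2 samples of which 1 passes, pass@1 is 0.5; a model's reported HumanEval score is this value averaged over all problems.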
Why It Matters
For professionals using AI on complex coding and analysis tasks, these gains translate to roughly 5-10% fewer errors than the previous year's models, making the outputs more dependable in day-to-day work.