Media & Culture

Benchmarks in 2024

Claude 3.5, GPT-4o, and Gemini 1.5 fight for top spots on MMLU, HumanEval, and more.

Deep Dive

The roundup is based on a Reddit post by user RetiredApostle.

Key Points
  • Claude 3.5 Opus leads MMLU with 90.7%, up from GPT-4's 86.4% in late 2023
  • GPT-4o tops multimodal benchmark MMMU at 87.2%, besting Claude 3.5 Opus (84.7%)
  • Gemini 1.5 Pro leads long-context understanding (95.3% on RULER for 1M+ token documents)
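Most of these scores are simple accuracies (fraction of questions answered correctly), but the coding benchmark HumanEval is usually reported as pass@k: the probability that at least one of k sampled completions passes the unit tests. A minimal sketch of the standard unbiased estimator from the HumanEval paper, computed from n samples of which c pass (the function name `pass_at_k` is my own):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    n: total completions sampled per problem
    c: number of those completions that passed the tests
    k: budget of attempts being scored
    """
    # If fewer than k samples failed, any k-subset must contain a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 1 of 2 samples passed, scored at k=1 -> 0.5
print(pass_at_k(2, 1, 1))
```

A model's HumanEval score is then this estimate averaged over all 164 problems in the suite.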

Why It Matters

For professionals, these gains translate into error rates on complex coding and analysis tasks that are 5-10% lower than a year ago.