LLM Leaderboard 2026: Claude Mythos leads, Gemini 3.1 Pro tops coding, Kimi K2.6 cheapest

Llm-stats May 07, 2026

⚡Claude Mythos Preview hits 94.6% on GPQA, Gemini 3.1 Pro wins coding arena, Kimi K2.6 at $0.95/M tok.

Deep Dive

A new LLM Leaderboard for 2026 compares more than 300 AI models across reasoning, coding, agent performance, speed, pricing, and context length. Its composite LLM Stats Score aggregates GPQA Diamond, SWE-Bench Verified, and coding-arena results. Anthropic's unreleased Claude Mythos Preview currently leads the reasoning category with 94.6% on GPQA Diamond, the highest score in the ranking. Google's Gemini 3.1 Pro leads coding with the highest coding-arena score. For cost-conscious professionals, Moonshot AI's open-weights Kimi K2.6 ranks 6th overall and is the cheapest model in the top 10 at $0.95 per million tokens, and its 90.5% on GPQA beats many proprietary models on reasoning.
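To make the per-million-token price concrete, here is a minimal sketch of the cost arithmetic. It assumes the quoted $0.95/M figure is a single flat rate (the summary does not say whether it covers input tokens, output tokens, or a blend), and both the `estimate_cost_usd` helper and the 250-million-token workload are hypothetical, chosen only for illustration.

```python
from decimal import Decimal


def estimate_cost_usd(total_tokens: int, price_per_million_usd: str) -> Decimal:
    """Return the USD cost of processing total_tokens at a flat per-million rate."""
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    # cost = (tokens / 1,000,000) * price per million tokens
    return (Decimal(total_tokens) / Decimal(1_000_000)) * Decimal(price_per_million_usd)


# Price quoted in the article for Kimi K2.6 (assumed here to be a flat rate).
KIMI_K2_6_PRICE = "0.95"
# Hypothetical workload size, not a figure from the leaderboard.
monthly_tokens = 250_000_000

cost = estimate_cost_usd(monthly_tokens, KIMI_K2_6_PRICE)
print(f"Estimated cost: ${cost:.2f}")  # prints "Estimated cost: $237.50"
```

Decimal is used instead of float so that small per-token prices do not pick up binary rounding error when multiplied across large token counts.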

On speed and context, xAI's Grok-4.20 Beta Non-Reasoning offers the largest context window at 2.0 million tokens, which suits long documents and long multi-turn conversations. Mercury 2 posts the fastest output at 1,487 tokens per second. OpenAI's GPT-5.5 ranks second overall, and newer entrants such as ByteDance's Seed 2.0 Pro and Meta's Muse Spark also appear in the top 20. DeepSeek's V4-Pro-Max, an open-weights model, ranks 18th and is priced at $1.93 per million tokens. The leaderboard updates continuously and can be filtered by 30-day or 90-day trends, which helps professionals choose a model for their specific use case, whether the priority is reasoning, coding, speed, cost, or open-source flexibility.
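Choosing by priority, as described above, amounts to picking the best entry on one metric at a time. The following is a rough sketch of that idea and not the leaderboard's own API: the `ModelEntry` type and `pick` helper are hypothetical, only figures stated in this summary are filled in, and missing values are left as `None` instead of guessed. Results therefore hold only among the five models listed here, not the full 300+ ranking.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class ModelEntry:
    """One leaderboard row; fields the article does not report stay None."""
    name: str
    gpqa_diamond_pct: Optional[float] = None
    price_per_million_usd: Optional[float] = None
    context_tokens: Optional[int] = None
    output_tokens_per_sec: Optional[float] = None


# Only values quoted in the article are populated.
MODELS = [
    ModelEntry("Claude Mythos Preview", gpqa_diamond_pct=94.6),
    ModelEntry("Kimi K2.6", gpqa_diamond_pct=90.5, price_per_million_usd=0.95),
    ModelEntry("Grok-4.20 Beta Non-Reasoning", context_tokens=2_000_000),
    ModelEntry("Mercury 2", output_tokens_per_sec=1487.0),
    ModelEntry("DeepSeek V4-Pro-Max", price_per_million_usd=1.93),
]


def pick(
    models: Sequence[ModelEntry],
    metric: Callable[[ModelEntry], Optional[float]],
    lowest: bool = False,
) -> Optional[ModelEntry]:
    """Return the entry with the best metric value, skipping entries that lack it."""
    reported = [m for m in models if metric(m) is not None]
    if not reported:
        return None
    chooser = min if lowest else max
    return chooser(reported, key=metric)


best_reasoning = pick(MODELS, lambda m: m.gpqa_diamond_pct)
cheapest = pick(MODELS, lambda m: m.price_per_million_usd, lowest=True)
longest_context = pick(MODELS, lambda m: m.context_tokens)
fastest = pick(MODELS, lambda m: m.output_tokens_per_sec)

for label, entry in [
    ("reasoning", best_reasoning),
    ("cost", cheapest),
    ("context", longest_context),
    ("speed", fastest),
]:
    print(f"{label}: {entry.name if entry else 'no data'}")
```

Skipping entries with a missing metric, instead of treating them as zero, keeps a model from winning or losing a category simply because the summary did not report its number.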

Key Points

Claude Mythos Preview (unreleased) leads reasoning with 94.6% GPQA Diamond, highest in the 300+ model ranking.
Kimi K2.6 is the cheapest model in the top 10 at $0.95/M tokens and is an open-weights model, scoring 90.5% on GPQA.
Grok-4.20 Beta has the largest context window at 2.0M tokens; Mercury 2 is fastest at 1,487 tok/s.

Why It Matters

This leaderboard makes it easier for developers and enterprises to choose the right AI model by comparing intelligence, speed, and cost side by side.
