Google DeepMind's Gemini 2.5 Pro tops LMArena with reasoning and 1M context

Blog May 02, 2026

⚡Gemini 2.5 Pro Experimental leads benchmarks and debuts #1 on LMArena.

Deep Dive

Google DeepMind has launched Gemini 2.5 Pro Experimental, its most intelligent AI model yet, introducing a new era of thinking models that reason through their thoughts before responding. The model tops the LMArena leaderboard, which measures human preferences, by a wide margin, and achieves state-of-the-art results across math, science, and coding benchmarks. Without relying on costly test-time techniques such as majority voting, Gemini 2.5 Pro posts leading scores on GPQA and AIME 2025 and reaches 18.8% on Humanity's Last Exam, a dataset designed by hundreds of experts to capture the frontier of human knowledge.

Gemini 2.5 Pro ships with a 1-million-token context window (2 million coming soon) and supports native multimodality across text, audio, images, video, and entire code repositories. It excels at agentic coding, scoring 63.8% on SWE-Bench Verified with a custom agent setup, and can generate executable code for video games from a single prompt. The model is available now in Google AI Studio and to Gemini Advanced users in the Gemini app, with Vertex AI availability to follow in the coming weeks. Pricing for higher rate limits will be announced soon.

Key Points

Gemini 2.5 Pro Experimental tops the LMArena leaderboard by a significant margin, indicating a strong human preference for its responses.
State-of-the-art reasoning results on GPQA and AIME 2025, plus 18.8% on Humanity's Last Exam, without test-time techniques such as majority voting.
1M-token context window (2M coming soon), native multimodality, and 63.8% on SWE-Bench Verified for agentic coding.

Why It Matters

Building thinking capabilities directly into Gemini 2.5 enables more accurate, context-aware AI agents for complex professional workflows.
