Gemini 3.1 Pro shows no improvement on FrontierMath Tier 4.

New benchmark results reveal a surprising gap between top AI models in advanced mathematical reasoning.

Deep Dive

Google's latest Gemini 3.1 Pro model shows no improvement on FrontierMath Tier 4, a challenging benchmark of advanced mathematical reasoning. The model reportedly falls 'surprisingly far behind' OpenAI's GPT-5.2 Pro in this head-to-head comparison. The gap raises questions about the current competitive landscape and prompts speculation about how other models, such as DeepSeek's DeepThink, would fare in similarly demanding evaluations.

Why It Matters

Benchmark performance directly impacts which models enterprises and researchers trust for complex, reasoning-intensive tasks like scientific research and advanced analytics.