Gemini 3 and GPT-5.2 Pro Solve Final 2 of 10 Elite Math Proof Questions

Top AI models just cracked the hardest problems in a major new benchmark.

Deep Dive

In the First Proof benchmark, Gemini 3 Deepthink and GPT-5.2 Pro correctly solved questions 9 and 10, the two most difficult problems. Each model had two attempts with different prompts. The other eight questions remained unsolved. This test, using publicly available models, highlights the current frontier of AI's advanced mathematical reasoning capabilities, showing where even the most powerful systems begin to struggle with complex, multi-step proof generation.

Why It Matters

It shows how far today's top AI models can get on elite-level mathematical proofs: two of ten problems solved, eight still out of reach.
