Gemini 3 and GPT-5.2 Pro Solve Final 2 of 10 Elite Math Proof Questions
Top AI models just cracked the hardest problems in a major new benchmark.
In the First Proof benchmark, Gemini 3 Deep Think and GPT-5.2 Pro correctly solved questions 9 and 10, the two most difficult problems. Each model had two attempts, each with a different prompt. The other eight questions remained unsolved. Because the test used publicly available models, it offers a snapshot of the current frontier of AI mathematical reasoning and shows where even the most powerful systems still struggle with complex, multi-step proofs.
Why It Matters
It shows where the limits of today's top AI models currently lie on elite-level mathematical proof problems.