GPT-5.2 Pro Solved a Problem Previously Listed on Epoch AI's List of Open Math Problems
An AI solved a problem that would take a mathematician 3-12 months, but the result was removed from a key benchmark.
OpenAI's GPT-5.2 Pro, operating within a specialized harness developed by researcher David Turturean, has successfully generated a solution to a problem previously listed on Epoch AI's benchmark of open mathematical challenges. The problem, concerning deformations from monomial ideals, was categorized as requiring a 'solid result' and had seen serious attempts from 2-4 mathematicians. Epoch's prior survey estimated an expert would need 3 to 12 months to solve it, highlighting its perceived difficulty. The AI's achievement demonstrates the growing capability of large language models (LLMs) to tackle complex, structured reasoning tasks when provided with the right scaffolding.
Despite this technical success, Epoch AI made the controversial decision to retroactively remove the problem from its benchmark. The organization determined that GPT-5.2 Pro's solution, while constituting a 'novel family of worked examples,' did not rise to the level of a publishable mathematical result that would yield a general strategy. This decision underscores the ongoing debate about what constitutes meaningful AI progress in research. Epoch has stated it will amend its problem-sourcing process to filter out similar cases in the future. A preprint co-authored by Turturean and the original problem's author, Gergely Berczi, is expected to detail the AI-discovered examples.
- GPT-5.2 Pro solved a problem estimated to take a human expert 3-12 months of work.
- The solution was generated using a custom 'harness' or scaffolding built by researcher David Turturean.
- Epoch AI removed the problem from its benchmark, stating the AI's result was not a general, publishable finding.
Why It Matters
It highlights both AI's potential to accelerate research and the nuanced challenge of defining genuine scientific discovery versus incremental solution-finding.