MATH-PT: A Math Reasoning Benchmark for European and Brazilian Portuguese
Researchers reveal frontier models struggle with open-ended math in Portuguese...
A collaborative team of researchers from Portugal and Brazil has released MATH-PT, a novel benchmark dataset comprising 1,729 mathematical problems written in both European and Brazilian Portuguese. The dataset is curated from high-quality native sources, including mathematical Olympiads, competitions, and exams from Portugal and Brazil, addressing a significant linguistic bias in existing math reasoning evaluations, which are predominantly written in English or merely translated from it.
In comprehensive benchmarking of current state-of-the-art LLMs, the study found that frontier reasoning models outperform open-weight models on multiple-choice questions. However, performance drops notably on questions that include figures or are open-ended, highlighting a critical limitation in current AI reasoning capabilities for non-English contexts. The dataset and model outputs are publicly released to foster future research.
- MATH-PT includes 1,729 math problems sourced from native Portuguese Olympiads and exams in Portugal and Brazil
- Frontier reasoning models outperform open-weight models on multiple-choice but struggle with open-ended and figure-based questions
- Dataset and model outputs are open access; the paper has been accepted at the PROPOR 2026 conference
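The per-type accuracy breakdown described above (multiple-choice vs. open-ended vs. figure-based) can be sketched with a few lines of Python. This is a hypothetical illustration, not the authors' evaluation code: the record fields (`type`, `correct`) and the sample data are assumptions for demonstration only.

```python
# Hypothetical sketch of a per-question-type accuracy breakdown,
# the kind of analysis reported for MATH-PT. Field names and data
# are illustrative assumptions, not taken from the released dataset.
from collections import defaultdict


def accuracy_by_type(results):
    """Group graded model outputs by question type and compute accuracy."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for r in results:
        totals[r["type"]] += 1
        correct[r["type"]] += int(r["correct"])
    return {t: correct[t] / totals[t] for t in totals}


# Toy graded outputs (structure assumed for illustration).
sample = [
    {"type": "multiple_choice", "correct": True},
    {"type": "multiple_choice", "correct": True},
    {"type": "open_ended", "correct": True},
    {"type": "open_ended", "correct": False},
]
print(accuracy_by_type(sample))
# → {'multiple_choice': 1.0, 'open_ended': 0.5}
```

A breakdown like this makes the reported gap visible at a glance: aggregate accuracy can look strong while open-ended or figure-based categories lag well behind.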
Why It Matters
Highlights AI's linguistic bias and need for diverse, non-English benchmarks to improve global reasoning capabilities.