Research & Papers

MATH-PT: A Math Reasoning Benchmark for European and Brazilian Portuguese

Researchers reveal frontier models struggle with open-ended math in Portuguese...

Deep Dive

A collaborative team of researchers from Portugal and Brazil has released MATH-PT, a novel benchmark dataset comprising 1,729 mathematical problems written in both European and Brazilian Portuguese. The dataset is curated from high-quality native sources, including mathematical Olympiads, competitions, and exams from Portugal and Brazil, addressing a significant linguistic bias in existing math reasoning evaluations that are predominantly in English or merely translated from English.

In comprehensive benchmarking of current state-of-the-art LLMs, the study found that frontier reasoning models clearly outperform open-weight models on multiple-choice questions. However, their performance drops notably on questions that include figures or are open-ended, highlighting a critical limitation of current AI reasoning capabilities in non-English contexts. The dataset and model outputs are publicly released to foster future research.
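The per-category gap described above (multiple-choice vs. open-ended vs. figure-based) can be sketched as a simple aggregation over graded model outputs. This is a minimal illustration, not the paper's evaluation code: the field names, question-type labels, and sample records below are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical graded outputs: each record tags a problem with a question type
# (labels assumed for illustration) and whether the model answered correctly.
results = [
    {"type": "multiple_choice", "correct": True},
    {"type": "multiple_choice", "correct": True},
    {"type": "open_ended", "correct": False},
    {"type": "open_ended", "correct": True},
    {"type": "figure", "correct": False},
]

def accuracy_by_type(records):
    """Aggregate per-question-type accuracy from graded model outputs."""
    totals, hits = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["type"]] += 1
        hits[r["type"]] += int(r["correct"])
    return {t: hits[t] / totals[t] for t in totals}

print(accuracy_by_type(results))
```

Breaking accuracy down this way is what surfaces the pattern the study reports: a model can look strong on an aggregate score while its open-ended and figure-based accuracy lags well behind its multiple-choice accuracy.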

Key Points
  • MATH-PT includes 1,729 math problems sourced from native Portuguese Olympiads and exams in Portugal and Brazil
  • Frontier reasoning models outperform open-weight models on multiple-choice but struggle with open-ended and figure-based questions
  • The dataset and model outputs are open access; the paper has been accepted at the PROPOR 2026 conference

Why It Matters

Highlights the linguistic bias in AI evaluation and the need for diverse, non-English benchmarks to improve global reasoning capabilities.