Research & Papers

Math Education Digital Shadows for facilitating learning with LLMs: Math performance, anxiety and confidence in simulated students and AIs

14 LLMs show human-like math anxiety and overconfidence across 28,000 simulated student personas

Deep Dive

A team of researchers from multiple institutions has released MEDS (Math Education Digital Shadows), a comprehensive dataset designed to probe how large language models (LLMs) reason about and report mathematics under human-like and AI-like conditions. MEDS captures 28,000 distinct personas derived from 14 different LLMs, including models from the Mistral, Qwen, DeepSeek, Granite, Phi, and Grok families. Each persona includes detailed psychological and sociodemographic metadata alongside four types of math tasks: open math interviews, psychometric tests on math perceptions (with explanations), cognitive network diagrams mapping math attitudes, and 18 high-school math test questions with reasoning and confidence scores. Unlike traditional score-only benchmarks, MEDS integrates self-efficacy, math anxiety, and cognitive network science to provide a richer picture of LLM math behavior.
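To make the dataset's structure concrete, here is a minimal sketch of what a MEDS-style persona record and a simple overconfidence measure could look like. The field names, scale, and scoring rule are illustrative assumptions, not the dataset's actual schema.

```python
# Hypothetical persona record in the spirit of MEDS: persona metadata,
# an anxiety score, and per-question confidence ratings. All field
# names and values here are assumptions for illustration only.
from statistics import mean

persona = {
    "persona_id": "qwen-0001",
    "model_family": "Qwen",                      # one of the 14 sampled LLM families
    "sociodemographics": {"age": 16, "grade": 11},
    "math_anxiety_score": 3.4,                   # e.g. a 1-5 psychometric scale
    "test_items": [                              # MEDS has 18 high-school questions
        {"correct": True,  "confidence": 0.9},
        {"correct": False, "confidence": 0.8},
        {"correct": True,  "confidence": 0.6},
    ],
}

def overconfidence(items):
    """Mean stated confidence minus actual accuracy.

    A positive value means the persona rates its answers higher than
    its performance warrants, the kind of human-like miscalibration
    the MEDS validation study reports."""
    accuracy = mean(1.0 if it["correct"] else 0.0 for it in items)
    avg_confidence = mean(it["confidence"] for it in items)
    return avg_confidence - accuracy

print(round(overconfidence(persona["test_items"]), 3))  # → 0.1
```

A tutoring system could compute a statistic like this per learner and per model family to decide when to challenge a confident-but-wrong answer rather than accept it.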

Data validation reveals striking patterns: the sampled LLMs maintain schema integrity and persona consistency but display family-specific idiosyncrasies. Some models exhibit human-like negative math attitudes, commit logical fallacies, and show overconfidence in their answers, mirroring real student behaviors that can undermine learning. These findings have direct implications for developers of AI math tutors, learning analytics practitioners, and cognitive scientists. By exposing where LLMs are overconfident or anxious, MEDS helps designers build safer, more effective tutoring systems that adapt to a learner's affective state. The dataset is publicly available alongside the arXiv paper and promises to be a useful tool for improving AI-mediated math education.

Key Points
  • MEDS covers 14 LLM families including Mistral, Qwen, DeepSeek, Granite, Phi, and Grok with 28,000 personas
  • Dataset includes 4 math tasks: open interview, psychometric tests, cognitive networks, and 18 high-school math questions with confidence scores
  • Validation found human-like negative math attitudes, logical fallacies, and overconfidence varying by model family

Why It Matters

Exposing LLMs' hidden math anxieties and overconfidence is critical for building safer, more effective AI math tutors.