Mathematics Teachers' Interactions with a Multi-Agent System for Personalized Problem Generation
Researchers built a four-agent AI system that generates and vets personalized math problems with 90% accuracy.
A research team led by Candace Walkington presented a novel multi-agent AI system at AIED 2026 designed to help mathematics teachers create personalized problems. The system uses a teacher-in-the-loop workflow: educators input a base problem and a desired topic, and a large language model generates a personalized version. Four specialized AI agents then evaluate the output. One checks mathematical accuracy to catch hallucinations, another assesses the authenticity of the real-world context, a third analyzes readability for middle school students, and a fourth evaluates overall problem realism. This multi-layered validation happens in real time as teachers work within the ASSISTments platform.
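The generate-then-review workflow described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: every name here (`Review`, `stub_generator`, `placeholder_reviewer`, `readability_reviewer`, `generate_and_review`) is hypothetical, the generator is a stand-in for the LLM call, three of the four reviewers perform no real check, and the readability check is a crude sentence-length heuristic assumed purely for demonstration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Review:
    agent: str     # which reviewer produced this result
    flagged: bool  # True if the reviewer raised a concern
    note: str      # short explanation that could be shown to the teacher


# Hypothetical type aliases: a generator maps (base problem, topic) to a draft,
# and a reviewer maps a draft to a Review.
Generator = Callable[[str, str], str]
Reviewer = Callable[[str], Review]


def stub_generator(base_problem: str, topic: str) -> str:
    """Stand-in for the LLM call; it only tags the base problem with the topic."""
    return f"[{topic}] {base_problem}"


def placeholder_reviewer(agent: str) -> Reviewer:
    """Reviewer that never flags anything; marks where an LLM-backed agent would go."""
    def review(draft: str) -> Review:
        return Review(agent, False, "placeholder: no check performed")
    return review


def readability_reviewer(max_words_per_sentence: int = 20) -> Reviewer:
    """Toy readability check (an assumption): flag drafts with an overly long sentence."""
    def review(draft: str) -> Review:
        # Crude sentence split on ., ?, and !; decimals such as "2.5" would be mis-split.
        text = draft.replace("?", ".").replace("!", ".")
        sentences = [s for s in text.split(".") if s.strip()]
        longest = max((len(s.split()) for s in sentences), default=0)
        return Review("readability", longest > max_words_per_sentence,
                      f"longest sentence: {longest} words")
    return review


# One reviewer per role named in the article.
REVIEWERS: List[Reviewer] = [
    placeholder_reviewer("math_accuracy"),
    placeholder_reviewer("context_authenticity"),
    readability_reviewer(),
    placeholder_reviewer("realism"),
]


def generate_and_review(
    base_problem: str,
    topic: str,
    generate: Generator = stub_generator,
    reviewers: List[Reviewer] = REVIEWERS,
) -> Tuple[str, List[Review]]:
    """Generate a draft and collect every reviewer's verdict for the teacher."""
    draft = generate(base_problem, topic)
    reviews = [reviewer(draft) for reviewer in reviewers]
    # Nothing is rejected automatically: the teacher sees the draft and all flags.
    return draft, reviews


if __name__ == "__main__":
    draft, reviews = generate_and_review(
        "A train travels 60 miles in 2 hours. What is its speed?", "soccer"
    )
    print(draft)
    for r in reviews:
        print(f"{r.agent}: flagged={r.flagged} ({r.note})")
```

In this sketch the function returns the draft together with all four reviews instead of filtering drafts itself, which mirrors the teacher-in-the-loop idea of leaving the final decision to the educator.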
In practical testing, eight middle school mathematics teachers used the system to create 212 problems that were assigned to their students. The AI agents flagged numerous realism issues during problem creation, yet teachers and students reported few realism problems in the final versions. Mathematical hallucinations, a common concern with LLMs, proved "somewhat rare" in the validated outputs. However, both educators and learners expressed a strong desire to modify fine-grained personalized elements, particularly the real-world contexts of problems. This suggests that while AI can generate appropriate content, human control over personalization remains crucial for authentic educational fit.
The research has implications for the design of AI-assisted educational tools. The multi-agent approach suggests that specialized validation agents can help mitigate common LLM weaknesses such as inaccuracies and inappropriate content. Yet the study emphasizes that successful implementation requires maintaining teacher agency, particularly in personalization decisions, where teachers' knowledge of their students' contexts and interests matters more than automated choices. This balanced approach, combining AI efficiency with human oversight, could change how educators create differentiated learning materials at scale.
- Four-agent AI system validates math problems for accuracy, authenticity, readability, and realism with 90%+ detection rates
- Teachers created 212 personalized problems with rare mathematical hallucinations in final outputs
- Study reveals teachers and students want more control over real-world context personalization, even though the AI agents flagged many realism issues
Why It Matters
Shows how multi-agent AI can scale personalized education while preserving crucial teacher control over learning content.