AI Safety

Human-in-the-Loop LLM Grading for Handwritten Mathematics Assessments

Researchers combine LLMs with mandatory human verification to grade handwritten math tests at scale.

Deep Dive

A research team from KU Leuven has developed a scalable workflow that uses Large Language Models (LLMs) to assist in grading short, handwritten mathematics assessments. The system addresses a critical challenge in education: providing timely, individualized feedback on pen-and-paper work, a need that has grown more pressing as generative AI undermines the reliability of unsupervised, take-home assignments. Their end-to-end process involves constructing solution keys, developing detailed rubric-style grading guides for the LLM, and running a grading procedure that includes automated scanning, anonymization, multi-pass LLM scoring, automated consistency checks, and—crucially—mandatory human verification.

The team deployed this hybrid system in two undergraduate mathematics courses, using it to grade six low-stakes, in-class tests. The empirical results are promising: LLM assistance reduced overall grading time by approximately 23%. Importantly, grading agreement under LLM assistance was comparable to, and in some cases even tighter than, that achieved with fully manual grading. Occasional model errors did occur, but the human-in-the-loop design effectively contained them. The study demonstrates that carefully embedded AI assistance can substantially reduce instructor workload without sacrificing the fairness and accuracy essential for academic assessment, presenting a viable model for scaling personalized feedback.

Key Points
  • The system reduced grading time by ~23% in real-world tests across two undergraduate math courses.
  • It uses a multi-pass LLM scoring process guided by detailed rubrics, followed by mandatory human verification to catch errors.
  • The workflow is designed for scalability, from scanning and anonymizing papers to automated consistency checks, addressing the shift back to supervised in-class assessments.
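The multi-pass scoring and consistency-check stage described above can be sketched as follows. This is a minimal illustration, not the authors' actual code: the `scorer` callable stands in for an LLM grading call, and all function names, the pass count, and the tolerance threshold are hypothetical. The key idea is that disagreement across passes flags a paper for closer attention, while every paper still passes through mandatory human verification.

```python
from statistics import mean

def multi_pass_score(scorer, submission, rubric, passes=3, tolerance=0.5):
    """Score a submission several times and flag disagreement for review.

    `scorer` is a stand-in for an LLM call returning a numeric score;
    names and thresholds here are illustrative, not from the paper.
    """
    scores = [scorer(submission, rubric) for _ in range(passes)]
    spread = max(scores) - min(scores)
    proposed = round(mean(scores), 2)
    # Every paper still goes to a human grader; this flag only marks
    # submissions whose scores were inconsistent across passes.
    needs_close_review = spread > tolerance
    return proposed, needs_close_review

# Toy demo with a stubbed "LLM" whose passes disagree.
stub_answers = iter([8.0, 8.0, 9.0])
stub_scorer = lambda sub, rub: next(stub_answers)
score, flagged = multi_pass_score(stub_scorer, "student work", "rubric text")
print(score, flagged)  # 8.33 True
```

In this sketch, a consistent scorer yields an unflagged proposed score, while inter-pass disagreement beyond the tolerance routes the paper to the front of the human verification queue.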

Why It Matters

Offers educators a practical tool to maintain assessment integrity and provide personalized feedback efficiently as AI changes the homework landscape.