CODE-GEN: A Human-in-the-Loop RAG-Based Agentic AI System for Multiple-Choice Question Generation
A new agentic AI system generates multiple-choice coding questions with human-validated success rates up to 98.6%.
Researchers Xiaojing Duan, Frederick Nwanganga, and Chaoli Wang have introduced CODE-GEN, a novel human-in-the-loop AI system designed to generate context-aligned multiple-choice questions for coding education. The system employs an agentic architecture with two specialized AI agents: a Generator that creates coding comprehension questions aligned with specific learning objectives, and a Validator that independently assesses content quality across seven pedagogical dimensions. Both agents are equipped with tools that execute and verify code outputs, ensuring computational accuracy throughout the generation pipeline.
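A minimal sketch of what such a generate-then-validate loop with a code-execution tool might look like. All names, the stub question, and the two checks shown are illustrative assumptions, not the paper's implementation; in CODE-GEN the Generator and Validator are LLM-backed agents, so a hard-coded stub stands in for the Generator here:

```python
import contextlib
import io

def generate_question():
    """Stub Generator agent: emits a coding MCQ with a runnable snippet.
    (Hypothetical schema; the real agent is LLM-driven.)"""
    return {
        "stem": "What does this code print?",
        "code": "print(sum(range(4)))",
        "options": ["4", "6", "10", "3"],
        "answer": "6",
    }

def run_snippet(code):
    """Verification tool: execute the snippet and capture its stdout."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue().strip()

def validate(question):
    """Stub Validator agent: check two of the computationally
    verifiable dimensions (code validity, correct-answer validity)."""
    checks = {"code_validity": True, "correct_answer_validity": False}
    try:
        output = run_snippet(question["code"])
        checks["correct_answer_validity"] = (output == question["answer"])
    except Exception:
        checks["code_validity"] = False
    return checks

q = generate_question()
print(validate(q))  # both checks pass for this stub question
```

Dimensions such as distractor quality have no equivalent executable check, which is why, as the study found, those remain the domain of human reviewers.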
To evaluate CODE-GEN's effectiveness, the researchers conducted a comprehensive study involving six human subject-matter experts who assessed 288 AI-generated questions, resulting in 2,016 human-AI rating pairs and 131 qualitative feedback instances. The analysis revealed strong system performance, with human-validated success rates ranging from 79.9% to 98.6% across different pedagogical dimensions. CODE-GEN demonstrated particularly high reliability in areas well-suited to computational verification, including question clarity, code validity, concept alignment, and correct answer validity.
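The 2,016 rating pairs are consistent with each of the 288 questions being rated once on each of the seven pedagogical dimensions (one plausible reading of the study design, not stated explicitly in this summary):

```python
# Study scale reported in the evaluation
questions = 288
dimensions = 7

# One rating pair per question per dimension
rating_pairs = questions * dimensions
print(rating_pairs)  # 2016
```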
The qualitative feedback analysis provided crucial insights into the division of labor between AI and human expertise. While CODE-GEN excels at tasks with explicit criteria and computational verification, human instructors remain essential for dimensions requiring deeper pedagogical judgment, such as designing meaningful distractors and providing high-quality feedback that reinforces student understanding. These findings help inform strategic allocation of human and AI effort in educational content generation, suggesting a complementary rather than replacement role for AI in instructional design.
The research, accepted as a short paper at the 27th International Conference on Artificial Intelligence in Education (AIED 2026), represents a significant advancement in AI-assisted educational technology. By combining RAG (retrieval-augmented generation) with agentic AI architecture and human oversight, CODE-GEN offers a scalable solution for creating high-quality coding assessment materials while maintaining pedagogical integrity through strategic human intervention where it matters most.
- CODE-GEN uses a dual-agent architecture with Generator and Validator agents enhanced by specialized verification tools
- Evaluation with 6 SMEs on 288 questions showed human-validated success rates of 79.9% to 98.6% across 7 pedagogical dimensions
- System excels at computational verification but requires human expertise for deeper instructional judgment like distractor design
Why It Matters
Provides scalable AI-assisted question generation while strategically preserving human expertise where pedagogical judgment is most critical.