EduResearchBench: A Hierarchical Atomic Task Decomposition Benchmark for Full-Lifecycle Educational Research
New benchmark breaks academic research into 24 atomic tasks, revealing where AI writing fails.
Researchers from multiple institutions built EduResearchBench, presented as the first benchmark for evaluating AI in educational academic writing. It uses a Hierarchical Atomic Task Decomposition (HATD) framework to break the research lifecycle into 6 modules and 24 fine-grained tasks, so a model can be scored on each task separately rather than on a single overall result. The team also trained EduWrite, a specialized 30B-parameter model, on 11K curated instruction pairs. EduWrite outperformed larger 72B general-purpose models, suggesting that domain-specific training and data quality can matter more than raw parameter count for complex scholarly workflows.
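As a rough illustration of how a two-level decomposition supports diagnosis, the sketch below rolls per-task scores up to module averages and lists the weakest tasks. It is not the benchmark's actual code: the module and task names, the even split of 4 tasks per module, the 0-to-1 score scale, and the helper functions `module_averages` and `weakest_tasks` are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def module_averages(task_scores):
    """Average the per-task scores within each module.

    task_scores maps (module, task) -> score (assumed to lie in [0, 1]).
    """
    grouped = defaultdict(list)
    for (module, _task), score in task_scores.items():
        grouped[module].append(score)
    return {module: mean(scores) for module, scores in grouped.items()}


def weakest_tasks(task_scores, k=3):
    """Return the k lowest-scoring (module, task) keys, lowest first."""
    return sorted(task_scores, key=task_scores.get)[:k]


# Hypothetical layout: 6 modules x 4 tasks = 24 atomic tasks.
# The even split and the placeholder names are assumptions, not
# the benchmark's real module or task list.
task_scores = {
    (f"module_{m}", f"task_{m}_{t}"): 0.5
    for m in range(1, 7)
    for t in range(1, 5)
}

# Made-up score: pretend the model does poorly on one task.
task_scores[("module_3", "task_3_2")] = 0.2

averages = module_averages(task_scores)
worst = weakest_tasks(task_scores, k=1)
```

With scores stored per task, a single weak task shows up both in its module's average and in the ranked list, which is the kind of fine-grained view a single aggregate score would hide.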
Why It Matters
Task-level evaluation makes it possible to pinpoint which stages of the research workflow AI handles poorly, a step toward more reliable AI writing assistance in academia.