TR-EduVSum: A Turkish-Focused Dataset and Consensus Framework for Educational Video Summarization
New AutoMUP framework automatically builds gold-standard summaries for 82 educational videos from human-written summaries, and the results align closely with summaries from top LLMs.
Researchers Figen Eğin and Aytuğ Onan have introduced TR-EduVSum, a specialized dataset and framework designed to advance educational video summarization for Turkish-language content. The dataset comprises 82 course videos focused on "Data Structures and Algorithms," accompanied by 3,281 independent human summaries (about 40 per video on average). This resource addresses a gap in non-English AI training data and provides a foundation for developing and testing summarization models tailored to Turkic languages.
The core innovation is the AutoMUP (Automatic Meaning Unit Pyramid) framework, which automates the creation of a reliable "gold-standard" summary. Instead of relying on a single human annotator, AutoMUP extracts "meaning units" from all 3,281 summaries, clusters them semantically using embeddings, and statistically models inter-participant agreement. It then constructs a consensus-based summary, weighted by how frequently different participants mentioned key points. This method produces a reproducible benchmark that reflects collective understanding.
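The pipeline described above (extract meaning units, cluster them, weight clusters by participant agreement, keep the consensus) can be illustrated with a minimal sketch. This is not the authors' implementation: the sentence-level unit splitting, the bag-of-words stand-in for embeddings, the greedy clustering, and the `threshold` and `min_share` parameters are all assumptions chosen to keep the example self-contained.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector. Assumption: stands in for a real sentence embedding."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def split_units(summary):
    """Assumption: each sentence counts as one meaning unit."""
    return [s.strip() for s in re.split(r"[.!?]+", summary) if s.strip()]


def cluster_units(summaries, threshold=0.5):
    """Greedy single-pass clustering: a unit joins the first cluster whose seed is similar enough."""
    clusters = []
    for pid, summary in enumerate(summaries):
        for unit in split_units(summary):
            vec = embed(unit)
            for c in clusters:
                if cosine(vec, c["seed"]) >= threshold:
                    c["units"].append(unit)
                    c["participants"].add(pid)
                    break
            else:
                clusters.append({"seed": vec, "units": [unit], "participants": {pid}})
    return clusters


def consensus_summary(summaries, threshold=0.5, min_share=0.5):
    """Keep clusters mentioned by at least min_share of participants, most agreed-upon first."""
    n = len(summaries)
    ranked = sorted(cluster_units(summaries, threshold),
                    key=lambda c: len(c["participants"]), reverse=True)
    return [(c["units"][0], len(c["participants"]) / n)
            for c in ranked if len(c["participants"]) / n >= min_share]


# Hypothetical participant summaries of one video (illustrative data only).
summaries = [
    "A stack is a last in first out structure. Push adds an element to the top.",
    "A stack is a last in first out data structure. Recursion uses the call stack.",
    "The stack is a last in first out structure. Push adds an element to the top of the stack.",
]
result = consensus_summary(summaries)
for unit, share in result:
    print(f"{share:.2f}  {unit}")
```

In this toy run, the point every participant made ranks first, the point two of three made ranks second, and the point only one participant raised falls below the `min_share` cutoff. A real system would swap `embed` for a multilingual sentence-embedding model and use a more robust clustering method.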
In testing, summaries generated by the AutoMUP framework demonstrated high semantic alignment with those produced by state-of-the-art large language models like GPT-5.1 and Flash 2.5. The research, accepted at the SIGTURK 2026 workshop, also includes ablation studies confirming that the consensus weighting and clustering components are critical to the framework's performance. The approach is designed for cost-effective generalization to other Turkic languages, potentially unlocking better AI tools for millions of speakers.
- Dataset includes 82 Turkish educational videos and 3,281 human-written summaries for "Data Structures and Algorithms" courses.
- AutoMUP framework uses embedding clustering and statistical modeling to build consensus-based "gold standard" summaries automatically.
- Resulting summaries show high semantic overlap with outputs from top LLMs such as GPT-5.1 and Flash 2.5, which supports the method's quality.
Why It Matters
TR-EduVSum provides a reproducible benchmark for developing and evaluating Turkish-language summarization models, and its cost-effective approach could extend to educational tools for other Turkic languages.