RETUYT-INCO's Meta-prompting achieves 0.729 QWK in German short answer scoring
LLM-generated prompts from training examples boost rubric-based grading accuracy
Researchers from RETUYT-INCO presented Meta-prompting at the BEA 2026 workshop, tackling rubric-based short answer scoring for German. The method uses an LLM to generate a tailored scoring prompt from examples in the training set. That generated prompt then guides the evaluation of new student answers, so the scoring instructions adapt to the rubric and question type rather than being fixed in advance. The team participated in three tracks: Track 1 (unseen answers, three-way scoring), Track 3 (unseen answers, two-way scoring), and Track 4 (unseen questions, two-way scoring). For comparison, they also experimented with classic machine learning, fine-tuning open-source LLMs, and other prompting strategies.
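The two-stage flow described above (an LLM first writes the scoring prompt, then that prompt is used to grade) can be sketched roughly as follows. This is an illustrative sketch, not the team's implementation: the `Example` record, the prompt wording, the label names, and the `llm` callable (any function that maps a prompt string to a reply string) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical label set for three-way scoring; the real label names are not
# given in this summary.
LABELS = ("Correct", "Partially correct", "Incorrect")


@dataclass
class Example:
    """One labelled training item (field names are assumptions)."""
    question: str
    rubric: str
    answer: str
    label: str


def build_meta_prompt(examples: Sequence[Example], labels: Sequence[str]) -> str:
    """Stage 1 input: ask the LLM to write a scoring prompt from examples."""
    shots = "\n\n".join(
        f"Question: {e.question}\nRubric: {e.rubric}\n"
        f"Answer: {e.answer}\nLabel: {e.label}"
        for e in examples
    )
    return (
        "You design instructions for grading German short answers.\n"
        f"Allowed labels: {', '.join(labels)}.\n"
        "Study the labelled examples below and write a scoring prompt "
        "that a grader could follow for new answers.\n\n"
        f"{shots}"
    )


def generate_scoring_prompt(
    llm: Callable[[str], str],
    examples: Sequence[Example],
    labels: Sequence[str] = LABELS,
) -> str:
    """Stage 1: the LLM's reply is treated as the scoring prompt."""
    return llm(build_meta_prompt(examples, labels)).strip()


def score_answer(
    llm: Callable[[str], str],
    scoring_prompt: str,
    question: str,
    rubric: str,
    answer: str,
    labels: Sequence[str] = LABELS,
) -> str:
    """Stage 2: grade one new answer with the generated scoring prompt."""
    reply = llm(
        f"{scoring_prompt}\n\nQuestion: {question}\nRubric: {rubric}\n"
        f"Answer: {answer}\nReply with exactly one label.\nLabel:"
    )
    cleaned = reply.strip().casefold()
    for label in labels:
        if cleaned == label.casefold():
            return label
    # Assumption: an unparseable reply is treated as an error, not guessed.
    raise ValueError(f"Unrecognised label in reply: {reply!r}")
```

In practice `llm` would wrap whatever model client is in use; passing it in as a plain callable keeps the sketch independent of any particular API and makes it easy to test with a stub.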
Official results placed the team 6th out of 8 in Track 1 with a quadratic weighted kappa (QWK) of 0.729, 4th out of 9 in Track 3 (0.674 QWK), and 4th out of 8 in Track 4 (0.49 QWK). While not the top performer, the Meta-prompting approach shows promise for automated short answer scoring, especially when rubrics vary across questions. The method reduces the need for manual prompt engineering by letting the LLM draft the evaluation criteria from the training examples. This work contributes to the growing field of AI-assisted assessment, with potential applications in language education and large-scale testing.
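For readers unfamiliar with the metric, QWK measures agreement between predicted and gold labels while penalising disagreements by the squared distance between the two labels, so confusing the lowest and highest categories costs more than an adjacent mix-up. A minimal pure-Python sketch follows, assuming labels have already been mapped to integers 0..k-1; the function name and that encoding are choices made for this illustration, and the shared task's official scorer may differ in details.

```python
from typing import Sequence


def quadratic_weighted_kappa(
    gold: Sequence[int], pred: Sequence[int], num_classes: int
) -> float:
    """QWK = 1 - sum(w * observed) / sum(w * expected),
    with weights w[i][j] = (i - j)^2 / (k - 1)^2."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equal length")
    if num_classes < 2:
        raise ValueError("need at least two classes")

    k, n = num_classes, len(gold)

    # Observed confusion matrix: rows are gold labels, columns predictions.
    observed = [[0] * k for _ in range(k)]
    for g, p in zip(gold, pred):
        observed[g][p] += 1

    gold_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    numerator = 0.0
    denominator = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            numerator += weight * observed[i][j]
            # Expected count if gold and predicted labels were independent.
            denominator += weight * gold_hist[i] * pred_hist[j] / n

    if denominator == 0:
        # Happens when both raters use a single, identical class.
        raise ValueError("QWK is undefined for this label distribution")
    return 1.0 - numerator / denominator
```

Perfect agreement gives 1.0, chance-level agreement gives roughly 0, and systematic disagreement gives negative values. For two-way scoring (k = 2) the only non-zero weight is 1, so the formula reduces to ordinary Cohen's kappa.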
- Meta-prompting uses an LLM to generate a custom scoring prompt from training examples, reducing manual prompt engineering.
- Highest score: 0.729 QWK in Track 1 (unseen answers, three-way scoring), placing 6th out of 8 participants; the best rankings were 4th place in Tracks 3 and 4.
- Team also tested classic ML, fine-tuned open-source LLMs, and other prompting strategies for comparison with Meta-prompting.
Why It Matters
Could help automate rubric-based grading of German short answers, reducing human effort in large-scale language assessments.