Evaluating Monolingual and Multilingual Large Language Models for Greek Question Answering: The DemosQA Benchmark
New benchmark reveals which AI models truly understand Greek social and cultural context, not just language.
Researchers Charalampos Mastrokostas, Nikolaos Giarelis, and Nikos Karacapilidis introduced the DemosQA benchmark to evaluate 11 monolingual and multilingual LLMs on Greek Question Answering (QA). The benchmark includes a novel dataset built from Greek social media to capture the cultural zeitgeist, and it tests models across 6 human-curated QA datasets using 3 prompting strategies. Together, these components provide a framework for measuring how well AI handles under-resourced languages beyond simple translation.
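To make the evaluation setup concrete, here is a minimal, hypothetical sketch of such a grid evaluation (models x datasets x prompting strategies) with an exact-match metric. The strategy names (zero-shot, one-shot, few-shot), the prompt templates, and the toy model are assumptions for illustration, not the authors' exact protocol:

```python
# Hypothetical QA evaluation loop in the spirit of a benchmark like DemosQA.
# Strategy names, prompt templates, and scoring are illustrative assumptions.

def build_prompt(question, strategy, examples=()):
    """Assemble a prompt under one of three assumed prompting strategies."""
    if strategy == "zero-shot":
        return f"Question: {question}\nAnswer:"
    shots = examples[:1] if strategy == "one-shot" else examples
    demos = "\n".join(f"Question: {q}\nAnswer: {a}" for q, a in shots)
    return f"{demos}\nQuestion: {question}\nAnswer:"

def exact_match(prediction, gold):
    """Exact-match metric after whitespace and case normalisation."""
    return prediction.strip().lower() == gold.strip().lower()

def evaluate(model, dataset, strategy, examples=()):
    """Accuracy of `model` (a prompt -> answer callable) on (q, a) pairs."""
    hits = sum(
        exact_match(model(build_prompt(q, strategy, examples)), a)
        for q, a in dataset
    )
    return hits / len(dataset)

# Toy stand-in "model" that looks up canned answers (no real LLM call).
answers = {"Ποια είναι η πρωτεύουσα της Ελλάδας;": "Αθήνα"}
model = lambda p: answers.get(
    p.split("Question: ")[-1].split("\nAnswer:")[0], ""
)
dataset = [("Ποια είναι η πρωτεύουσα της Ελλάδας;", "Αθήνα")]
print(evaluate(model, dataset, "zero-shot"))  # -> 1.0
```

A real harness would swap the toy lookup for API or local-model calls and likely add fuzzier metrics (e.g. token-level F1), but the nested loop over models, datasets, and strategies is the core of this kind of benchmark.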
Why It Matters
Highlights the performance gap for non-English AI and pushes for models that understand cultural nuance, not just vocabulary.