Research & Papers

Evaluating Monolingual and Multilingual Large Language Models for Greek Question Answering: The DemosQA Benchmark

New benchmark reveals which AI models truly understand Greek social and cultural context, not just language.

Deep Dive

Researchers Charalampos Mastrokostas, Nikolaos Giarelis, and Nikos Karacapilidis introduced the DemosQA benchmark to evaluate eleven monolingual and multilingual LLMs on Greek question answering. The benchmark includes a novel dataset built from Greek social media to capture the cultural zeitgeist, and it tests models across six human-curated QA datasets using three prompting strategies. Together, these provide a framework for measuring how well AI handles an under-resourced language beyond simple translation.
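The evaluation design described above is essentially a grid: every model is scored on every dataset under every prompting strategy. A minimal sketch of that loop is below; the model, dataset, and strategy names are illustrative placeholders (the summary does not list them), and the scorer is a deterministic stand-in for a real QA harness.

```python
from itertools import product

# Hypothetical names: the summary does not enumerate the paper's
# models, datasets, or strategies, so these are placeholders.
MODELS = ["greek-mono-llm", "multilingual-llm"]
DATASETS = ["demosqa-social-media", "curated-qa-1"]
STRATEGIES = ["zero-shot", "few-shot", "chain-of-thought"]

def evaluate(model: str, dataset: str, strategy: str) -> float:
    """Stand-in scorer; a real harness would prompt the model and
    compare its answers against the dataset's gold answers."""
    # Deterministic dummy score in [0, 1) so the sketch runs end to end.
    return (sum(map(ord, model + dataset + strategy)) % 100) / 100

# Cross every model with every dataset and prompting strategy,
# mirroring the grid-style evaluation the benchmark describes.
results = {
    (m, d, s): evaluate(m, d, s)
    for m, d, s in product(MODELS, DATASETS, STRATEGIES)
}

# One score per (model, dataset, strategy) cell: 2 x 2 x 3 = 12.
print(len(results))
```

With real models, the inner call would be the expensive step; the surrounding grid logic stays the same regardless of how many models, datasets, or strategies are evaluated.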

Why It Matters

Highlights the performance gap for non-English AI and pushes for models that understand cultural nuance, not just vocabulary.