IdiomX benchmark tests AI on 190K idiom usage examples across 3 languages
Over 12,000 idioms in English, Arabic, and French with literal vs. figurative labels…
Idiomatic expressions have long been a stumbling block for NLP models because their meanings are non-compositional and heavily context-dependent. Existing resources are often too small or lack multilingual coverage. To close this gap, Ayman Ali Sharara built IdiomX, a large-scale benchmark comprising over 190K contextualized examples that cover more than 12,000 idioms, with aligned semantic representations in English, Arabic, and French. Each example is labeled as idiomatic or literal usage and carries rich linguistic metadata.
The benchmark defines four tasks: idiom detection, context-to-idiom retrieval, Arabic-to-English idiom retrieval, and idiom interpretation. Experiments reveal that contextual transformer models significantly improve idiom detection accuracy, while hybrid retrieval-reranking architectures strengthen both monolingual and cross-lingual retrieval. Notably, the study shows idiom interpretation can be effectively framed as a semantic retrieval task, adding explainability as a new evaluation dimension. IdiomX is fully open-source, with code and data available on HuggingFace, Kaggle, and GitHub, and is designed to be extensible to additional languages and figurative reasoning tasks.
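To make the retrieve-then-rerank idea concrete, here is a minimal sketch of context-to-idiom retrieval in plain Python. It is not the IdiomX implementation: the idiom inventory, the glosses, and every function name are hypothetical, the first stage uses simple token overlap, and a character-trigram cosine stands in for the neural reranker a real system would use.

```python
import math
import re
from collections import Counter

# Hypothetical toy inventory (idiom -> gloss); not taken from IdiomX.
IDIOMS = {
    "break the ice": "ease tension and start a conversation among strangers",
    "spill the beans": "reveal a secret by accident",
    "hit the sack": "go to bed and sleep",
    "under the weather": "feeling slightly ill or sick",
}

def tokens(text):
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def lexical_score(query, doc):
    """Stage-1 score: number of distinct tokens shared by query and doc."""
    return len(set(tokens(query)) & set(tokens(doc)))

def char_ngrams(text, n=3):
    """Bag of character n-grams over the normalized text."""
    s = " ".join(tokens(text))
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve_then_rerank(query, idioms, k=3):
    """Return up to k idioms: cheap lexical retrieval, then reranking."""
    # Stage 1: keep the k candidates whose glosses overlap most with the query.
    candidates = sorted(
        idioms, key=lambda i: lexical_score(query, idioms[i]), reverse=True
    )[:k]
    # Stage 2: reorder the shortlist with a finer-grained similarity.
    # Assumption: trigram cosine is a stand-in for a learned reranker.
    qv = char_ngrams(query)
    return sorted(
        candidates, key=lambda i: cosine(qv, char_ngrams(idioms[i])), reverse=True
    )

if __name__ == "__main__":
    context = "She was feeling ill and sick so she stayed home"
    ranked = retrieve_then_rerank(context, IDIOMS)
    print(ranked[0], "->", IDIOMS[ranked[0]])
```

Looking up the gloss of the top-ranked idiom, as the last line does, loosely mirrors the idea of treating interpretation as retrieval: the explanation is selected from a pool of candidate meanings rather than generated.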
- 190K+ examples covering 12K+ idioms across English, Arabic, and French
- Four tasks: detection, context-to-idiom retrieval, cross-lingual retrieval, and interpretation
- Contextual transformers boost detection; hybrid retrieval-reranking architectures improve both monolingual and cross-lingual retrieval
Why It Matters
A scalable multilingual benchmark that pushes AI from figurative detection to semantic understanding and cross-lingual retrieval.