Far Out: Evaluating Language Models on Slang in Australian and Indian English
Models scored as low as 0.03 accuracy on generative slang tasks, exposing major gaps in their understanding of local dialects.
Researchers Deniz Kaya Dilsiz, Dipankar Srirag, and Aditya Joshi published 'Far Out,' a study evaluating seven state-of-the-art language models on slang comprehension. They tested the models on 377 web-sourced and 1,492 synthetic examples of Australian and Indian English slang across three tasks. Key findings: the models perform far worse on generative tasks (0.03 accuracy) than on selection tasks (0.49 accuracy), and they understand Indian English slang slightly better than Australian English slang. These results reveal critical weaknesses in AI's grasp of non-standard language varieties.
Why It Matters
For AI systems deployed globally, understanding local dialects and slang is essential to avoiding bias and communication errors.