Digital Linguistic Bias in Spanish: Evidence from Lexical Variation in LLMs
AI models struggle with Chilean Spanish while favoring European and Mexican varieties.
A new study reveals that large language models (LLMs) exhibit a systematic "Digital Linguistic Bias" in Spanish. Researchers tested models on more than 900 lexical items across 21 Spanish-speaking countries. Models recognized vocabulary from Spain, Mexico, and Central America more accurately than vocabulary from other countries, and performed worst on Chilean Spanish. Crucially, the bias patterns do not correlate with the amount of digital data available for each country. This suggests that deeper algorithmic factors, rather than data volume alone, shape how AI represents global language diversity.
Why It Matters
These findings show how AI systems can marginalize certain dialects, affecting translation, content moderation, and accessibility for millions of speakers.