BanglaSocialBench reveals LLMs' cultural blind spots in Bangladeshi social interactions
New benchmark tests 12 LLMs on 1,719 culturally nuanced Bangla scenarios, exposing systematic failures in handling social hierarchy and kinship.
A team of five researchers from Bangladesh has introduced BanglaSocialBench, a benchmark for evaluating the sociopragmatic and cultural alignment of Large Language Models (LLMs) in Bangladeshi social contexts. Rather than testing simple factual recall, it tests context-dependent language use across three domains: Bangla Address Terms, Kinship Reasoning, and Social Customs. All 1,719 of its culturally grounded instances were written and verified by native Bangla speakers. They focus on two features of the language. The first is its three-tiered system of second-person pronouns (āpni, tumi, and tui, running from formal to intimate). The second is its complex kinship-based forms of address.
In a zero-shot evaluation of twelve contemporary LLMs, the study found systematic, non-random patterns of cultural misalignment. The models frequently defaulted to overly formal address forms. They also failed to recognize that more than one pronoun can be socially acceptable in a given context. Finally, they conflated kinship terminology across religious contexts; for instance, Muslim and Hindu families in Bangladesh typically use different words for the same relative, such as khala and mashi for a mother's sister. The results indicate that strong multilingual fluency, of the kind shown by LLMs such as GPT-4 and Claude, does not by itself translate into communicative competence. That competence requires sensitivity to social hierarchy, relational roles, and unspoken interactional norms.
The findings highlight a persistent limitation in how current LLMs infer and apply culturally appropriate language in realistic social interactions. The benchmark gives developers a way to measure, and then work to improve, how well AI systems handle high-context languages like Bangla. In such languages, meaning is embedded in social customs as much as in vocabulary and grammar.
- Benchmark contains 1,719 culturally grounded instances across three domains: Address Terms, Kinship Reasoning, and Social Customs.
- Zero-shot tests on 12 LLMs showed systematic failures, such as defaulting to overly formal address and confusing kinship terms across religious contexts.
- Reveals that multilingual fluency in LLMs does not guarantee sociopragmatic competence for appropriate, context-aware communication.
Why It Matters
Exposes a critical gap in AI localization: fluency isn't enough for real-world, culturally sensitive communication in global markets.