BANGLASOCIALBENCH: A Benchmark for Evaluating Sociopragmatic and Cultural Alignment of LLMs in Bangladeshi Social Interaction
New benchmark tests 12 LLMs on 1,719 culturally nuanced Bangla scenarios, exposing systematic failures in handling social hierarchy and kinship.
A team of five researchers from Bangladesh has introduced BanglaSocialBench, a novel benchmark designed to evaluate the sociopragmatic and cultural alignment of Large Language Models (LLMs) in Bangladeshi social contexts. The benchmark moves beyond simple factual recall to test context-dependent language use across three critical domains: Bangla Address Terms, Kinship Reasoning, and Social Customs. It consists of 1,719 culturally grounded instances, all written and verified by native Bangla speakers, focusing on the language's three-tiered second-person pronominal system and its complex kinship-based addressing conventions.
In a zero-shot evaluation of twelve contemporary LLMs, the study revealed systematic, non-random patterns of cultural misalignment. The models frequently defaulted to overly formal address forms, failed to recognize that multiple pronouns can be socially acceptable in a given context, and conflated kinship terminology across religious contexts. This shows that the strong multilingual fluency of models like GPT-4 and Claude does not translate into communicative competence, which requires sensitivity to social hierarchy, relational roles, and unspoken interactional norms.
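The zero-shot protocol described above can be sketched as a simple scoring loop. Everything here is an illustrative assumption, not the paper's actual harness: the instance fields (`scenario`, `question`, `acceptable`), the prompt wording, and the `mock_model` stub are hypothetical. The key detail it encodes is that an answer counts as correct if it matches any of the socially acceptable options, since the study notes that several pronouns can be appropriate in one context.

```python
# Hypothetical sketch of a zero-shot evaluation loop for a benchmark
# like BanglaSocialBench; field names and the model stub are
# illustrative assumptions, not the authors' code.

def evaluate(instances, ask_model):
    """Return accuracy over instances; an answer is correct if it
    matches ANY of the acceptable gold options for that context."""
    correct = 0
    for inst in instances:
        prompt = (
            f"Scenario: {inst['scenario']}\n"
            f"Question: {inst['question']}\n"
            "Answer with the most socially appropriate Bangla term."
        )
        answer = ask_model(prompt).strip()
        if answer in inst["acceptable"]:  # set of acceptable answers
            correct += 1
    return correct / len(instances)

# Illustrative instance: addressing a much younger sibling, where
# more than one intimate/familiar pronoun could be acceptable.
sample = [{
    "scenario": "A speaker addresses their much younger sibling at home.",
    "question": "Which second-person pronoun should the speaker use?",
    "acceptable": {"tui", "tumi"},
}]

def mock_model(prompt):
    # Stand-in for a real LLM call; always returns the formal pronoun,
    # mimicking the over-formality failure pattern the study reports.
    return "apni"

accuracy = evaluate(sample, mock_model)  # the formal default scores 0.0 here
```

A real harness would swap `mock_model` for an API call to each of the twelve models and aggregate accuracy per domain; the multi-answer scoring is what lets the benchmark credit models that pick any socially acceptable form rather than a single canonical one.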
The findings highlight a persistent limitation in how current LLMs infer and apply culturally appropriate language in realistic social interactions. The benchmark provides a crucial tool for developers to measure and improve the cultural intelligence of AI systems, ensuring they can navigate the high-context nuances of languages like Bangla, where meaning is deeply embedded in social customs rather than just vocabulary and grammar.
- Benchmark contains 1,719 culturally grounded instances across three domains: Address Terms, Kinship Reasoning, and Social Customs.
- Zero-shot tests on 12 LLMs showed systematic failures, such as defaulting to overly formal address and conflating kinship terms across religious contexts.
- Reveals that multilingual fluency in LLMs does not guarantee sociopragmatic competence for appropriate, context-aware communication.
Why It Matters
Exposes a critical gap in AI localization: fluency isn't enough for real-world, culturally sensitive communication in global markets.