BANGLASOCIALBENCH: A Benchmark for Evaluating Sociopragmatic and Cultural Alignment of LLMs in Bangladeshi Social Interaction

arXiv:2603.15949v1. Abstract: Large Language Models have demonstrated strong multilingual fluency, yet fluency alone does not guarantee socially appropriate language use. In high-context languages, communicative competence requires sensitivity to social hierarchy, relational roles, and interactional norms that are encoded directly in everyday language. Bangla exemplifies this challenge through its three-tiered pronominal system, kinship-based addressing, and culturally embedded social customs. We introduce BANGLASOCIALBENCH, the first benchmark designed to evaluate sociopragmatic competence in Bangla through context-dependent language use rather than factual recall. The benchmark spans three domains: Bangla Address Terms, Kinship Reasoning, and Social Customs, and consists of 1,719 culturally grounded instances written and verified by native Bangla speakers. We evaluate twelve contemporary LLMs in a zero-shot setting and observe systematic patterns of cultural misalignment. Models frequently default to overly formal address forms, fail to recognize multiple socially acceptable address pronouns, and conflate kinship terminology across religious contexts. Our findings show that sociopragmatic failures are often structured and non-random, revealing persistent limitations in how current LLMs infer and apply culturally appropriate language use in realistic Bangladeshi social interactions.

Executive Summary

The article introduces BANGLASOCIALBENCH, a benchmark for evaluating the sociopragmatic competence of Large Language Models (LLMs) in Bangladeshi social interactions. The benchmark consists of 1,719 culturally grounded instances spanning three domains: Bangla Address Terms, Kinship Reasoning, and Social Customs. Evaluating twelve contemporary LLMs in a zero-shot setting, the authors observe systematic, non-random patterns of cultural misalignment: models default to overly formal address forms, miss cases where multiple address pronouns are socially acceptable, and conflate kinship terminology across religious contexts. These findings show that fluency alone does not guarantee appropriate language use in a high-context language like Bangla, and they underscore the need for more culturally sensitive language models.
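The abstract does not specify the benchmark's data format, but the described task structure (a social context, a language-use question, and possibly several acceptable answers) suggests a schema along these lines. This is an illustrative sketch only; all field names and the example content are assumptions, not the authors' actual schema.

```python
# Hypothetical sketch of what a BANGLASOCIALBENCH-style instance might look
# like. The real schema is not given in the abstract; every field name and
# the example below are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class BenchmarkInstance:
    domain: str        # "address_terms" | "kinship" | "social_customs"
    context: str       # scenario describing the social situation
    question: str      # what the model must decide
    options: list      # candidate answers (e.g. the three pronoun tiers)
    acceptable: set = field(default_factory=set)  # indices of ALL acceptable options

# Bangla's three-tiered pronominal system means more than one address form
# can be acceptable, depending on relationship, age, and register.
item = BenchmarkInstance(
    domain="address_terms",
    context="A shopkeeper greets a regular customer of similar age.",
    question="Which second-person pronoun tier is appropriate?",
    options=["apni (formal)", "tumi (familiar)", "tui (intimate)"],
    acceptable={0, 1},  # both formal and familiar could be fine here
)
assert item.acceptable <= set(range(len(item.options)))
```

Representing the gold label as a *set* of acceptable options, rather than a single answer, matches the paper's observation that models "fail to recognize multiple socially acceptable address pronouns".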

Key Points

  • BANGLASOCIALBENCH is a benchmark designed to evaluate the sociopragmatic competence of LLMs in Bangladeshi social interactions.
  • The benchmark consists of 1,719 culturally grounded instances spanning three domains: Bangla Address Terms, Kinship Reasoning, and Social Customs.
  • The authors evaluate twelve contemporary LLMs in a zero-shot setting and observe systematic patterns of cultural misalignment.
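The zero-shot evaluation described above can be sketched as a simple loop: prompt each model with the context and question (no in-context examples), then score a prediction as correct if it falls in the item's set of acceptable answers. This is a minimal illustrative harness, not the authors' protocol; `query_model` is a hypothetical stand-in for a real LLM API call, and the scoring rule is an assumption consistent with the paper's discussion of multiple acceptable pronouns.

```python
# Minimal sketch of a zero-shot evaluation loop over benchmark items.
# `query_model` is a hypothetical placeholder for an actual LLM call;
# the paper's exact prompting and scoring are not specified.
from collections import defaultdict

def query_model(prompt: str) -> int:
    """Hypothetical LLM call returning the index of the chosen option."""
    return 0  # placeholder: a model that always picks the formal option

def evaluate(items):
    correct, total = defaultdict(int), defaultdict(int)
    for it in items:
        prompt = f"{it['context']}\n{it['question']}\nOptions: {it['options']}"
        pred = query_model(prompt)  # zero-shot: no in-context examples
        total[it["domain"]] += 1
        # Credit any socially acceptable option, not just one gold answer.
        if pred in it["acceptable"]:
            correct[it["domain"]] += 1
    return {d: correct[d] / total[d] for d in total}

items = [
    {"domain": "address_terms",
     "context": "A student addresses a much older teacher.",
     "question": "Which pronoun tier fits?",
     "options": ["apni", "tumi", "tui"],
     "acceptable": {0}},
    {"domain": "address_terms",
     "context": "Two close childhood friends chat.",
     "question": "Which pronoun tier fits?",
     "options": ["apni", "tumi", "tui"],
     "acceptable": {1, 2}},
]
scores = evaluate(items)
print(scores)  # the always-formal stub scores 0.5, mirroring over-formality
```

The always-formal stub deliberately reproduces the failure mode the paper reports: defaulting to the formal register even where the familiar or intimate tier is expected.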

Merits

Strength in addressing cultural insensitivity

The development of BANGLASOCIALBENCH provides a valuable tool for evaluating LLMs in realistic social interactions and highlights the importance of considering sociopragmatic competence.

Comprehensive evaluation framework

The benchmark spans three domains, providing a comprehensive evaluation framework for LLMs in Bangladeshi social interactions.

Native speaker verification

The culturally grounded instances were written and verified by native Bangla speakers, ensuring the accuracy and cultural relevance of the benchmark.

Demerits

Limited scope

The benchmark is currently limited to Bangla and may not be applicable to other languages or social contexts.

Limited model coverage

The evaluation covers only twelve LLMs, which may not represent the broader model landscape; evaluating a wider range of models would support more robust conclusions.

Expert Commentary

The study provides a nuanced understanding of the challenges of developing LLMs that can effectively navigate cultural nuances and social norms in high-context languages like Bangla. The development of BANGLASOCIALBENCH is a significant contribution to the field, as it provides a valuable tool for evaluating LLMs in realistic social interactions. However, the study's findings also highlight the need for more research on the development of culturally sensitive language models and the importance of considering sociopragmatic competence in AI systems.

Recommendations

  • Develop more culturally sensitive language models that can effectively navigate cultural nuances and social norms in high-context languages like Bangla.
  • Conduct further research on the development of LLMs that can adapt to different cultural contexts and social norms.
