SectEval: Evaluating the Latent Sectarian Preferences of Large Language Models
arXiv:2603.12768v1 Announce Type: new Abstract: As Large Language Models (LLMs) become a popular source of religious knowledge, it is important to know whether they treat different groups fairly. This study is the first to measure how LLMs handle the differences between the two main sects of Islam: Sunni and Shia. We present SectEval, a benchmark of 88 questions available in both English and Hindi, to check the bias of 15 leading LLMs, both proprietary and open-weight. Our results show a major inconsistency based on language. In English, several powerful models, including DeepSeek-v3 and GPT-4o, often favored Shia answers. However, when asked the exact same questions in Hindi, these models switched to favoring Sunni answers. This means a user could receive completely different religious advice just by changing languages. We also examined how models react to location. Advanced models such as Claude-3.5 changed their answers to match the user's country, giving Shia answers to a user from Iran and Sunni answers to a user from Saudi Arabia. In contrast, smaller models (especially in Hindi) ignored the user's location and stuck to a Sunni viewpoint. These findings show that AI is not neutral; its religious ``truth'' changes depending on the language you speak and the country you claim to be from. The dataset is available at https://github.com/secteval/SectEval/
Executive Summary
This study introduces SectEval, a test to evaluate the latent sectarian preferences of Large Language Models (LLMs) in handling differences between Sunni and Shia Islam. The results show significant inconsistencies based on language and location, with models favoring different sects depending on the language used and the user's claimed country of origin. This raises concerns about the neutrality of AI and its potential to provide biased religious advice.
Key Points
- ▸ SectEval is a novel test to assess LLMs' handling of sectarian differences in Islam
- ▸ Results show language-based inconsistencies in LLMs' responses, with several models favoring Shia answers in English and Sunni answers in Hindi
- ▸ Location-based inconsistencies are also observed, with models adapting their responses to match the user's country
Merits
Comprehensive Evaluation
The study provides a thorough assessment of LLMs' sectarian preferences using a well-designed test
Demerits
Limited Generalizability
The study's findings may not be generalizable to other religious or cultural contexts
Expert Commentary
The study's findings underscore the need for a nuanced understanding of AI's role in shaping religious discourse. As LLMs become increasingly influential, it is crucial to address the biases and inconsistencies that can perpetuate sectarian divisions. By acknowledging and mitigating these biases, developers and policymakers can work towards creating more inclusive and fair AI systems that promote greater understanding and respect for diverse religious perspectives.
Recommendations
- ✓ Developers should prioritize transparency and explainability in LLMs to facilitate understanding of their decision-making processes
- ✓ Further research should be conducted to explore the generalizability of these findings to other religious and cultural contexts