Grounding Arabic LLMs in the Doha Historical Dictionary: Retrieval-Augmented Understanding of Quran and Hadith
arXiv:2603.23972v1
Abstract: Large language models (LLMs) have achieved remarkable progress in many language tasks, yet they continue to struggle with complex historical and religious Arabic texts such as the Quran and Hadith. To address this limitation, we develop a retrieval-augmented generation (RAG) framework grounded in diachronic lexicographic knowledge. Unlike prior RAG systems that rely on general-purpose corpora, our approach retrieves evidence from the Doha Historical Dictionary of Arabic (DHDA), a large-scale resource documenting the historical development of Arabic vocabulary. The proposed pipeline combines hybrid retrieval with an intent-based routing mechanism to provide LLMs with precise, contextually relevant historical information. Our experiments show that this approach improves the accuracy of Arabic-native LLMs, including Fanar and ALLaM, to over 85%, substantially reducing the performance gap with Gemini, a proprietary large-scale model. Gemini also serves as an LLM-as-a-judge system for automatic evaluation in our experiments. The automated judgments were verified through human evaluation, demonstrating high agreement (kappa = 0.87). An error analysis further highlights key linguistic challenges, including diacritics and compound expressions. These findings demonstrate the value of integrating diachronic lexicographic resources into retrieval-augmented generation frameworks to enhance Arabic language understanding, particularly for historical and religious texts. The code and resources are publicly available at: https://github.com/somayaeltanbouly/Doha-Dictionary-RAG.
Executive Summary
This study proposes a retrieval-augmented generation (RAG) framework, grounded in diachronic lexicographic knowledge, to improve LLM understanding of complex historical and religious Arabic texts such as the Quran and Hadith. Rather than drawing on a general-purpose corpus, the framework retrieves evidence from the Doha Historical Dictionary of Arabic (DHDA) through hybrid retrieval and intent-based routing. With this grounding, the Arabic-native LLMs Fanar and ALLaM reach over 85% accuracy, narrowing the gap with Gemini. Gemini also serves as an automatic judge, and its judgments agree closely with human evaluation (kappa = 0.87). An error analysis identifies diacritics and compound expressions as key remaining linguistic challenges.
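To make the diacritics challenge concrete, the sketch below shows how two differently vocalized Arabic words collapse to the same bare spelling once their short-vowel marks are removed. This summary does not say how the paper handles diacritics, so the function name `strip_diacritics` and the choice to drop every Unicode combining mark are assumptions made purely for illustration.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Drop nonspacing combining marks (Unicode category Mn).

    Assumption for illustration: this covers the Arabic short-vowel marks
    (harakat) without touching the base letters.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")

# Two distinct words that share one consonantal skeleton (ain, lam, meem).
ILM = "\u0639\u0650\u0644\u0652\u0645"   # 'ilm, "knowledge": kasra on ain, sukun on lam
ALAM = "\u0639\u064E\u0644\u064E\u0645"  # 'alam, "flag": fatha on ain and on lam
BARE = "\u0639\u0644\u0645"              # undiacritized spelling

print(ILM != ALAM)                                            # the vocalized forms differ
print(strip_diacritics(ILM) == strip_diacritics(ALAM) == BARE)  # the bare forms coincide
```

A lookup keyed on the bare spelling would therefore return both entries, while a lookup keyed on the fully vocalized form would miss unvocalized queries. That tension is one plausible reason diacritics show up in the error analysis.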
Key Points
- ▸ The study proposes a RAG framework grounded in diachronic lexicographic knowledge to improve the understanding of complex historical and religious Arabic texts.
- ▸ The framework retrieves evidence from the DHDA, rather than a general-purpose corpus, to provide LLMs with precise and contextually relevant historical information.
- ▸ With this grounding, the Arabic-native LLMs Fanar and ALLaM reach over 85% accuracy, substantially narrowing the gap with Gemini.
Merits
Strength in Methodology
The pipeline combines hybrid retrieval with an intent-based routing mechanism, so that LLMs receive precise, contextually relevant historical information from the DHDA instead of passages from a general-purpose corpus.
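The combination of hybrid retrieval and intent-based routing can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the entries are invented placeholder glosses (not DHDA content), the fields `meaning` and `history` are a hypothetical schema, the character-trigram similarity merely stands in for a real dense embedding model, and the keyword router and the `alpha` weighting are assumed for the example.

```python
from collections import Counter
from math import sqrt

# Invented placeholder entries; the real DHDA schema is not described here.
ENTRIES = [
    {"lemma": "kitab", "meaning": "a written document or book",
     "history": "earliest attested sense is a written letter"},
    {"lemma": "qalam", "meaning": "a pen used for writing",
     "history": "earliest attested sense is a trimmed reed"},
    {"lemma": "salam", "meaning": "peace and safety",
     "history": "earliest attested sense is soundness and freedom from harm"},
]

def lexical_score(query: str, text: str) -> float:
    """Sparse signal: Jaccard overlap between the two word sets."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q | t) if q | t else 0.0

def trigram_vector(text: str) -> Counter:
    """Stand-in for a dense embedding: character-trigram counts."""
    s = f"  {text.lower()} "
    return Counter(s[i:i + 3] for i in range(len(s) - 2))

def dense_score(query: str, text: str) -> float:
    """Cosine similarity between the trigram count vectors."""
    a, b = trigram_vector(query), trigram_vector(text)
    dot = sum(a[k] * b[k] for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def route_intent(query: str) -> str:
    """Assumed keyword router: pick which entry field the query targets."""
    cues = ("earliest", "origin", "history", "historical")
    return "history" if any(c in query.lower() for c in cues) else "meaning"

def hybrid_retrieve(query: str, k: int = 1, alpha: float = 0.5):
    """Rank entries by a weighted sum of the sparse and dense scores."""
    field = route_intent(query)
    scored = []
    for entry in ENTRIES:
        text = f"{entry['lemma']} {entry[field]}"
        score = alpha * lexical_score(query, text) + (1 - alpha) * dense_score(query, text)
        scored.append((score, entry["lemma"], field))
    scored.sort(reverse=True)
    return scored[:k]

print(hybrid_retrieve("what is the earliest sense of qalam"))
print(hybrid_retrieve("peace and safety"))
```

The router decides which part of an entry is searched before any scoring happens, and the fused score lets an exact word match and a fuzzier similarity signal compensate for each other. A production system would replace both scorers with a proper sparse index and an embedding model.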
Comprehensive Evaluation
The study includes both automated evaluation, with Gemini acting as an LLM-as-a-judge, and human evaluation. The two show high agreement (kappa = 0.87), which supports the reliability of the automated judgments.
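For readers unfamiliar with the agreement statistic, the sketch below computes Cohen's kappa for two raters. The abstract reports only "kappa = 0.87" without naming the variant, so treating it as Cohen's kappa is an assumption, and the label lists are invented toy data rather than the paper's judgments.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    # Fraction of items on which the two raters give the same label.
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Agreement expected if each rater labelled at random with their own label rates.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)

# Toy labels (1 = answer judged correct, 0 = incorrect); not the paper's data.
llm_judge = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
human = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]

print(round(cohen_kappa(llm_judge, human), 2))  # 0.8
```

Here the raters agree on 9 of 10 items (observed agreement 0.9) while chance agreement is 0.5, giving a kappa of 0.8. Values in this range are conventionally read as strong agreement, which is consistent with how the study characterizes its 0.87.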
Demerits
Limited Generalizability
The study focuses on a specific task and dataset, which may limit the generalizability of the findings to other tasks and datasets.
Dependence on DHDA
The study relies on the quality and comprehensiveness of the DHDA as a resource, which may introduce biases and limitations.
Expert Commentary
The study contributes to Arabic language processing by proposing a RAG framework that leverages diachronic lexicographic knowledge, and its results show that this grounding improves the accuracy of Arabic-native LLMs such as Fanar and ALLaM. Two caveats apply. First, the approach depends on the quality and coverage of the DHDA, so any gaps or biases in the dictionary carry over to the system. Second, the evaluation centers on a specific task and dataset, which may limit how far the findings generalize. Nevertheless, the study provides a valuable starting point for further research, and its approach may also prove relevant to related tasks such as translation and summarization of historical Arabic texts.
Recommendations
- ✓ Future research should investigate the use of diachronic lexicographic resources in other languages and contexts to determine the generalizability of the findings.
- ✓ Researchers should consider the potential biases and limitations of the DHDA as a resource and explore alternative approaches to mitigating these issues.
Sources
Original: arXiv - cs.CL