Progressive Training for Explainable Citation-Grounded Dialogue: Reducing Hallucination to Zero in English-Hindi LLMs
arXiv:2603.18911v1
Abstract: Knowledge-grounded dialogue systems aim to generate informative, contextually relevant responses by conditioning on external knowledge sources. However, most existing approaches focus exclusively on English, lack explicit citation mechanisms for verifying factual claims, and offer limited transparency into model decision-making. We present XKD-Dial, a progressive four-stage training pipeline for explainable, knowledge-grounded dialogue generation in a bilingual (English-Hindi) setting, comprising: (1) multilingual adaptation, (2) English dialogue SFT with citation grounding, (3) bilingual dialogue SFT, and (4) GRPO alignment with citation-aware rewards. We evaluate six models spanning encoder-decoder (250M-3B) and decoder-only (1B-7B) architectures at every pipeline stage. Our key contributions are: (i) three post-hoc explainability analyses - cross-attention alignment, Integrated Gradients attribution, and occlusion-based causal grounding - applied systematically across the training trajectory to reveal how citation behaviour is learned, not only whether it is learned; (ii) citation-grounded SFT reduces hallucination to 0.0% for encoder-decoder models from Stage 2 onward; (iii) the progressive pipeline prevents catastrophic forgetting while improving Hindi capabilities; (iv) smaller models match larger models on English after SFT; and (v) GRPO provides marginal improvement over well-designed SFT for structured citation tasks. We evaluate across six automatic metrics (BLEU, ROUGE, BERTScore, FactScore, Citation-F1, and hallucination rate).
Executive Summary
The article introduces XKD-Dial, a four-stage progressive training pipeline for explainable, citation-grounded dialogue generation in bilingual (English-Hindi) large language models (LLMs). By sequencing multilingual adaptation, supervised fine-tuning (SFT) with citation grounding, bilingual SFT, and GRPO alignment with citation-aware rewards, the authors report a 0.0% hallucination rate (as measured by the paper's automatic metric) for encoder-decoder models from Stage 2 onward. The study evaluates six models across encoder-decoder and decoder-only architectures, using post-hoc explainability analyses to trace how citation behavior develops over training. Key findings include the efficacy of citation-grounded SFT in reducing hallucinations, the mitigation of catastrophic forgetting, and the only marginal benefit of GRPO over well-designed SFT for citation tasks. The evaluation spans six automatic metrics, underscoring the pipeline's robustness and transparency in a multilingual setting.
Key Points
- ▸ A progressive four-stage training pipeline (XKD-Dial) for explainable, citation-grounded dialogue generation in English-Hindi LLMs, addressing multilingual adaptation and citation mechanisms.
- ▸ Systematic post-hoc explainability analyses (cross-attention alignment, Integrated Gradients, occlusion-based causal grounding) to trace the learning trajectory of citation behavior in models.
- ▸ A reported 0.0% hallucination rate for encoder-decoder models from Stage 2 onward, driven by citation-grounded SFT; the progressive pipeline also prevents catastrophic forgetting while improving Hindi capabilities.
- ▸ Empirical evaluation of six models (250M-3B encoder-decoder and 1B-7B decoder-only) across six metrics (BLEU, ROUGE, BERTScore, FactScore, Citation-F1, hallucination rate), revealing insights into model performance and scalability.
- ▸ Demonstration that smaller models can match larger models on English after SFT, and that GRPO provides marginal improvements over well-designed SFT for structured citation tasks.
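Two of the six metrics, Citation-F1 and hallucination rate, are less standard than BLEU or ROUGE. A common way to define Citation-F1 is as set-level F1 over the knowledge-passage identifiers cited in a response; the sketch below follows that convention, but the paper's exact formulation may differ (it is not specified in the abstract):

```python
def citation_f1(predicted: set[str], gold: set[str]) -> float:
    """Set-based F1 over cited passage IDs (e.g. "[1]", "[2]").

    Illustrative definition only; the paper's Citation-F1 may be computed
    differently (e.g. per-sentence or span-level).
    """
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)          # correctly cited passages
    precision = tp / len(predicted)     # fraction of citations that are correct
    recall = tp / len(gold)             # fraction of gold citations recovered
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Under this definition, a response citing [1] and [3] against gold citations [1] and [2] has precision 0.5 and recall 0.5, so Citation-F1 is 0.5.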
Merits
Innovative Pipeline Design
The progressive four-stage training pipeline (XKD-Dial) is a novel approach that systematically integrates multilingual adaptation, citation grounding, and alignment techniques to achieve explainable and hallucination-free dialogue generation in bilingual contexts.
Comprehensive Explainability Analysis
The inclusion of three post-hoc explainability methods (cross-attention alignment, Integrated Gradients, occlusion-based causal grounding) provides deep insights into how citation behavior is learned, enhancing transparency and trust in model decision-making.
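Of the three methods, occlusion-based causal grounding is the most directly causal: mask one evidence passage at a time and measure how much the model's score for the response drops. The sketch below illustrates the general technique with a generic `score_fn` standing in for a model log-likelihood; the function and parameter names are illustrative, not the paper's API:

```python
from typing import Callable, Sequence

def occlusion_scores(
    passages: Sequence[str],
    response: str,
    score_fn: Callable[[Sequence[str], str], float],
    mask_token: str = "[MASK]",
) -> list[float]:
    """Causal grounding score per passage: drop in response score when
    that passage is occluded. Larger drop => stronger grounding."""
    base = score_fn(passages, response)        # score with full knowledge
    drops = []
    for i in range(len(passages)):
        occluded = list(passages)
        occluded[i] = mask_token               # occlude one passage
        drops.append(base - score_fn(occluded, response))
    return drops
```

With a real model, `score_fn` would be the (log-)probability of the response given the knowledge; a passage whose occlusion barely moves the score contributed little to the generation.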
Empirical Rigor and Scalability
The evaluation of six models across varying architectures and sizes, coupled with six automatic metrics, ensures robust and scalable findings that are applicable to both research and practical implementations in multilingual LLMs.
Hallucination Reduction to Zero
The reported 0.0% hallucination rate for encoder-decoder models from Stage 2 onward, as measured by the paper's automatic metric, is a striking result, indicating the pipeline's effectiveness at keeping generated responses grounded in the cited knowledge.
Demerits
Limited Generalizability to Other Languages
The study focuses exclusively on English-Hindi bilingual settings, leaving the generalizability of the pipeline to other language pairs or low-resource languages unaddressed. Further research is needed to validate its applicability across diverse linguistic contexts.
Marginal Improvement from GRPO
While GRPO alignment shows marginal improvement over well-designed SFT for structured citation tasks, its incremental benefits may not justify the additional computational overhead for all use cases, particularly in resource-constrained environments.
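To make the cost-benefit trade-off concrete: a citation-aware GRPO reward typically combines a citation-accuracy term with a penalty for citing passages that do not exist in the provided knowledge. The sketch below is a hypothetical reward of that shape, not the paper's actual formulation (the abstract does not specify it):

```python
import re

def citation_reward(response: str, gold: set[str], valid_ids: set[str]) -> float:
    """Illustrative citation-aware reward for GRPO-style alignment.

    Rewards overlap with gold citations (set-level F1) and penalizes
    citation markers that reference non-existent passages.
    """
    cited = set(re.findall(r"\[\d+\]", response))  # e.g. {"[1]", "[3]"}
    if not cited:
        return -1.0                                # no grounding at all
    invalid = cited - valid_ids                    # fabricated citation markers
    tp = len(cited & gold)
    precision = tp / len(cited)
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return f1 - 0.5 * len(invalid)                 # penalty weight is arbitrary
```

If well-designed SFT data already enforces this structure, the reward has little headroom to improve on it, which is consistent with the paper's finding that GRPO yields only marginal gains.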
Complexity of the Pipeline
The four-stage pipeline introduces significant complexity, requiring careful orchestration of multilingual adaptation, SFT, and alignment steps. This complexity may pose challenges for adoption in real-world applications without substantial infrastructure and expertise.
Dependence on High-Quality Data
The pipeline's success is contingent on the availability of high-quality, citation-grounded training data. In scenarios where such data is scarce or noisy, the effectiveness of the pipeline may be compromised.
Expert Commentary
The article represents a meaningful advance in multilingual and explainable AI, particularly for dialogue systems. The progressive training pipeline (XKD-Dial) addresses real gaps in existing approaches by integrating citation grounding, multilingual adaptation, and explainability analyses. The reported 0.0% hallucination rate for encoder-decoder models, measured by an automatic metric, is a notable milestone, demonstrating the potential of structured training methodologies to improve the reliability of LLMs. The inclusion of post-hoc explainability methods is particularly commendable, as it provides a nuanced understanding of how citation behavior is internalized by the models, moving beyond binary evaluations of success or failure. However, the study's focus on English-Hindi bilingual settings limits its immediate generalizability, and the marginal benefits of GRPO alignment may not justify its adoption in all contexts. The complexity of the pipeline also poses challenges for real-world deployment, requiring careful consideration of resource constraints and infrastructure. Overall, the work is a valuable contribution, offering actionable insights for researchers and practitioners aiming to develop more transparent, accurate, and multilingual AI systems.
Recommendations
- ✓ Conduct further research to validate the generalizability of the XKD-Dial pipeline across additional language pairs, including low-resource languages, to ensure broader applicability and inclusivity in multilingual NLP.
- ✓ Explore the integration of the pipeline with real-time or streaming data pipelines to assess its feasibility in dynamic, high-volume dialogue settings, such as customer support or live translation services.
- ✓ Investigate the potential of hybrid training approaches that combine SFT with other alignment techniques (e.g., DPO, IPO) to determine the most effective and resource-efficient methods for citation grounding and hallucination reduction.
- ✓ Develop open-source toolkits and benchmarks based on the XKD-Dial pipeline to facilitate community adoption, standardization, and further innovation in explainable, citation-grounded dialogue systems.
- ✓ Collaborate with domain experts in linguistics, ethics, and policy to address the broader implications of multilingual AI systems, including issues of bias, cultural appropriateness, and equitable access.