IndexRAG: Bridging Facts for Cross-Document Reasoning at Index Time

Zhenghua Bao, Yi Shi

arXiv:2603.16415v1

Abstract: Multi-hop question answering (QA) requires reasoning across multiple documents, yet existing retrieval-augmented generation (RAG) approaches address this either through graph-based methods requiring additional online processing or iterative multi-step reasoning. We present IndexRAG, a novel approach that shifts cross-document reasoning from online inference to offline indexing. IndexRAG identifies bridge entities shared across documents and generates bridging facts as independently retrievable units, requiring no additional training or fine-tuning. Experiments on three widely-used multi-hop QA benchmarks (HotpotQA, 2WikiMultiHopQA, MuSiQue) show that IndexRAG improves F1 over Naive RAG by 4.6 points on average, while requiring only single-pass retrieval and a single LLM call at inference time. When combined with IRCoT, IndexRAG outperforms all graph-based baselines on average, including HippoRAG and FastGraphRAG, while relying solely on flat retrieval. Our code will be released upon acceptance.

Executive Summary

IndexRAG addresses cross-document reasoning in multi-hop question answering (QA) by moving that reasoning from online inference to offline indexing. It identifies bridge entities shared across documents and generates bridging facts that are stored as independently retrievable units, so inference needs only single-pass retrieval and a single LLM call, with no additional training or fine-tuning. On HotpotQA, 2WikiMultiHopQA, and MuSiQue, the abstract reports an average F1 gain of 4.6 points over Naive RAG. When combined with IRCoT, IndexRAG outperforms all graph-based baselines on average, including HippoRAG and FastGraphRAG, while relying solely on flat retrieval.

Key Points

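  • IndexRAG shifts cross-document reasoning from online inference to offline indexing, so inference needs only single-pass retrieval and a single LLM call.
  • The approach identifies bridge entities shared across documents and generates bridging facts as independently retrievable units.
  • On three multi-hop QA benchmarks (HotpotQA, 2WikiMultiHopQA, MuSiQue), IndexRAG improves F1 over Naive RAG by 4.6 points on average; combined with IRCoT, it outperforms all graph-based baselines on average, including HippoRAG and FastGraphRAG.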
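
The index-time idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (their code is not yet released). Entity extraction is reduced to a lookup against a hypothetical `vocabulary` of known names, and `make_bridging_fact` is a placeholder that simply joins the two source passages where the actual method would generate a fact. All function names and the unit layout are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations


def extract_entities(text, vocabulary):
    """Toy extractor (assumption): return the known entity names found in text."""
    return {name for name in vocabulary if name in text}


def find_bridge_entities(docs, vocabulary):
    """Map each entity to the documents mentioning it; keep entities shared by 2+ docs."""
    entity_to_docs = defaultdict(set)
    for doc_id, text in docs.items():
        for entity in extract_entities(text, vocabulary):
            entity_to_docs[entity].add(doc_id)
    return {e: sorted(ids) for e, ids in entity_to_docs.items() if len(ids) >= 2}


def make_bridging_fact(entity, text_a, text_b):
    """Placeholder (assumption): a real system would generate a fact linking both texts."""
    return f"[bridge: {entity}] {text_a} {text_b}"


def build_index(docs, vocabulary):
    """Return a flat list of retrievable units: original documents plus bridging facts."""
    units = [
        {"kind": "doc", "sources": (doc_id,), "text": text}
        for doc_id, text in docs.items()
    ]
    for entity, doc_ids in find_bridge_entities(docs, vocabulary).items():
        for a, b in combinations(doc_ids, 2):
            units.append({
                "kind": "bridge",
                "sources": (a, b),
                "text": make_bridging_fact(entity, docs[a], docs[b]),
            })
    return units


docs = {
    "d1": "Inception was directed by Christopher Nolan.",
    "d2": "Christopher Nolan was born in London.",
}
vocabulary = ["Inception", "Christopher Nolan", "London"]
index = build_index(docs, vocabulary)
```

In this toy example, "Christopher Nolan" is the only entity shared by both documents, so the index ends up with the two original documents plus one bridging unit that a flat retriever can return on its own.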

Merits

Improved Efficiency

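Because cross-document reasoning is done at index time, inference requires only single-pass retrieval and a single LLM call, rather than online graph processing or iterative multi-step reasoning. The abstract reports no latency or cost measurements, so the size of the efficiency gain is not quantified.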
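
To make "single-pass retrieval and a single LLM call" concrete, here is a hedged sketch of what inference over a flat index might look like. The token-overlap scorer is a stand-in chosen for brevity (the abstract does not say which retriever is used), the prompt template is invented, and no model is actually called; the function only assembles the one prompt that would be sent.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, unit_text):
    """Toy relevance score (assumption): number of tokens shared with the query."""
    q, u = Counter(tokenize(query)), Counter(tokenize(unit_text))
    return sum((q & u).values())


def retrieve(query, units, k=2):
    """One retrieval pass over the flat index; no graph traversal, no iteration."""
    ranked = sorted(units, key=lambda text: score(query, text), reverse=True)
    return ranked[:k]


def build_prompt(query, units, k=2):
    """Assemble the single prompt that would be sent to the LLM (hypothetical template)."""
    context = "\n".join(f"- {u}" for u in retrieve(query, units, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


units = [
    "Inception was directed by Christopher Nolan.",
    "Christopher Nolan was born in London.",
    "Inception was directed by Christopher Nolan, who was born in London.",
    "Paris is the capital of France.",
]
query = "Where was the director of Inception born?"
prompt = build_prompt(query, units, k=1)
```

The third unit plays the role of a bridging fact: it matches both halves of the two-hop question, so a single retrieval pass can surface it without a second hop.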

Enhanced Accuracy

The abstract reports an average F1 improvement of 4.6 points over Naive RAG across the three benchmarks. The advantage over graph-based baselines such as HippoRAG and FastGraphRAG is reported, on average, for IndexRAG combined with IRCoT.
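
For readers unfamiliar with the metric, the sketch below shows the token-level answer F1 commonly used in extractive QA evaluation. It is an assumption that the paper's F1 follows this convention; the summary does not define it, and the normalization rules here (lowercasing, stripping punctuation and English articles) are the usual ones rather than anything taken from the paper.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, split into tokens."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction, gold):
    """Harmonic mean of token precision and recall between prediction and gold answer."""
    pred, ref = normalize(prediction), normalize(gold)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

This function returns a value between 0 and 1 per question; benchmark tables typically average it over all questions and scale it by 100, which is the scale on which a gain of 4.6 points would be read.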

Flexibility and Scalability

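IndexRAG relies solely on flat retrieval and requires no additional training or fine-tuning, which suggests it can be layered onto an existing retrieval pipeline; the reported combination with IRCoT is one example. The abstract does not address scalability to larger corpora.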

Demerits

Complexity of Entity Identification

Identifying bridge entities shared across documents may be a complex task, especially for large and diverse datasets.

Limited Generalizability

The evaluation covers three multi-hop QA benchmarks, so IndexRAG may not generalize as well to domains or tasks with different characteristics or requirements.

Expert Commentary

IndexRAG takes a different route from existing RAG methods for multi-hop QA: rather than adding online graph processing or iterative multi-step reasoning, it precomputes bridging facts at index time and keeps inference to single-pass retrieval and a single LLM call. The reported gain over Naive RAG is 4.6 F1 points on average, and the advantage over graph-based baselines is shown for the combination with IRCoT. The open questions are how reliably bridge entities can be identified and how well the approach generalizes beyond the three benchmarks. If those concerns are resolved, IndexRAG could improve both the efficiency and the accuracy of multi-hop QA systems.

Recommendations

  • Future research should develop more robust methods for identifying bridge entities shared across documents, particularly in large and diverse datasets.
  • IndexRAG should be evaluated on a broader range of datasets and tasks to assess its generalizability and potential applications.
