From Conflict to Consensus: Boosting Medical Reasoning via Multi-Round Agentic RAG

10 min March 6, 2026

arXiv:2603.03292v1 (cross-listed). Abstract: Large Language Models (LLMs) exhibit high reasoning capacity in medical question-answering, but their tendency to produce hallucinations and rely on outdated knowledge poses critical risks in healthcare. While Retrieval-Augmented Generation (RAG) mitigates these issues, existing methods rely on noisy token-level signals and lack the multi-round refinement required for complex reasoning. In this paper, we propose MA-RAG (Multi-Round Agentic RAG), a framework that facilitates test-time scaling for complex medical reasoning by iteratively evolving both external evidence and internal reasoning history within an agentic refinement loop. At each round, the agent transforms semantic conflict among candidate responses into actionable queries to retrieve external evidence, while optimizing historical reasoning traces to mitigate long-context degradation. MA-RAG extends the self-consistency principle by leveraging the lack of consistency as a proactive signal for multi-round agentic reasoning and retrieval, and mirrors a boosting mechanism that iteratively minimizes the residual error toward a stable, high-fidelity medical consensus. Extensive evaluations across 7 medical Q&A benchmarks show that MA-RAG consistently surpasses competitive inference-time scaling and RAG baselines, delivering a substantial gain of +6.8 points in average accuracy over the backbone model. Our code is available at [this url](https://github.com/NJU-RL/MA-RAG).
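The loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). The function names (`agreement`, `ma_rag_sketch`), the toy stubs, the majority-share stopping rule, and the `threshold`, `k`, and `rounds` defaults are all assumptions made for illustration. The sketch samples several candidate answers, measures how much they agree, and, when they conflict, turns the disagreement into a retrieval query before trying again with more evidence and a condensed history.

```python
from collections import Counter
from typing import Callable, List, Tuple


def agreement(answers: List[str]) -> Tuple[str, float]:
    """Return the majority answer and the share of candidates that chose it."""
    top, count = Counter(answers).most_common(1)[0]
    return top, count / len(answers)


def ma_rag_sketch(
    question: str,
    sample: Callable[[str, List[str], str, int], str],   # hypothetical LLM call
    retrieve: Callable[[str], List[str]],                # hypothetical retriever
    summarize: Callable[[str, List[str]], str],          # hypothetical history condenser
    rounds: int = 3,
    k: int = 5,
    threshold: float = 0.8,
) -> Tuple[str, int]:
    """Iterate until candidates agree or the round budget runs out."""
    evidence: List[str] = []
    history = ""
    best = ""
    for round_idx in range(1, rounds + 1):
        # Sample k candidate answers given the current evidence and history.
        answers = [sample(question, evidence, history, i) for i in range(k)]
        best, share = agreement(answers)
        if share >= threshold:
            return best, round_idx  # consensus reached
        # Conflict: build a query from the competing answers and retrieve.
        rivals = " vs ".join(sorted(set(answers)))
        evidence.extend(retrieve(f"{question} Which is supported: {rivals}?"))
        # Keep a short summary instead of the full reasoning traces.
        history = summarize(history, answers)
    return best, rounds


# Toy stubs standing in for a model and a retriever (illustrative only).
def toy_sample(question: str, evidence: List[str], history: str, i: int) -> str:
    if evidence:
        return "B"  # with evidence, every candidate converges
    return "A" if i % 2 == 0 else "B"  # without evidence, candidates split


def toy_retrieve(query: str) -> List[str]:
    return ["snippet favouring B"]


def toy_summarize(history: str, answers: List[str]) -> str:
    return f"{history} earlier candidates: {sorted(set(answers))}".strip()


answer, rounds_used = ma_rag_sketch(
    "toy question", toy_sample, toy_retrieve, toy_summarize
)
print(answer, rounds_used)  # B 2
```

In this toy run the first round splits 3 to 2, which falls below the assumed 0.8 threshold, so one retrieval step is triggered and the second round reaches consensus. The stopping rule and query template are placeholders; the paper may define conflict and consensus differently.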

Executive Summary

The article proposes Multi-Round Agentic Retrieval-Augmented Generation (MA-RAG), a framework for improving medical reasoning in Large Language Models (LLMs). At each round, MA-RAG turns semantic conflict among candidate responses into retrieval queries and refines both the external evidence and the internal reasoning history, with the aim of reducing hallucinations and reliance on outdated knowledge. Across 7 medical Q&A benchmarks, the authors report an average accuracy gain of 6.8 points over the backbone model and consistent improvements over competitive inference-time scaling and RAG baselines. If these results hold up, the approach is relevant to building more reliable medical question-answering systems.

Key Points

  • Introduction of the MA-RAG framework for medical reasoning
  • Mitigation of hallucinations and outdated knowledge in LLMs
  • Average accuracy gain of 6.8 points over the backbone model across 7 medical Q&A benchmarks

Merits

Improved Accuracy

MA-RAG reports an average accuracy gain of 6.8 points over the backbone model across 7 medical Q&A benchmarks and consistently outperforms competitive inference-time scaling and RAG baselines.

Robustness to Hallucinations

The framework is designed to mitigate hallucinations and outdated knowledge by grounding conflicting candidate answers in retrieved external evidence, which reduces the risk of incorrect medical information.

Demerits

Computational Complexity

The iterative refinement process adds inference-time cost: each round involves generating candidate responses and retrieving further evidence, which may limit scalability.
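A rough way to reason about this overhead is to count calls. The sketch below is a back-of-the-envelope model, not a measurement from the paper: it assumes each round samples a fixed number of candidates, makes one history-summarization call, and issues a fixed number of retrieval queries. The function name and all parameter values are hypothetical.

```python
from typing import Dict


def worst_case_calls(
    rounds: int,
    candidates: int,
    queries_per_round: int = 1,
    summarize_calls: int = 1,
) -> Dict[str, int]:
    """Upper bound on calls if no round stops early (assumed cost model)."""
    return {
        "llm_calls": rounds * (candidates + summarize_calls),
        "retrieval_calls": rounds * queries_per_round,
    }


# Illustrative numbers only: 3 rounds of 5 candidates each.
budget = worst_case_calls(rounds=3, candidates=5)
single_pass = 5  # one round of self-consistency with 5 samples, no retrieval
print(budget)                             # {'llm_calls': 18, 'retrieval_calls': 3}
print(budget["llm_calls"] / single_pass)  # 3.6
```

Under these assumptions the worst case costs 3.6 times as many model calls as a single self-consistency pass. Early stopping on easy questions would lower the average, so the real overhead depends on how often candidates already agree in the first round.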

Expert Commentary

The MA-RAG framework addresses a critical issue for medical use of LLMs: hallucinations and outdated knowledge. Its agentic refinement loop treats disagreement among candidate responses as a signal for where to retrieve more evidence, rather than relying on noisy token-level signals, and it optimizes the reasoning history to limit long-context degradation. However, further research is needed on the computational cost and scalability of multi-round refinement. Potential applications include clinical decision support systems and personalized medicine.

Recommendations

  • Further evaluation of MA-RAG in real-world clinical settings
  • Investigation into the integration of medical knowledge graphs and other external evidence sources

Sources