Dialogue to Question Generation for Evidence-based Medical Guideline Agent Development

arXiv:2603.23937v1. Abstract: Evidence-based medicine (EBM) is central to high-quality care, but remains difficult to implement in fast-paced primary care settings. Physicians face short consultations, increasing patient loads, and lengthy guideline documents that are impractical to consult in real time. To address this gap, we investigate the feasibility of using large language models (LLMs) as ambient assistants that surface targeted, evidence-based questions during physician-patient encounters. Our study focuses on question generation rather than question answering, with the aim of scaffolding physician reasoning and integrating guideline-based practice into brief consultations. We implemented two prompting strategies, a zero-shot baseline and a multi-stage reasoning variant, using Gemini 2.5 as the backbone model. We evaluated on a benchmark of 80 de-identified transcripts from real clinical encounters, with six experienced physicians contributing over 90 hours of structured review. Results indicate that while general-purpose LLMs are not yet fully reliable, they can produce clinically meaningful and guideline-relevant questions, suggesting significant potential to reduce cognitive burden and make EBM more actionable at the point of care.

Executive Summary

This study investigates the feasibility of using large language models (LLMs) as ambient assistants that surface targeted, evidence-based questions during physician-patient encounters. The authors implemented two prompting strategies using Gemini 2.5 as the backbone model and evaluated them on a benchmark of 80 de-identified transcripts from real clinical encounters. Results indicate that, while not yet fully reliable, general-purpose LLMs can produce clinically meaningful and guideline-relevant questions, suggesting significant potential to reduce cognitive burden and make evidence-based medicine more actionable at the point of care. The study's findings have important implications for the development of evidence-based medical guideline agents and the implementation of high-quality care in fast-paced primary care settings.

Key Points

  • The study explores the use of LLMs as ambient assistants to support evidence-based medicine
  • Two prompting strategies were implemented and evaluated on a benchmark of 80 de-identified transcripts
  • Results indicate that LLMs can produce clinically meaningful and guideline-relevant questions
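The contrast between the two prompting strategies can be illustrated with a minimal sketch. This is not the authors' implementation: `call_llm` is a hypothetical stand-in for a Gemini 2.5 API call, and the prompt wording and staging (problem extraction, then guideline mapping, then question generation) are illustrative assumptions about what a "multi-stage reasoning variant" might look like.

```python
# Hedged sketch: zero-shot vs. multi-stage question generation.
# `call_llm` is a hypothetical placeholder, not a real client library.

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; swap in a real Gemini 2.5 client here."""
    return f"<model response to prompt of {len(prompt)} chars>"

def zero_shot_questions(transcript: str) -> str:
    # Zero-shot baseline: a single prompt asks directly for questions.
    prompt = (
        "You are an ambient clinical assistant. Given this primary-care "
        "consultation transcript, suggest targeted, evidence-based "
        "questions the physician should consider:\n\n" + transcript
    )
    return call_llm(prompt)

def multi_stage_questions(transcript: str) -> str:
    # Stage 1 (assumed): extract the clinical problems in the encounter.
    problems = call_llm(
        "List the clinical problems discussed in this transcript:\n\n"
        + transcript
    )
    # Stage 2 (assumed): map each problem to relevant guideline topics.
    topics = call_llm(
        "For each problem below, name the relevant clinical guideline "
        "topics:\n\n" + problems
    )
    # Stage 3 (assumed): generate questions grounded in those topics.
    return call_llm(
        "Using these guideline topics, generate targeted, evidence-based "
        "questions for the physician:\n\n" + topics
    )
```

The design intuition behind such a multi-stage variant is that decomposing the task gives the model an explicit reasoning scaffold, at the cost of extra latency and API calls per encounter.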

Merits

Strength

The study provides a well-designed and rigorous evaluation of the feasibility of using LLMs as ambient assistants, drawing on a benchmark of 80 transcripts from real clinical encounters and over 90 hours of structured review by six experienced physicians.

Demerits

Limitation

The study relies on a general-purpose LLM, which may be less effective than a model specifically trained for clinical question generation, and the evaluation was limited to a single dataset.

Expert Commentary

This study marks an important step towards the development of evidence-based medical guideline agents that can support high-quality care in fast-paced primary care settings. The use of LLMs as ambient assistants has significant potential to reduce cognitive burden and make evidence-based medicine more actionable at the point of care. However, further research is needed to address the limitations of the study, including the reliance on a general-purpose LLM and the limited evaluation dataset. Additionally, the study's findings highlight the need for policymakers to develop guidelines and regulations for the use of AI in healthcare, particularly in the context of evidence-based medicine and clinical decision support systems.

Recommendations

  • Future studies should investigate the use of LLMs specifically trained for clinical question generation, rather than general-purpose LLMs
  • Further research is needed to evaluate the effectiveness of LLMs in real-world clinical settings and to address the limitations of the study

Sources

Original: arXiv - cs.CL