PREBA: Surgical Duration Prediction via PCA-Weighted Retrieval-Augmented LLMs and Bayesian Averaging Aggregation
arXiv:2603.13275v1. Abstract: Accurate prediction of surgical duration is pivotal for hospital resource management. Although recent supervised learning approaches, from machine learning (ML) to fine-tuned large language models (LLMs), have shown strong performance, they remain constrained by the need for high-quality labeled data and computationally intensive training. In contrast, zero-shot LLM inference offers a promising training-free alternative, but it lacks grounding in institution-specific clinical context (e.g., local demographics and case-mix distributions), making its predictions clinically misaligned and prone to instability. To address these limitations, we present PREBA, a retrieval-augmented framework that integrates PCA-weighted retrieval and Bayesian averaging aggregation to ground LLM predictions in institution-specific clinical evidence and statistical priors. The core of PREBA is to construct an evidence-based prompt for the LLM, comprising (1) the most clinically similar historical surgical cases and (2) clinical statistical priors. To achieve this, PREBA first encodes heterogeneous clinical features into a unified representation space that enables systematic retrieval. It then performs PCA-weighted retrieval to identify clinically relevant historical cases, which form the evidence context supplied to the LLM. Finally, PREBA applies Bayesian averaging to fuse multi-round LLM predictions with population-level statistical priors, yielding calibrated and clinically plausible duration estimates. We evaluate PREBA on two real-world clinical datasets using three state-of-the-art LLMs: Qwen3, DeepSeek-R1, and HuatuoGPT-o1. PREBA significantly improves performance (for instance, reducing MAE by up to 40% and raising R² from -0.13 to 0.62 over zero-shot inference) and achieves accuracy competitive with supervised ML methods, demonstrating strong effectiveness and generalization.
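The abstract describes the encoding and retrieval steps only at a high level. The Python sketch below shows one plausible reading under stated assumptions: numeric features are z-scored, categorical features are one-hot encoded, and each encoded feature is weighted by its squared loadings on the top principal components, scaled by each component's share of total variance; retrieval then ranks historical cases by weighted Euclidean distance. The helper names (`encode`, `pca_feature_weights`, `retrieve`), the field names, and the weighting scheme are hypothetical and are not taken from the paper. PCA is computed with plain power iteration so the example needs only the standard library.

```python
import math
import random

# Hypothetical sketch: the encoding and PCA weighting below are assumptions,
# not PREBA's published method.


def encode(records, numeric, categorical):
    """Fit an encoder mapping mixed-type records to flat numeric vectors.

    Numeric fields are z-scored; categorical fields are one-hot encoded.
    """
    stats = {}
    for f in numeric:
        vals = [r[f] for r in records]
        mu = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals)) or 1.0
        stats[f] = (mu, sd)
    levels = {f: sorted({r[f] for r in records}) for f in categorical}

    def vec(r):
        x = [(r[f] - stats[f][0]) / stats[f][1] for f in numeric]
        for f in categorical:
            x += [1.0 if r[f] == lvl else 0.0 for lvl in levels[f]]
        return x

    return vec


def covariance(X):
    n, d = len(X), len(X[0])
    mu = [sum(x[j] for x in X) / n for j in range(d)]
    return [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in X) / n
             for j in range(d)] for i in range(d)]


def pca_feature_weights(X, k, iters=300, seed=0):
    """Per-feature weights: sum over the top-k principal components of
    (share of total variance explained) * (feature loading) ** 2."""
    A = covariance(X)
    d = len(A)
    total = sum(A[i][i] for i in range(d)) or 1.0  # trace = total variance
    rng = random.Random(seed)
    weights = [0.0] * d
    for _ in range(k):
        # Power iteration for the leading eigenvector of the (deflated) covariance.
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
        for _ in range(iters):
            w = [sum(A[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm < 1e-12:
                break
            v = [c / norm for c in w]
        # Rayleigh quotient = eigenvalue estimate, clamped against round-off.
        lam = max(0.0, sum(v[i] * sum(A[i][j] * v[j] for j in range(d)) for i in range(d)))
        for j in range(d):
            weights[j] += (lam / total) * v[j] ** 2
        # Deflate so the next pass finds the next component.
        for i in range(d):
            for j in range(d):
                A[i][j] -= lam * v[i] * v[j]
    return weights


def retrieve(query_vec, case_vecs, weights, k):
    """Indices of the k historical cases nearest the query (weighted Euclidean)."""
    def dist(a, b):
        return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))
    return sorted(range(len(case_vecs)), key=lambda i: dist(query_vec, case_vecs[i]))[:k]


# Made-up historical cases with recorded durations in minutes.
history = [
    {"age": 62, "bmi": 31.0, "procedure": "hip", "minutes": 118},
    {"age": 35, "bmi": 22.5, "procedure": "appendix", "minutes": 52},
    {"age": 58, "bmi": 29.4, "procedure": "hip", "minutes": 125},
    {"age": 41, "bmi": 24.1, "procedure": "appendix", "minutes": 47},
    {"age": 70, "bmi": 27.8, "procedure": "knee", "minutes": 96},
]
vec = encode(history, numeric=["age", "bmi"], categorical=["procedure"])
X = [vec(r) for r in history]
weights = pca_feature_weights(X, k=2)
query = {"age": 60, "bmi": 30.0, "procedure": "hip"}
print(retrieve(vec(query), X, weights, k=2))
```

Fitting the weights once on the historical archive keeps each query to a single weighted distance pass over the cases; a production system would more likely use NumPy or scikit-learn for the PCA step.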
Executive Summary
The paper introduces PREBA, a framework for predicting surgical duration that combines retrieval-augmented large language models (LLMs) with Bayesian averaging aggregation. PREBA addresses the limitations of existing supervised learning approaches (which need labeled data and costly training) and of zero-shot LLM inference (which lacks local clinical context) by grounding predictions in institution-specific clinical evidence and statistical priors. The framework constructs an evidence-based prompt for the LLM from clinically similar historical cases and statistical priors, then applies Bayesian averaging to fuse multi-round predictions with population-level priors. Evaluations on two real-world clinical datasets with three LLMs (Qwen3, DeepSeek-R1, and HuatuoGPT-o1) show substantial improvements over zero-shot inference and accuracy competitive with supervised ML methods.
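As a rough illustration of the evidence-based prompt, the sketch below formats retrieved cases (with their recorded durations) and per-procedure statistics into a single text prompt. The template wording, the `minutes` field, and the helpers `procedure_prior` and `build_prompt` are hypothetical; the abstract does not give PREBA's actual prompt format or specify which priors it includes.

```python
import statistics

# Hypothetical sketch: prompt template and prior fields are assumptions.


def procedure_prior(history, procedure):
    """Population-level statistics for one procedure type from historical cases."""
    durations = [c["minutes"] for c in history if c["procedure"] == procedure]
    return {"mean": statistics.mean(durations),
            "std": statistics.pstdev(durations),
            "n": len(durations)}


def build_prompt(query, neighbors, prior):
    """Assemble an evidence-based prompt: similar cases (with known durations) plus priors."""
    lines = ["Estimate the duration of the current surgery in minutes.", "",
             "Most similar historical cases:"]
    for i, case in enumerate(neighbors, 1):
        feats = ", ".join(f"{k}={v}" for k, v in case.items() if k != "minutes")
        lines.append(f"{i}. {feats}; actual duration: {case['minutes']} min")
    lines += ["", "Institutional statistics for this procedure:",
              f"- mean: {prior['mean']:.1f} min, std: {prior['std']:.1f} min, n={prior['n']}",
              "",
              # The current case carries no duration, so the answer is not leaked.
              "Current case: " + ", ".join(f"{k}={v}" for k, v in query.items()),
              "Reply with a single number of minutes."]
    return "\n".join(lines)


# Made-up historical cases.
history = [
    {"age": 62, "bmi": 31.0, "procedure": "hip", "minutes": 118},
    {"age": 35, "bmi": 22.5, "procedure": "appendix", "minutes": 52},
    {"age": 58, "bmi": 29.4, "procedure": "hip", "minutes": 125},
]
query = {"age": 60, "bmi": 30.0, "procedure": "hip"}
neighbors = [history[0], history[2]]  # e.g. the output of a retrieval step
prompt = build_prompt(query, neighbors, procedure_prior(history, "hip"))
print(prompt)
```

Keeping the recorded durations on the retrieved cases, but not on the current case, gives the model concrete local anchors without exposing the value it is asked to predict.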
Key Points
- ▸ PREBA integrates PCA-weighted retrieval and Bayesian averaging aggregation to improve surgical duration prediction
- ▸ The framework grounds LLM predictions in institution-specific clinical evidence and statistical priors
- ▸ On two real-world datasets, PREBA reduces MAE by up to 40% and raises R² from -0.13 to 0.62 relative to zero-shot inference, with accuracy competitive with supervised ML methods
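The abstract does not give PREBA's exact aggregation rule. A common form of Bayesian averaging, assumed here, treats the population prior as `prior_weight` pseudo-observations at `prior_mean` (for example, the institution's mean duration for the procedure) and averages them with the n parsed LLM predictions. The names `parse_minutes` and `bayesian_average`, the regex-based parser, and the example replies are hypothetical.

```python
import re

# Hypothetical sketch: pseudo-count Bayesian average, assumed rather than
# taken from the paper.


def parse_minutes(reply):
    """Naively extract the first number from a free-text LLM reply (None if absent)."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    return float(match.group()) if match else None


def bayesian_average(predictions, prior_mean, prior_weight):
    """Fuse n LLM rounds with a population prior:

        (prior_weight * prior_mean + sum(x_i)) / (prior_weight + n)

    Unparseable rounds (None) are dropped.
    """
    if prior_weight <= 0:
        raise ValueError("prior_weight must be positive")
    valid = [p for p in predictions if p is not None]
    return (prior_weight * prior_mean + sum(valid)) / (prior_weight + len(valid))


replies = ["About 130 minutes.", "120", "I estimate 140 min", "unsure"]
rounds = [parse_minutes(r) for r in replies]  # [130.0, 120.0, 140.0, None]
estimate = bayesian_average(rounds, prior_mean=121.5, prior_weight=2)
print(round(estimate, 1))  # 126.6
```

A larger `prior_weight` pulls the estimate harder toward the population mean, while more valid LLM rounds reduce that pull; with no parseable rounds the result falls back to the prior mean, which is one way such aggregation can damp unstable zero-shot outputs.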
Merits
Effective Use of Clinical Context
Grounding the LLM in retrieved local cases and institutional duration statistics counters the clinical misalignment and instability of zero-shot inference, improving the accuracy and reliability of surgical duration predictions
Improved Performance
Because PREBA is training-free yet approaches the accuracy of supervised ML methods while clearly outperforming zero-shot inference, it could be practical for institutions that cannot afford labeled-data curation or computationally intensive training
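The reported gains are measured in MAE and R². The short sketch below computes both with their standard definitions and shows why a negative R², such as the zero-shot baseline's -0.13, means the predictions fit worse than simply predicting the mean observed duration for every case. The durations are made-up illustrative values, not data from the paper.

```python
# Standard regression metrics; example durations (minutes) are made up.


def mae(y_true, y_pred):
    """Mean absolute error, in the same unit as the durations."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


actual = [60, 90, 120]
print(mae(actual, [70, 90, 100]))   # 10.0
print(r2(actual, [90, 90, 90]))     # 0.0: no better than predicting the mean
print(r2(actual, [120, 90, 60]))    # -3.0: worse than predicting the mean
```

R² is 0 for a constant mean predictor, so any negative value signals predictions that add error rather than explain variance.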
Demerits
Dependence on High-Quality Data
Although PREBA requires no model training, its retrieval step and statistical priors still depend on a sufficiently large, well-recorded archive of institution-specific historical cases with known durations, which some healthcare settings may lack
Computational Complexity
Each prediction requires retrieval over the historical case base plus multiple LLM inference rounds before Bayesian averaging, which may add latency and cost and could limit scalability
Expert Commentary
PREBA is a notable step for surgical duration prediction because it addresses a core weakness of zero-shot LLMs: the lack of institution-specific clinical context and statistical priors. Its reliance on local historical cases, together with results on two real-world datasets, suggests it can be adapted to different healthcare settings, making it a promising tool for hospital resource management. Further research is needed on its limitations, including its dependence on a high-quality case archive and the cost of multi-round inference. As healthcare adopts AI-driven tools, transparent and effective frameworks like PREBA, whose predictions are grounded in retrievable historical evidence, will be important for the safe and efficient delivery of care.
Recommendations
- ✓ Further evaluate PREBA's performance in diverse healthcare settings to assess its generalizability beyond the two datasets studied
- ✓ Investigate the potential applications of PREBA in other areas of healthcare, such as predicting patient outcomes or optimizing treatment plans
- ✓ Develop strategies to address the potential limitations of PREBA, including its dependence on high-quality data and computational complexity