
Improving Automatic Summarization of Radiology Reports through Mid-Training of Large Language Models


arXiv:2603.19275v1 (cross-listed). Abstract: Automatic summarization of radiology reports is an essential application for reducing the burden on physicians. Previous studies have widely used the "pre-training, fine-tuning" strategy to adapt large language models (LLMs) for summarization. This study proposes subdomain adaptation through mid-training to improve summarization. We explored three adaptation strategies: (1) general-domain pre-training, (2) clinical-domain pre-training, and (3) clinical-domain pre-training followed by subdomain mid-training. We developed models using large-scale clinical text from University of Florida (UF) Health and conducted mid-training and fine-tuning experiments on widely used benchmark datasets, including OpenI and MIMIC-CXR. The experimental results show that the mid-trained model, GatorTronT5-Radio, achieved the best performance, outperforming models without mid-training on both a text-based measure (ROUGE-L) and a factuality measure (RadGraph-F1). Our mid-training method also demonstrates better few-shot learning and could alleviate the "cold start" problem reported in previous studies as a learning barrier. Our findings support the use of "pre-training, mid-training, fine-tuning" instead of the widely used direct fine-tuning strategy.
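ROUGE-L, the text-based measure cited in the abstract, scores a candidate summary by the longest common subsequence (LCS) of tokens it shares with the reference, combined into an F-measure. A minimal sketch in plain Python (whitespace tokenization only; real ROUGE implementations also normalize case and apply stemming):

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_f1(candidate, reference):
    """ROUGE-L F-measure between a candidate and a reference summary."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    precision = lcs / len(c)   # LCS fraction of the candidate
    recall = lcs / len(r)      # LCS fraction of the reference
    return 2 * precision * recall / (precision + recall)

# Example with impression-style strings (illustrative, not from the paper's data):
score = rouge_l_f1("no acute cardiopulmonary findings", "no acute findings")
```

RadGraph-F1, the factuality measure, instead compares clinical entities and relations extracted from the two texts, so it rewards summaries that preserve findings rather than surface wording.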

Executive Summary

This study proposes a mid-training step to improve the automatic summarization of radiology reports with large language models. The authors compared three adaptation strategies and achieved the best performance with the mid-trained GatorTronT5-Radio model, which outperformed models without mid-training on both text-based (ROUGE-L) and factuality (RadGraph-F1) measures. The findings support a 'pre-training, mid-training, fine-tuning' strategy, which improves few-shot learning and could alleviate the 'cold start' problem. Overall, the study demonstrates the effectiveness of mid-training for adapting large language models to summarization tasks and could help reduce the documentation burden on physicians.
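The three adaptation strategies amount to different stage orderings before the same task fine-tuning step. A minimal sketch of that staging (stage and corpus labels are illustrative, not taken from the paper's code):

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str    # training phase: "pre-train", "mid-train", or "fine-tune"
    corpus: str  # descriptive label for the data used in that phase

# Hypothetical labels for the three strategies described in the study.
STRATEGIES = {
    "general": [
        Stage("pre-train", "general-domain text"),
    ],
    "clinical": [
        Stage("pre-train", "general-domain text"),
        Stage("pre-train", "UF Health clinical notes"),
    ],
    "clinical+radiology": [
        Stage("pre-train", "general-domain text"),
        Stage("pre-train", "UF Health clinical notes"),
        Stage("mid-train", "radiology reports"),
    ],
}

def adaptation_plan(strategy):
    """Return the ordered training stages; every plan ends with task fine-tuning."""
    return STRATEGIES[strategy] + [Stage("fine-tune", "summarization pairs (e.g. OpenI, MIMIC-CXR)")]
```

The key design point is that mid-training slots in as an extra continued-pretraining stage on subdomain text, leaving the final fine-tuning step unchanged.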

Key Points

  • Mid-training improves the performance of large language models in automatic summarization of radiology reports.
  • The GatorTronT5-Radio model achieved the best performance with mid-training and fine-tuning.
  • Mid-training improves few-shot learning and alleviates the 'cold start' problem reported in previous studies.

Merits

Strength of mid-training approach

The mid-training approach allows for subdomain adaptation, which improves the performance of large language models in summarization tasks.

Improved few-shot learning

The study demonstrates that mid-training improves few-shot learning, which is essential for adapting models to new domains and tasks.

Alleviation of 'cold start' problem

The mid-training approach alleviates the 'cold start' problem, which is a significant barrier to the adoption of AI-powered summarization tools in clinical settings.

Demerits

Limited generalizability

The study relies on a small set of sources (UF Health clinical text, OpenI, and MIMIC-CXR) and may not generalize to other institutions, imaging modalities, or report styles.

Dependence on large datasets

The mid-training approach requires large datasets, which may not be readily available in all clinical settings.

Technical complexity

The mid-training approach requires significant technical expertise and may be challenging to implement in clinical settings.

Expert Commentary

The study demonstrates the effectiveness of mid-training as an intermediate adaptation step between domain pre-training and task fine-tuning. The findings have significant implications for AI-powered summarization tools in clinical settings, which can reduce the burden on physicians and improve patient care. However, the approach depends on large in-domain corpora and considerable technical expertise, which may limit adoption outside well-resourced health systems. Future studies should address these limitations and examine how well mid-training generalizes to other radiology subdomains and report types.

Recommendations

  • Develop and test the mid-training approach using larger and more diverse datasets to improve generalizability.
  • Explore the use of transfer learning approaches to adapt large language models to new domains and tasks.
  • Develop and implement clinical decision support systems that leverage AI-powered summarization tools to improve healthcare outcomes.

Sources

Original: arXiv - cs.AI