Academic

Can we generate portable representations for clinical time series data using LLMs?

arXiv:2603.23987v1 Abstract: Deploying clinical ML is slow and brittle: models that work at one hospital often degrade under distribution shifts at the next. In this work, we study a simple question -- can large language models (LLMs) create portable patient embeddings, i.e., representations of patients that enable a downstream predictor built at one hospital to be used elsewhere with minimal-to-no retraining or fine-tuning? To do so, we map irregular ICU time series onto concise natural language summaries using a frozen LLM, then embed each summary with a frozen text embedding model to obtain a fixed-length vector capable of serving as input to a variety of downstream predictors. Across three cohorts (MIMIC-IV, HIRID, PPICU), on multiple clinically grounded forecasting and classification tasks, we find that our approach is simple, easy to use, and competitive in-distribution with grid imputation, self-supervised representation learning, and time series foundation models, while exhibiting smaller relative performance drops when transferring to new hospitals. We study the variation in performance across prompt designs, with structured prompts being crucial to reducing the variance of the predictive models without altering mean accuracy. We find that using these portable representations improves few-shot learning and does not increase demographic recoverability of age or sex relative to baselines, suggesting little additional privacy risk. Our work points to the potential that LLMs hold as tools to enable the scalable deployment of production-grade predictive models by reducing engineering overhead.

Executive Summary

This study explores the potential of large language models (LLMs) to generate portable representations for clinical time series data, enabling the deployment of predictive models across different hospitals with minimal retraining. The authors develop a method that maps irregular ICU time series onto concise natural language summaries using a frozen LLM, followed by text embedding to obtain a fixed-length vector. The approach is tested on three cohorts and multiple clinically grounded forecasting and classification tasks, demonstrating competitive performance with existing methods and reduced performance drops when transferring to new hospitals. The study highlights the importance of structured prompts in reducing predictive model variance and suggests that LLMs can be a valuable tool for scalable deployment of production-grade predictive models.
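The pipeline described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `summarize_with_llm` and `embed_text` are hypothetical stand-ins for the frozen LLM and frozen text-embedding model the paper uses (here the embedding is a toy hashed bag-of-words, chosen only to keep the sketch self-contained and runnable).

```python
# Hypothetical sketch of the pipeline: irregular ICU time series
# -> frozen-LLM natural-language summary -> frozen text embedding
# -> fixed-length vector usable by any downstream predictor.
import hashlib
import math

def summarize_with_llm(events):
    """Stand-in for a frozen LLM: renders irregular (hour, variable, value)
    measurements as a concise natural-language summary."""
    parts = [f"at hour {t:.1f}, {name} was {value}" for t, name, value in events]
    return "ICU course: " + "; ".join(parts) + "."

def embed_text(summary, dim=16):
    """Stand-in for a frozen text-embedding model: hashes tokens into a
    fixed-length unit vector (a real system would call an embedding model)."""
    vec = [0.0] * dim
    for token in summary.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# Irregularly sampled vitals for one patient: (hour, variable, value).
events = [(0.0, "heart rate", 92), (3.5, "lactate", 2.1), (7.2, "heart rate", 118)]
embedding = embed_text(summarize_with_llm(events))
print(len(embedding))  # fixed length regardless of how many events the patient has
```

The key design point is that both models stay frozen: only the lightweight downstream predictor ever sees hospital-specific labels, which is what makes the representation portable across sites.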

Key Points

  • LLMs can generate portable representations for clinical time series data
  • The approach is simple and easy to use, offering performance competitive with existing methods
  • Structured prompts are crucial to reducing predictive model variance without altering mean accuracy

Merits

Strength in Reduced Engineering Overhead

The study's approach reduces the engineering overhead required for deploying predictive models across different hospitals, making it a valuable tool for scalable deployment.

Improved Few-Shot Learning

The use of portable representations improves few-shot learning, enabling the deployment of predictive models with minimal retraining.

Demerits

Limited Generalizability

The study's results may not be generalizable to other clinical domains or settings, highlighting the need for further research to validate the approach.

Dependence on High-Quality Training Data

The success of the approach relies on the availability of high-quality training data, which may not be readily available in all clinical settings.

Expert Commentary

The study makes a significant contribution to clinical time series analysis, demonstrating the potential of LLMs to generate portable representations for clinical data. However, its limitations, including uncertain generalizability beyond the ICU cohorts studied and reliance on high-quality training data, mean the approach requires further validation. The findings also carry implications for healthcare policy: LLM-based portable representations could improve outcomes and reduce costs by lowering the engineering barrier to deploying predictive models at scale. As such, the results warrant further investigation and consideration in the development of predictive models for healthcare applications.

Recommendations

  • Further research is needed to validate the approach in different clinical domains and settings
  • The development of high-quality training data is essential for the success of the approach

Sources

Original: arXiv - cs.LG