Evaluating Large Language Models for Gait Classification Using Text-Encoded Kinematic Waveforms

arXiv:2603.13317v1

Abstract

Background: Machine learning (ML) enhances gait analysis but often lacks the level of interpretability desired for clinical adoption. Large Language Models (LLMs) may offer explanatory capabilities and confidence-aware outputs when applied to structured kinematic data. This study therefore evaluated whether general-purpose LLMs can classify continuous gait kinematics when represented as textual numeric sequences and how their performance compares to conventional ML approaches.

Methods: Lower-body kinematics were recorded from 20 participants performing seven gait patterns. A supervised KNN classifier and a class-independent One-Class SVM (OCSVM) were compared against zero-shot LLMs (GPT-5, GPT-5-mini, GPT-4.1, and o4-mini). Models were evaluated using Leave-One-Subject-Out (LOSO) cross-validation. LLMs were tested both with and without explicit reference gait statistics.

Results: The supervised KNN achieved the highest performance (multiclass Matthews Correlation Coefficient, MCC = 0.88). The best-performing LLM (GPT-5) with reference grounding achieved a multiclass MCC of 0.70 and a binary MCC of 0.68, outperforming the class-independent OCSVM (binary MCC = 0.60). Performance of the LLM was highly dependent on explicit reference information and self-rated confidence; when restricted to high-confidence predictions, multiclass MCC increased to 0.83 on the filtered subset. Notably, the computationally efficient o4-mini model performed comparably to larger models.

Conclusion: When continuous kinematic waveforms were encoded as textual numeric tokens, general-purpose LLMs, even with reference grounding, did not match supervised multiclass classifiers for precise gait classification and are better regarded as exploratory systems requiring cautious, human-guided interpretation rather than diagnostic use.
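The abstract does not specify how the waveforms were written out as text, so the following is only a minimal sketch of one plausible encoding, with and without reference gait statistics. The function names, the prompt wording, the one-decimal rounding, and the use of a per-sample mean and standard deviation as "reference statistics" are all assumptions made for illustration, not the authors' method.

```python
from statistics import mean, stdev


def encode_waveform(name, samples, decimals=1):
    """Render one joint-angle waveform as a single line of plain-text numbers."""
    values = " ".join(f"{v:.{decimals}f}" for v in samples)
    return f"{name}: {values}"


def reference_stats(name, reference_cycles, decimals=1):
    """Summarise reference cycles as per-time-point mean and SD (hypothetical format)."""
    columns = list(zip(*reference_cycles))  # group values by time point
    means = " ".join(f"{mean(c):.{decimals}f}" for c in columns)
    sds = " ".join(f"{stdev(c):.{decimals}f}" for c in columns)
    return f"{name} reference mean: {means}\n{name} reference SD: {sds}"


def build_prompt(name, samples, reference_cycles=None):
    """Assemble a zero-shot prompt; reference grounding is optional."""
    lines = ["Classify the gait pattern from the joint angles (degrees) below."]
    if reference_cycles is not None:
        lines.append(reference_stats(name, reference_cycles))
    lines.append(encode_waveform(name, samples))
    lines.append("Answer with a label and a confidence from 0 to 1.")
    return "\n".join(lines)
```

Calling `build_prompt("knee_flexion", cycle)` gives the ungrounded variant, while passing `reference_cycles` adds the reference lines, mirroring the with/without comparison described in the Methods.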

Executive Summary

This study evaluates whether general-purpose Large Language Models (LLMs) can classify gait patterns when continuous kinematic waveforms are encoded as text. Under Leave-One-Subject-Out cross-validation, the best LLM configuration (GPT-5 with reference grounding) reached a multiclass MCC of 0.70, well below the supervised KNN baseline (MCC = 0.88) but above the class-independent One-Class SVM on the binary task (MCC 0.68 versus 0.60). LLM performance depended strongly on explicit reference gait statistics and on self-rated confidence: restricting evaluation to high-confidence predictions raised multiclass MCC to 0.83 on the filtered subset. The authors conclude that LLMs are better regarded as exploratory systems requiring cautious, human-guided interpretation rather than diagnostic use.

Key Points

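  • Zero-shot LLMs can classify gait patterns from text-encoded kinematic waveforms, but the best result (GPT-5 with reference grounding, multiclass MCC = 0.70) trails the supervised KNN classifier (MCC = 0.88).
  • LLM performance depends heavily on explicit reference gait statistics and on self-rated confidence; keeping only high-confidence predictions raised multiclass MCC to 0.83 on the filtered subset.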
  • LLMs are better regarded as exploratory systems requiring cautious, human-guided interpretation rather than diagnostic use.
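The comparisons above are reported as Matthews Correlation Coefficients, including one computed on a confidence-filtered subset. As a rough sketch of how such figures could be computed, the code below implements the standard multiclass generalisation of MCC and a simple confidence filter. The function names and the threshold value are hypothetical; the abstract does not state what cutoff defined "high-confidence".

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multiclass MCC from label lists (reduces to the usual MCC for two classes)."""
    s = len(y_true)                                   # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly predicted samples
    t_counts = Counter(y_true)                        # true occurrences per class
    p_counts = Counter(y_pred)                        # predicted occurrences per class
    labels = set(t_counts) | set(p_counts)
    cov_tp = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    cov_tt = s * s - sum(n * n for n in t_counts.values())
    cov_pp = s * s - sum(n * n for n in p_counts.values())
    denom = sqrt(cov_tt * cov_pp)
    return cov_tp / denom if denom else 0.0           # 0.0 when MCC is undefined


def filter_by_confidence(y_true, y_pred, confidences, threshold):
    """Keep only predictions whose self-rated confidence meets the threshold."""
    kept = [(t, p) for t, p, conf in zip(y_true, y_pred, confidences) if conf >= threshold]
    return [t for t, _ in kept], [p for _, p in kept]
```

Note that a filtered MCC describes only the retained subset: discarded low-confidence cases still need a human decision, which is consistent with the exploratory framing in the last bullet.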

Merits

Strength in Exploratory Analysis

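With reference grounding, GPT-5 outperformed the class-independent OCSVM on the binary task (MCC 0.68 versus 0.60), and its self-rated confidence was informative: restricting to high-confidence predictions raised multiclass MCC to 0.83 on the filtered subset. The computationally efficient o4-mini model also performed comparably to larger models.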

Demerits

Limited Diagnostic Use

LLMs are not yet suitable for diagnostic use: even with reference grounding, the best model's multiclass MCC (0.70) fell well short of the supervised KNN classifier (0.88).
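The supervised baseline in this comparison was a KNN classifier evaluated with Leave-One-Subject-Out cross-validation, so no participant's data appears in both training and test folds. A minimal pure-Python sketch of that protocol follows. The feature representation, the Euclidean distance, and the value of `k` are assumptions here, since the abstract does not give them.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Majority label among the k training samples nearest to the query.

    train: list of (features, label) pairs; Euclidean distance is assumed.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loso_predictions(samples, k=3):
    """Leave-One-Subject-Out evaluation.

    samples: list of (subject_id, features, label) triples.
    Returns (y_true, y_pred) gathered across all held-out subjects.
    """
    subjects = sorted({subject for subject, _, _ in samples})
    y_true, y_pred = [], []
    for held_out in subjects:
        # Train on every subject except the one being tested.
        train = [(x, y) for subject, x, y in samples if subject != held_out]
        for subject, x, y in samples:
            if subject == held_out:
                y_true.append(y)
                y_pred.append(knn_predict(train, x, k))
    return y_true, y_pred
```

Splitting by subject rather than by sample matters for gait data, because cycles from the same person are highly similar and would otherwise inflate the score.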

Expert Commentary

This study makes a useful contribution to AI-assisted gait analysis by quantifying how far general-purpose LLMs are from a supervised baseline on text-encoded kinematics. The findings underscore the need for caution before applying these models diagnostically. To realize the potential of AI in healthcare, researchers must prioritize models that are both more accurate and more interpretable, which requires a multidisciplinary effort involving clinicians, researchers, and policymakers.

Recommendations

  • Future research should prioritize the development of more accurate and interpretable AI models for gait analysis and other medical applications.
  • Healthcare policymakers should provide funding for research on the development of AI-powered gait analysis tools and the evaluation of their clinical efficacy.