Collaborative Temporal Feature Generation via Critic-Free Reinforcement Learning for Cross-User Sensor-Based Activity Recognition
arXiv:2603.16043v1. Abstract: Human Activity Recognition using wearable inertial sensors is foundational to healthcare monitoring, fitness analytics, and context-aware computing, yet its deployment is hindered by cross-user variability arising from heterogeneous physiological traits, motor habits, and sensor placements. Existing domain generalization approaches either neglect temporal dependencies in sensor streams or depend on impractical target-domain annotations. We propose a different paradigm: modeling generalizable feature extraction as a collaborative sequential generation process governed by reinforcement learning. Our framework, CTFG (Collaborative Temporal Feature Generation), employs a Transformer-based autoregressive generator that incrementally constructs feature token sequences, each conditioned on prior context and the encoded sensor input. The generator is optimized via Group-Relative Policy Optimization, a critic-free algorithm that evaluates each generated sequence against a cohort of alternatives sampled from the same input, deriving advantages through intra-group normalization rather than learned value estimation. This design eliminates the distribution-dependent bias inherent in critic-based methods and provides self-calibrating optimization signals that remain stable across heterogeneous user distributions. A tri-objective reward comprising class discrimination, cross-user invariance, and temporal fidelity jointly shapes the feature space to separate activities, align user distributions, and preserve fine-grained temporal content. Evaluations on the DSADS and PAMAP2 benchmarks demonstrate state-of-the-art cross-user accuracy (88.53% and 75.22%), substantial reduction in inter-task training variance, accelerated convergence, and robust generalization under varying action-space dimensionalities.
Executive Summary
The paper addresses cross-user variability in human activity recognition from wearable inertial sensors by treating feature extraction as a sequential generation process trained with critic-free reinforcement learning. Its CTFG framework uses a Transformer-based autoregressive generator to build feature token sequences and optimizes it with Group-Relative Policy Optimization. The authors report state-of-the-art cross-user accuracy on the DSADS and PAMAP2 benchmarks (88.53% and 75.22%, respectively), along with lower inter-task training variance, faster convergence, and robust generalization under varying action-space dimensionalities.
Key Points
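- The CTFG framework uses a Transformer-based autoregressive generator that builds feature token sequences step by step, each token conditioned on the prior context and the encoded sensor input.
- Group-Relative Policy Optimization trains the generator without a learned value function: each generated sequence is scored against a group of alternatives sampled from the same input, and advantages come from normalizing rewards within that group.
- A tri-objective reward combines class discrimination, cross-user invariance, and temporal fidelity to separate activities, align user distributions, and preserve fine-grained temporal content.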
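To make the tri-objective reward concrete, the sketch below combines three per-sequence scores into a single scalar. The article names the three objectives but does not say how they are combined, so the weighted sum, the `RewardWeights` class, and the `tri_objective_reward` function are all illustrative assumptions rather than the authors' formulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical weights; the article does not specify any weighting."""
    discrimination: float = 1.0
    invariance: float = 1.0
    fidelity: float = 1.0


def tri_objective_reward(discrimination: float,
                         invariance: float,
                         fidelity: float,
                         weights: RewardWeights = RewardWeights()) -> float:
    """Combine three objective scores into one scalar reward.

    Assumption: each score is already computed elsewhere and is
    "higher is better". A weighted sum is only one possible combination.
    """
    return (weights.discrimination * discrimination
            + weights.invariance * invariance
            + weights.fidelity * fidelity)


if __name__ == "__main__":
    # Example scores for one generated feature sequence (made-up values).
    print(tri_objective_reward(0.5, 0.25, 0.25))
```

A scalar reward of this kind is what a policy-gradient method would consume per generated sequence; how each component score is measured is left open here because the article does not describe it.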
Merits
Addressing cross-user variability
The method targets cross-user variability directly: the cross-user invariance term in the reward aligns feature distributions across users, and the reported results indicate robust generalization across heterogeneous user distributions.
Critic-free reinforcement learning
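Group-Relative Policy Optimization scores each generated sequence against a group of alternatives sampled from the same input, so no value network has to be learned. According to the authors, this removes the distribution-dependent bias of critic-based methods and yields self-calibrating optimization signals that stay stable across heterogeneous users.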
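The intra-group normalization behind this critic-free signal can be sketched in a few lines. This is a minimal illustration, assuming the common form in which each reward is centered on the group mean and divided by the group standard deviation; the function name `group_relative_advantages`, the use of the population standard deviation, and the `eps` stabilizer are assumptions, not details taken from the paper.

```python
import statistics
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float],
                              eps: float = 1e-8) -> List[float]:
    """Turn the rewards of one sampled group into advantages.

    Each reward is compared with the other sequences sampled from the
    same input: subtract the group mean, divide by the group standard
    deviation. No learned value function is involved.
    """
    if not rewards:
        raise ValueError("rewards must contain at least one value")
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; 0.0 if all equal
    # eps avoids division by zero when every sample earns the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Rewards for four candidate sequences generated from one sensor window.
    print(group_relative_advantages([0.2, 0.5, 0.5, 0.8]))
```

Because the baseline is recomputed from each group, the scale of the advantages does not depend on how large the raw rewards are for a particular user, which is the property the "self-calibrating" description points at.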
Improved performance on benchmark datasets
The CTFG framework reports state-of-the-art cross-user accuracy on the DSADS and PAMAP2 benchmarks (88.53% and 75.22%, respectively), together with reduced inter-task training variance and faster convergence.
Demerits
Complexity of the proposed framework
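The framework combines a Transformer-based autoregressive generator with group-based policy optimization, which samples several candidate sequences for every input. Both are likely to raise training cost relative to a single-pass feature extractor, and the abstract gives no compute or latency figures.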
Limited evaluation on user diversity
The evaluation covers two benchmark datasets, DSADS and PAMAP2, which may not represent the diversity of user populations and sensor placements found in real-world deployments.
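One generic way to probe user diversity on any new dataset is to hold out entire users at evaluation time, so that no test user contributes training data. The sketch below shows such a split; the `(user_id, window, label)` tuple layout and the `cross_user_split` helper are hypothetical, and the article does not state that this is the exact protocol the authors used.

```python
from typing import Iterable, List, Sequence, Tuple

# Assumed sample layout: (user_id, sensor_window, activity_label).
Sample = Tuple[str, Sequence[float], str]


def cross_user_split(samples: Iterable[Sample],
                     held_out_users: Iterable[str]
                     ) -> Tuple[List[Sample], List[Sample]]:
    """Split samples so that held-out users appear only in the test set."""
    held = set(held_out_users)
    train: List[Sample] = []
    test: List[Sample] = []
    for sample in samples:
        # sample[0] is the user id under the assumed layout.
        (test if sample[0] in held else train).append(sample)
    return train, test


if __name__ == "__main__":
    data = [
        ("u1", [0.1, 0.2], "walk"),
        ("u2", [0.3, 0.1], "sit"),
        ("u3", [0.2, 0.4], "walk"),
    ]
    train, test = cross_user_split(data, ["u3"])
    print(len(train), len(test))
```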
Expert Commentary
The paper reframes cross-user activity recognition as a generation problem and trains the generator with critic-free reinforcement learning, which is a notable departure from existing domain generalization approaches. Pairing Group-Relative Policy Optimization with a Transformer-based autoregressive generator addresses two limitations the authors identify in prior work: neglect of temporal dependencies and reliance on target-domain annotations. The main open questions are the computational cost of the framework and whether results on two benchmarks carry over to more diverse user populations. If they do, the approach could improve the accuracy and robustness of activity recognition in healthcare monitoring and personalized health applications.
Recommendations
- Future research should evaluate the method on more diverse user populations and in real-world deployment scenarios.
- The authors should report implementation details and computational cost, including how the framework scales to large-scale applications.