RE-SAC: Disentangling aleatoric and epistemic risks in bus fleet control: A stable and robust ensemble DRL approach
arXiv:2603.18396v1

Abstract: Bus holding control is challenging due to stochastic traffic and passenger demand. While deep reinforcement learning (DRL) shows promise, standard actor-critic algorithms suffer from Q-value instability in volatile environments. A key source of this instability is the conflation of two distinct uncertainties: aleatoric uncertainty (irreducible noise) and epistemic uncertainty (data insufficiency). Treating these as a single risk leads to value underestimation in noisy states, causing catastrophic policy collapse. We propose a robust ensemble soft actor-critic (RE-SAC) framework to explicitly disentangle these uncertainties. RE-SAC applies Integral Probability Metric (IPM)-based weight regularization to the critic network to hedge against aleatoric risk, providing a smooth analytical lower bound for the robust Bellman operator without expensive inner-loop perturbations. To address epistemic risk, a diversified Q-ensemble penalizes overconfident value estimates in sparsely covered regions. This dual mechanism prevents the ensemble variance from misidentifying noise as a data gap, a failure mode identified in our ablation study. Experiments in a realistic bidirectional bus corridor simulation demonstrate that RE-SAC achieves the highest cumulative reward (approx. -0.4e6) compared to vanilla SAC (-0.55e6). Mahalanobis rareness analysis confirms that RE-SAC reduces Oracle Q-value estimation error by up to 62% in rare out-of-distribution states (MAE of 1647 vs. 4343), demonstrating superior robustness under high traffic variability.
Executive Summary
This article presents RE-SAC, a deep reinforcement learning (DRL) approach to bus fleet control under stochastic traffic and passenger demand. The method disentangles aleatoric uncertainty (irreducible noise) from epistemic uncertainty (data insufficiency), hedging the former with IPM-based critic weight regularization and the latter with a diversified Q-ensemble. In a realistic simulation, RE-SAC achieves higher cumulative reward and lower value estimation error than vanilla SAC, demonstrating improved robustness. This matters for transportation management systems, where accurate decision-making is crucial, and the authors suggest the approach could extend to other high-stakes stochastic environments, such as autonomous driving and healthcare.
Key Points
- ▸ The article introduces RE-SAC, a robust ensemble DRL approach to disentangle aleatoric and epistemic uncertainties in bus fleet control.
- ▸ RE-SAC applies IPM-based weight regularization to hedge against aleatoric risk and a diversified Q-ensemble to address epistemic risk.
- ▸ Experiments in a bidirectional bus corridor simulation show RE-SAC outperforming vanilla SAC in cumulative reward and reducing Oracle Q-value estimation error by up to 62% in rare out-of-distribution states.
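To make the epistemic half of the dual mechanism concrete, here is a minimal numpy sketch of an ensemble-variance penalty: the standard deviation across Q-ensemble members serves as an epistemic-uncertainty proxy, so pessimism is applied only where the members disagree. The penalty weight `kappa` and the toy Q-values are illustrative assumptions, not values from the paper.

```python
import numpy as np

def pessimistic_target(q_values, kappa=1.0):
    """Mean Q minus kappa times the ensemble spread.

    The std-dev across ensemble members proxies epistemic uncertainty:
    members agree where data are dense and disagree in sparsely covered
    regions, so the penalty bites only where coverage is thin.
    kappa is an illustrative hyperparameter, not a value from the paper.
    """
    return q_values.mean(axis=0) - kappa * q_values.std(axis=0)

# Five hypothetical ensemble members over four states: the members agree on
# states 0-2 (well covered) and disagree sharply on state 3 (sparse coverage).
q = np.array([[1.0, 2.0, 3.0, 10.0],
              [1.0, 2.1, 2.9,  2.0],
              [1.0, 1.9, 3.1, -4.0],
              [1.0, 2.0, 3.0,  6.0],
              [1.0, 2.0, 3.0,  0.0]])

targets = pessimistic_target(q)  # state 0 is untouched; state 3 is penalized
```

Note how the penalty is near zero for the well-covered states, so in-distribution value estimates are not degraded, which is the behavior the Key Points attribute to the diversified ensemble.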
Merits
Robustness to Stochastic Environments
RE-SAC directly addresses stochastic traffic and passenger demand, achieving the highest cumulative reward among the compared methods in a realistic bidirectional bus corridor simulation.
Improved Value Estimation
The proposed approach provides a smooth analytical lower bound for the robust Bellman operator, reducing value estimation error and preventing catastrophic policy collapse.
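The abstract states that the IPM-based weight regularization yields a smooth lower bound without inner-loop perturbations; the exact regularizer is not given here, but a common surrogate in robust RL is a penalty on the critic's weight norm, which bounds its Lipschitz constant. The sketch below illustrates this idea on a hypothetical linear critic; the paper's actual neural-network formulation may differ.

```python
import numpy as np

def robust_critic_loss(w, features, targets, lam=0.1):
    """TD regression loss plus an IPM-style weight penalty (illustrative).

    Bounding ||w||^2 caps the (linear) critic's Lipschitz constant, acting
    as a smooth surrogate for an IPM robustness term -- no inner-loop
    adversarial state perturbations are needed. The linear critic and
    lam value are assumptions for illustration only.
    """
    td_error = np.mean((features @ w - targets) ** 2)
    return td_error + lam * np.dot(w, w)

# Toy data: targets generated exactly by w_true, so the TD term can reach zero.
w_true = np.array([1.0, -2.0])
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = X @ w_true

plain_loss = robust_critic_loss(w_true, X, y, lam=0.0)   # pure TD error
robust_loss = robust_critic_loss(w_true, X, y, lam=0.1)  # adds 0.1 * ||w||^2
```

Because the penalty is a smooth analytic function of the weights, it can be minimized by ordinary gradient descent, avoiding the expensive min-max inner loop of perturbation-based robust training.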
Ensemble-Based Uncertainty Disentanglement
RE-SAC's dual mechanism effectively disentangles aleatoric and epistemic uncertainties, preventing the ensemble variance from misidentifying noise as a data gap.
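The abstract evaluates this disentanglement with a "Mahalanobis rareness analysis", i.e. scoring how far a state lies from the training distribution. A minimal sketch of such a rareness score, under the assumption that it is a standard Mahalanobis distance to the visited-state distribution:

```python
import numpy as np

def mahalanobis_rareness(x, data, eps=1e-6):
    """Mahalanobis distance from a query state to the visited-state distribution.

    Large distances flag rare, out-of-distribution states; eps regularizes
    the covariance so the inverse exists even for near-degenerate data.
    (Assumed form of the paper's rareness score, for illustration.)
    """
    mu = data.mean(axis=0)
    cov = np.cov(data, rowvar=False) + eps * np.eye(data.shape[1])
    diff = x - mu
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

rng = np.random.default_rng(0)
visited = rng.normal(size=(500, 2))  # stand-in for states seen during training
d_typical = mahalanobis_rareness(visited.mean(axis=0), visited)  # near zero
d_rare = mahalanobis_rareness(np.array([10.0, 10.0]), visited)   # large
```

Binning evaluation states by such a score is what lets the paper report estimation error separately for common and rare states, rather than a single aggregate MAE.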
Demerits
Complexity of the Proposed Approach
The RE-SAC framework may be computationally intensive, since it trains and maintains multiple Q-networks for the diversified ensemble and evaluates the IPM-based weight regularization term at each critic update.
Limited Generalizability to Other Environments
While the article demonstrates the effectiveness of RE-SAC in a specific bus fleet control scenario, its applicability to other high-stakes, stochastic environments requires further investigation.
Expert Commentary
The proposed RE-SAC framework is a significant contribution to the field of DRL, addressing the twin challenges of stability in stochastic environments and principled uncertainty quantification. While the complexity of the approach may limit its applicability in some scenarios, the reported results demonstrate that separating aleatoric from epistemic risk can materially improve value estimation and decision-making in high-stakes environments.
Recommendations
- ✓ Future research should focus on exploring the generalizability of RE-SAC to other domains and investigating its applicability to more complex stochastic environments.
- ✓ The proposed approach should be further validated in real-world scenarios, with a focus on evaluating its robustness and performance in various transportation management contexts.