Resolving gradient pathology in physics-informed epidemiological models
arXiv:2603.23799v1 Announce Type: new Abstract: Physics-informed neural networks (PINNs) are increasingly used in mathematical epidemiology to bridge the gap between noisy clinical data and compartmental models, such as the susceptible-exposed-infected-removed (SEIR) model. However, training these hybrid networks is often unstable due to competing optimization objectives. As established in recent literature on "gradient pathology", the gradient vectors derived from the data loss and the physical residual often point in conflicting directions, leading to slow convergence or optimization deadlock. While existing methods attempt to resolve this by balancing gradient magnitudes or projecting conflicting vectors, we propose conflict-gated gradient scaling (CGGS), a computationally efficient alternative that ensures stable and efficient training of physics-informed neural networks for epidemiological modelling. This method uses the cosine similarity between the data and physics gradients to dynamically modulate the penalty weight. Unlike standard annealing schemes that only normalize scales, CGGS acts as a geometric gate: it suppresses the physical constraint when directional conflict is high, allowing the optimizer to prioritize data fidelity, and restores the constraint when gradients align. We prove that this gating mechanism preserves the standard $O(1/T)$ convergence rate for smooth non-convex objectives, a guarantee that fails under fixed-weight or magnitude-balanced training when gradients conflict. We demonstrate that this mechanism autonomously induces a curriculum learning effect, improving parameter estimation in stiff epidemiological systems compared to magnitude-based baselines. Our empirical results show improved peak recovery and convergence over magnitude-based methods.
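For readers unfamiliar with the compartmental model the abstract references, the following is a minimal forward-Euler simulation of the SEIR dynamics. The parameter values (`beta`, `sigma`, `gamma`) and population size are illustrative placeholders, not values from the paper.

```python
def seir_step(s, e, i, r, beta, sigma, gamma, dt, n):
    """One forward-Euler step of the SEIR ODE system.

    s, e, i, r : susceptible, exposed, infected, removed compartments
    beta  : transmission rate
    sigma : rate of progression from exposed to infected (1 / incubation period)
    gamma : removal (recovery) rate
    """
    ds = -beta * s * i / n
    de = beta * s * i / n - sigma * e
    di = sigma * e - gamma * i
    dr = gamma * i
    return s + dt * ds, e + dt * de, i + dt * di, r + dt * dr

def simulate_seir(days=160, dt=0.1, n=1000.0):
    # Illustrative initial condition: one infected individual.
    s, e, i, r = n - 1.0, 0.0, 1.0, 0.0
    beta, sigma, gamma = 0.5, 0.2, 0.1  # assumed demo parameters
    for _ in range(int(days / dt)):
        s, e, i, r = seir_step(s, e, i, r, beta, sigma, gamma, dt, n)
    return s, e, i, r
```

Note that the four derivatives sum to zero, so the total population is conserved; this conservation is one of the physical residuals a PINN can be penalized against.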
Executive Summary
This study proposes a novel method, conflict-gated gradient scaling (CGGS), to address gradient conflicts in physics-informed neural networks for epidemiological modeling. CGGS dynamically modulates the penalty weight based on the cosine similarity between the data and physics gradients, allowing the optimizer to prioritize data fidelity when the gradients conflict. Empirical results demonstrate improved peak recovery and convergence compared to magnitude-based methods. The study proves that CGGS preserves the standard O(1/T) convergence rate for smooth non-convex objectives, a guarantee that fails under fixed-weight or magnitude-balanced training when gradients conflict. The proposed method autonomously induces a curriculum learning effect, improving parameter estimation in stiff epidemiological systems. The findings have significant implications for the development of physics-informed neural networks in mathematical epidemiology.
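The gating idea described above can be sketched compactly. The exact gating function used in the paper is not reproduced here; the sketch below assumes a simple clipped-cosine gate (weight zero under directional conflict, ramping up linearly with alignment), which matches the qualitative behavior the abstract describes: suppress the physics penalty when the data and physics gradients conflict, restore it when they align.

```python
import numpy as np

def cggs_weight(g_data, g_phys, lam_max=1.0, eps=1e-12):
    """Conflict-gated penalty weight (illustrative sketch, not the paper's code).

    Computes the cosine similarity between the flattened data-loss gradient
    and the physics-residual gradient, then gates the physics weight:
    zero under directional conflict (cosine <= 0), scaling up to lam_max
    as the gradients align.
    """
    cos = np.dot(g_data, g_phys) / (
        np.linalg.norm(g_data) * np.linalg.norm(g_phys) + eps
    )
    return lam_max * max(cos, 0.0)

def combined_grad(g_data, g_phys):
    """Update direction for one optimizer step under the gated weight."""
    return g_data + cggs_weight(g_data, g_phys) * g_phys
```

For example, with opposing gradients the gate closes entirely and the step follows the data gradient alone, whereas with aligned gradients the physics term is restored at full weight. In a real PINN the two gradient vectors would come from separate backward passes over the data loss and the ODE residual loss.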
Key Points
- ▸ Conflict-gated gradient scaling (CGGS) addresses gradient conflicts in physics-informed neural networks.
- ▸ CGGS preserves the standard O(1/T) convergence rate for smooth non-convex objectives.
- ▸ CGGS autonomously induces a curriculum learning effect, improving parameter estimation in stiff epidemiological systems.
Merits
Preservation of Convergence Rate
CGGS preserves the standard O(1/T) convergence rate for smooth non-convex objectives, a guarantee that fails under fixed-weight or magnitude-balanced training when gradients conflict.
Improved Parameter Estimation
CGGS autonomously induces a curriculum learning effect, improving parameter estimation in stiff epidemiological systems.
Improved Convergence and Recovery
Empirical results demonstrate improved peak recovery and convergence compared to magnitude-based methods.
Demerits
Limited Evaluation of Robustness
The study does not evaluate the robustness of CGGS to various types of noise and outliers in the data.
Assumption of Smooth Non-Convex Objectives
The convergence guarantee assumes that the objectives are smooth, which may not hold in all epidemiological modeling scenarios.
Expert Commentary
The study proposes a novel method, CGGS, to address gradient conflicts in physics-informed neural networks for epidemiological modeling. The empirical results demonstrate improved peak recovery and convergence compared to magnitude-based methods, which is a significant contribution to the field. However, the convergence guarantee rests on a smoothness assumption that may not hold in all epidemiological modeling scenarios, and the study does not evaluate the robustness of CGGS to various types of noise and outliers in the data. Nevertheless, the findings have significant implications for the development of physics-informed neural networks in mathematical epidemiology.
Recommendations
- ✓ Future studies should evaluate the robustness of CGGS to various types of noise and outliers in the data.
- ✓ Researchers should investigate the application of CGGS in other areas of mathematical epidemiology, such as disease spread modeling and vaccination strategy evaluation.
Sources
Original: arXiv - cs.LG