Adversarial Vulnerabilities in Neural Operator Digital Twins: Gradient-Free Attacks on Nuclear Thermal-Hydraulic Surrogates
arXiv:2603.22525v1 Abstract: Operator learning models are rapidly emerging as the predictive core of digital twins for nuclear and energy systems, promising real-time field reconstruction from sparse sensor measurements. Yet their robustness to adversarial perturbations remains uncharacterized, a critical gap for deployment in safety-critical systems. Here we show that neural operators are acutely vulnerable to extremely sparse (fewer than 1% of inputs), physically plausible perturbations that exploit their sensitivity to boundary conditions. Using gradient-free differential evolution across four operator architectures, we demonstrate that minimal modifications trigger catastrophic prediction failures, increasing relative $L_2$ error from $\sim$1.5% (validated accuracy) to 37-63% while remaining completely undetectable by standard validation metrics. Notably, 100% of successful single-point attacks pass z-score anomaly detection. We introduce the effective perturbation dimension $d_{\text{eff}}$, a Jacobian-based diagnostic that, together with sensitivity magnitude, yields a two-factor vulnerability model explaining why architectures with extreme sensitivity concentration (POD-DeepONet, $d_{\text{eff}} \approx 1$) are not necessarily the most exploitable, since low-rank output projections cap maximum error, while moderate concentration with sufficient amplification (S-DeepONet, $d_{\text{eff}} \approx 4$) produces the highest attack success. Gradient-free search outperforms gradient-based alternatives (PGD) on architectures with gradient pathologies, while random perturbations of equal magnitude achieve near-zero success rates, confirming that the discovered vulnerabilities are structural. Our findings expose a previously overlooked attack surface in operator learning models and establish that these models require robustness guarantees beyond standard validation before deployment.
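The abstract does not spell out how $d_{\text{eff}}$ is computed; the sketch below is an assumption, treating it as a participation ratio of the singular values of the input-output Jacobian, with the largest singular value standing in for the sensitivity magnitude in the two-factor model. The function name and the use of `torch.autograd.functional.jacobian` are illustrative choices, not the authors' implementation.

```python
import torch

def effective_perturbation_dimension(model, x):
    """Participation-ratio estimate of d_eff from the input-output Jacobian.

    Assumes `model` maps a flat sensor vector to a flat predicted field; the
    paper's exact definition of d_eff may differ from this sketch.
    """
    # Jacobian of the predicted field with respect to the sensor inputs
    J = torch.autograd.functional.jacobian(model, x)      # shape (n_out, n_in)
    s = torch.linalg.svdvals(J)                           # singular values
    d_eff = (s.sum() ** 2 / (s ** 2).sum()).item()        # ~1 if concentrated
    amplification = s.max().item()                        # sensitivity magnitude
    return d_eff, amplification
```

In this reading, $d_{\text{eff}} \approx 1$ (POD-DeepONet) means nearly all sensitivity lies along one Jacobian direction, while $d_{\text{eff}} \approx 4$ (S-DeepONet) spreads it over a few directions with enough amplification to be exploitable.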
Executive Summary
This article characterizes a previously unexamined weakness of neural operator digital twins for nuclear and energy systems: their vulnerability to adversarial perturbations. Using gradient-free attacks on four operator architectures, the authors show that perturbing fewer than 1% of inputs can trigger catastrophic prediction failures that standard validation metrics do not detect. They also introduce the effective perturbation dimension, a Jacobian-based diagnostic that, combined with sensitivity magnitude, forms a two-factor model explaining which architectures are most exploitable. The findings expose a significant attack surface in operator learning models and underscore the need for robustness guarantees beyond standard validation before deployment.
Key Points
- ▸ Neural operator digital twins are vulnerable to sparse, physically plausible adversarial perturbations, a serious risk for deployment in safety-critical systems.
- ▸ Gradient-free attacks can drive validated models into catastrophic prediction failures while evading standard validation metrics and z-score anomaly detection (see the sketch after this list).
- ▸ The effective perturbation dimension quantifies how an architecture's input sensitivity is concentrated and, together with sensitivity magnitude, helps explain which models are exploitable.
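To make "undetectable by standard validation metrics" concrete, here is a minimal sketch assuming a per-sensor z-score screen against training statistics and the relative $L_2$ error metric reported in the abstract; the threshold of 3 and the helper names are illustrative, not taken from the paper.

```python
import numpy as np

def passes_zscore_screen(x_adv, train_mean, train_std, threshold=3.0):
    """Per-sensor z-score anomaly check; a single bounded perturbation to one
    of many inputs rarely pushes any z-score past the threshold."""
    z = np.abs((x_adv - train_mean) / train_std)
    return bool(np.all(z < threshold))

def relative_l2_error(y_pred, y_true):
    """Relative L2 error between predicted and reference fields."""
    return np.linalg.norm(y_pred - y_true) / np.linalg.norm(y_true)
```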
Merits
Strength in Methodology
The authors employ a systematic, gradient-free approach, using differential evolution to construct adversarial perturbations against operator learning models. Because the search does not rely on gradients, it remains effective on architectures with gradient pathologies where PGD underperforms, and the comparison against random perturbations of equal magnitude, which achieve near-zero success, shows that the discovered failures are structural rather than noise-driven.
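As a concrete illustration of such an attack loop, the sketch below uses SciPy's `differential_evolution` to search for a single perturbed sensor and its value; the objective, bounds, and solver settings are assumptions for illustration, not the authors' exact configuration.

```python
import numpy as np
from scipy.optimize import differential_evolution

def attack_single_sensor(model, x_clean, y_true, delta_bound):
    """Gradient-free single-point attack sketch (not the authors' exact setup).

    Searches for one sensor index and one physically bounded perturbation that
    maximize the relative L2 error of the surrogate's predicted field.
    """
    n_sensors = x_clean.size

    def neg_rel_l2(z):
        idx = int(z[0]) % n_sensors          # which sensor to perturb
        x_adv = x_clean.copy()
        x_adv[idx] += z[1]                   # bounded perturbation value
        y_pred = model(x_adv)
        rel_l2 = np.linalg.norm(y_pred - y_true) / np.linalg.norm(y_true)
        return -rel_l2                       # DE minimizes, so negate

    bounds = [(0, n_sensors - 1), (-delta_bound, delta_bound)]
    result = differential_evolution(neg_rel_l2, bounds, maxiter=100,
                                    popsize=20, seed=0, polish=False)
    return result.x, -result.fun             # (index, delta), achieved error
```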
Insightful Diagnostic Tool
The effective perturbation dimension offers a valuable diagnostic for operator learning models. Combined with sensitivity magnitude, it yields a two-factor vulnerability model that explains why the architecture with the most concentrated sensitivity (POD-DeepONet, $d_{\text{eff}} \approx 1$) is not the most exploitable, since its low-rank output projection caps the achievable error, while moderate concentration with sufficient amplification (S-DeepONet, $d_{\text{eff}} \approx 4$) yields the highest attack success.
Demerits
Limitation in Generalizability
The study covers four operator architectures, and the authors do not analyze whether the same vulnerabilities appear in other surrogate model classes or physics domains. Future studies should test whether the two-factor vulnerability model generalizes to a broader range of models and applications.
Need for Further Research
The article establishes the need for robustness guarantees in operator learning models but does not offer a roadmap for achieving them; developing and evaluating defenses against these attacks remains open for future work.
Expert Commentary
The article makes a significant contribution by showing that neural operator digital twins with validated accuracy of roughly 1.5% relative $L_2$ error can be driven to 37-63% error by sparse, physically plausible perturbations that pass routine screening. Pairing gradient-free attacks with the effective perturbation dimension provides both a demonstrated attack surface and a diagnostic for predicting which architectures are exposed. The main caveats are the limited set of four architectures and the absence of a defense strategy, but the practical and policy implications for deploying operator learning surrogates in safety-critical systems are immediate: validation accuracy alone is not sufficient evidence of robustness.
Recommendations
- ✓ Future studies should aim to generalize the findings of this research to a broader range of neural network architectures and applications.
- ✓ Developing effective defenses against adversarial attacks on operator learning models is crucial for ensuring the robustness and security of these systems; one possible starting point is sketched after this list.
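As one possible direction, not proposed in the paper, a model could be trained against worst-case single-point perturbations rather than only clean inputs. The routine below is a minimal sketch with illustrative names and hyperparameters: a coarse per-batch grid search stands in for the attack during training.

```python
import torch

def adversarial_training_step(model, optimizer, x, y, delta_bound, n_candidates=8):
    """One training step against worst-case single-point perturbations.

    A defense sketch, not the paper's method: for each batch, a coarse grid
    search picks the (sensor, shift) pair that most increases the loss, and
    the model is then trained on that perturbed input.
    """
    deltas = torch.linspace(-delta_bound, delta_bound, n_candidates)
    n_in = x.shape[1]

    with torch.no_grad():
        worst_loss, worst_input = -1.0, x
        for j in range(n_in):                      # candidate sensor to perturb
            for d in deltas:                       # candidate shift
                x_try = x.clone()
                x_try[:, j] += d
                loss = torch.nn.functional.mse_loss(model(x_try), y).item()
                if loss > worst_loss:
                    worst_loss, worst_input = loss, x_try

    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(worst_input), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

This inner search is expensive (one forward pass per candidate), and augmenting with purely random noise would likely be insufficient, since the paper reports that random perturbations of equal magnitude achieve near-zero attack success and thus miss the structural directions the attack exploits.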
Sources
Original: arXiv - cs.LG