Ensemble Graph Neural Networks for Probabilistic Sea Surface Temperature Forecasting via Input Perturbations
arXiv:2603.06153v1. Abstract: Accurate regional ocean forecasting requires models that are both computationally efficient and capable of representing predictive uncertainty. This work investigates ensemble learning strategies for sea surface temperature (SST) forecasting using Graph Neural Networks (GNNs), with a focus on how input perturbation design affects forecast skill and uncertainty representation. We adapt a GNN architecture to the Canary Islands region in the North Atlantic and implement a homogeneous ensemble approach inspired by bagging, where diversity is introduced during inference by perturbing initial ocean states rather than retraining multiple models. Several noise-based ensemble generation strategies are evaluated, including Gaussian noise, Perlin noise, and fractal Perlin noise, with systematic variation of noise intensity and spatial structure. Ensemble forecasts are assessed over a 15-day horizon using deterministic metrics (RMSE and bias) and probabilistic metrics, including the Continuous Ranked Probability Score (CRPS) and the spread-skill ratio. Results show that, while deterministic skill remains comparable to the single-model forecast, the type and structure of input perturbations strongly influence uncertainty representation, particularly at longer lead times. Ensembles generated with spatially coherent perturbations, such as low-resolution Perlin noise, achieve better calibration and lower CRPS than purely random Gaussian perturbations. These findings highlight the critical role of noise structure and scale in ensemble GNN design and demonstrate that carefully constructed input perturbations can yield well-calibrated probabilistic forecasts without additional training cost, supporting the feasibility of ensemble GNNs for operational regional ocean prediction.
Executive Summary
This study examines ensemble Graph Neural Networks (GNNs) for probabilistic sea surface temperature (SST) forecasting, focusing on how input perturbation design affects forecast skill and uncertainty representation. The authors adapt a GNN architecture to the Canary Islands region and implement a homogeneous ensemble inspired by bagging, in which diversity comes from perturbing the initial ocean state at inference time rather than from retraining multiple models. They evaluate Gaussian, Perlin, and fractal Perlin noise at varying intensities and spatial scales over a 15-day horizon, using RMSE, bias, CRPS, and the spread-skill ratio. Deterministic skill stays comparable to the single-model forecast, while spatially coherent perturbations such as low-resolution Perlin noise give better calibration and lower CRPS than purely random Gaussian noise. Because no additional training is required, the results support the feasibility of ensemble GNNs for operational regional ocean prediction.
Key Points
- ▸ Ensemble Graph Neural Networks (GNNs) are proposed for probabilistic sea surface temperature (SST) forecasting.
- ▸ The type and structure of input perturbations strongly influence uncertainty representation, particularly at longer lead times, while deterministic skill stays comparable to the single-model forecast.
- ▸ Spatially coherent perturbations, such as low-resolution Perlin noise, achieve better calibration and lower CRPS than random Gaussian perturbations.
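The difference between unstructured and spatially coherent perturbations in the points above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: `coarse_smooth_noise` is a simple stand-in for Perlin noise (random values on a coarse lattice, bilinearly interpolated), `fractal_noise` sums octaves of it in the spirit of fractal Perlin noise, and `persistence` is a hypothetical placeholder for the trained GNN. The grid size, noise amplitude, and member count are arbitrary assumptions.

```python
import numpy as np

def gaussian_noise(shape, sigma, rng):
    """Independent noise at every grid point (no spatial structure)."""
    return rng.normal(0.0, sigma, size=shape)

def coarse_smooth_noise(shape, cells, sigma, rng):
    """Spatially coherent noise: random values on a coarse lattice,
    bilinearly interpolated to the full grid (a stand-in for Perlin noise)."""
    ny, nx = shape
    lattice = rng.normal(size=(cells + 1, cells + 1))
    y = np.linspace(0, cells, ny, endpoint=False)
    x = np.linspace(0, cells, nx, endpoint=False)
    y0, x0 = y.astype(int), x.astype(int)
    ty, tx = (y - y0)[:, None], (x - x0)[None, :]
    top = lattice[np.ix_(y0, x0)] * (1 - tx) + lattice[np.ix_(y0, x0 + 1)] * tx
    bottom = lattice[np.ix_(y0 + 1, x0)] * (1 - tx) + lattice[np.ix_(y0 + 1, x0 + 1)] * tx
    field = top * (1 - ty) + bottom * ty
    return sigma * field / field.std()  # rescale to the requested amplitude

def fractal_noise(shape, cells, sigma, rng, octaves=3):
    """Sum of octaves: each one doubles the lattice resolution and halves the weight."""
    total = sum(0.5 ** k * coarse_smooth_noise(shape, cells * 2 ** k, 1.0, rng)
                for k in range(octaves))
    return sigma * total / total.std()

def run_ensemble(model, sst0, make_noise, n_members, rng):
    """Run one trained model on n_members perturbed copies of the initial state."""
    return np.stack([model(sst0 + make_noise(sst0.shape, rng))
                     for _ in range(n_members)])

def lag1_corr(field):
    """Correlation between horizontally adjacent grid points."""
    return np.corrcoef(field[:, :-1].ravel(), field[:, 1:].ravel())[0, 1]

def persistence(x):
    """Hypothetical placeholder for the trained GNN forecast."""
    return x

rng = np.random.default_rng(0)
sst0 = np.full((64, 64), 20.0)  # synthetic initial SST field (deg C), not real data
members = run_ensemble(persistence, sst0,
                       lambda shape, r: coarse_smooth_noise(shape, 4, 0.1, r),
                       n_members=10, rng=rng)
print(members.shape)  # (10, 64, 64)
print(lag1_corr(gaussian_noise((64, 64), 0.1, rng)),
      lag1_corr(coarse_smooth_noise((64, 64), 4, 0.1, rng)))
```

The two printed correlations show the contrast: neighbouring points are nearly uncorrelated under Gaussian noise and strongly correlated under the coarse-lattice noise, even though both fields have the same standard deviation.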
Merits
Strength in Ensemble Design
The study presents an ensemble GNN design that uses input perturbations at inference time to represent forecast uncertainty, while keeping deterministic skill comparable to the single-model forecast and avoiding the cost of retraining multiple models.
Methodological Contributions
The authors provide a thorough evaluation of different noise-based ensemble generation strategies, highlighting the importance of noise structure and scale in achieving well-calibrated probabilistic forecasts.
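For readers unfamiliar with the probabilistic metrics used in that evaluation, the sketch below computes an ensemble CRPS and a spread-skill ratio with NumPy. The paper's exact definitions are not given in this summary, so the code assumes common ones: the empirical-CDF CRPS estimator, mean |x_i - y| minus half the mean |x_i - x_j|, and the ratio of ensemble spread to the RMSE of the ensemble mean, without a finite-ensemble correction. All data are synthetic.

```python
import numpy as np

def crps_ensemble(members, obs):
    """Empirical CRPS per grid point.
    members: array of shape (M, ...); obs: array of shape (...)."""
    term1 = np.abs(members - obs).mean(axis=0)
    term2 = np.abs(members[:, None] - members[None, :]).mean(axis=(0, 1))
    return term1 - 0.5 * term2

def spread_skill_ratio(members, obs):
    """Ensemble spread divided by RMSE of the ensemble mean.
    Values near 1 suggest good calibration; below 1 suggests underdispersion."""
    spread = np.sqrt(members.var(axis=0, ddof=1).mean())
    rmse = np.sqrt(((members.mean(axis=0) - obs) ** 2).mean())
    return spread / rmse

rng = np.random.default_rng(0)
n_points, n_members = 10_000, 20
signal = rng.normal(size=n_points)             # synthetic predictable part
obs = signal + rng.normal(size=n_points)       # synthetic "truth"
calibrated = signal + rng.normal(size=(n_members, n_points))
underdispersed = signal + 0.3 * rng.normal(size=(n_members, n_points))

for name, ens in [("calibrated", calibrated), ("underdispersed", underdispersed)]:
    print(name, crps_ensemble(ens, obs).mean(), spread_skill_ratio(ens, obs))
```

In this toy setup the calibrated ensemble has a spread-skill ratio close to 1 and a lower mean CRPS than the underdispersed one, which is the kind of contrast the study reports between perturbation strategies.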
Demerits
Limited Spatial Coverage
The study focuses on the Canary Islands region, limiting the generalizability of the findings to other regions with different oceanographic characteristics.
Lack of Comparison with Other Ensemble Methods
The study does not compare its inference-time perturbation ensembles against other ensemble methods, such as conventional bagging with separately retrained models or boosting, so it is unclear how much skill or calibration is traded for the lower training cost.
Expert Commentary
The experiment is well designed and isolates the effect of input perturbation design on ensemble GNNs for probabilistic SST forecasting. Generating a bagging-inspired homogeneous ensemble at inference time avoids the cost of retraining multiple models, and the comparison of Gaussian, Perlin, and fractal Perlin noise shows that noise structure and scale matter for how well calibrated the forecasts are. The limitations should be kept in mind: the evaluation covers a single region, and the study does not compare against other ensemble methods. Within those limits, the finding that carefully constructed input perturbations yield well-calibrated probabilistic forecasts without additional training cost is directly relevant to operational regional ocean prediction.
Recommendations
- ✓ Future studies should aim to generalize the findings to other regions with different oceanographic characteristics.
- ✓ The approach should be compared against other ensemble methods, such as conventional bagging with retrained models or boosting, to put the reported results in context.