Academic

Formal verification of tree-based machine learning models for lateral spreading

arXiv:2603.16983v1 Announce Type: new Abstract: Machine learning models for geotechnical hazard prediction can achieve high accuracy while learning physically inconsistent relationships from sparse or biased training data. Current remedies (post-hoc explainability, such as SHAP and LIME, and training-time constraints) either diagnose individual predictions approximately or restrict model capacity without providing exhaustive guarantees. This paper encodes trained tree ensembles as logical formulas in a Satisfiability Modulo Theories (SMT) solver and checks physical specifications across the entire input domain, not just sampled points. Four geotechnical specifications (water table depth, PGA monotonicity, distance safety, and flat-ground safety) are formalized as decidable logical formulas and verified via SMT against both XGBoost ensembles and Explainable Boosting Machines (EBMs) trained on the 2011 Christchurch earthquake lateral spreading dataset (7,291 sites, four features). The S

K
Krishna Kumar
· · 1 min read · 16 views

arXiv:2603.16983v1 Announce Type: new Abstract: Machine learning models for geotechnical hazard prediction can achieve high accuracy while learning physically inconsistent relationships from sparse or biased training data. Current remedies (post-hoc explainability, such as SHAP and LIME, and training-time constraints) either diagnose individual predictions approximately or restrict model capacity without providing exhaustive guarantees. This paper encodes trained tree ensembles as logical formulas in a Satisfiability Modulo Theories (SMT) solver and checks physical specifications across the entire input domain, not just sampled points. Four geotechnical specifications (water table depth, PGA monotonicity, distance safety, and flat-ground safety) are formalized as decidable logical formulas and verified via SMT against both XGBoost ensembles and Explainable Boosting Machines (EBMs) trained on the 2011 Christchurch earthquake lateral spreading dataset (7,291 sites, four features). The SMT solver either produces a concrete counterexample where a specification fails or proves that no violation exists. The unconstrained EBM (80.1% accuracy) violates all four specifications. A fully constrained EBM (67.2%) satisfies three of four specifications, demonstrating that iterative constraint application guided by verification can progressively improve physical consistency. A Pareto analysis of 33 model variants reveals a persistent trade-off, as none of the variants studied achieve both greater than 80% accuracy and full compliance with the specified set. SHAP analysis of specification counterexamples shows that the offending feature can rank last, demonstrating that post-hoc explanations do not substitute for formal verification. These results establish a verify-fix-verify engineering loop and a formal certification for deploying physically consistent ML models in safety-critical geotechnical applications.

Executive Summary

This article innovatively employs formal verification to ensure physical consistency in machine learning (ML) models for geotechnical hazard prediction. By encoding trained tree ensembles as logical formulas and checking against Satisfiability Modulo Theories (SMT) solvers, the authors demonstrate that current remedies for inconsistent ML models, such as post-hoc explainability and training-time constraints, are insufficient. The study highlights a trade-off between accuracy and physical consistency, underscoring the need for iterative constraint application guided by verification. The results establish a verify-fix-verify engineering loop and formal certification for deploying physically consistent ML models in safety-critical applications.

Key Points

  • The authors use formal verification to ensure physical consistency in ML models for geotechnical hazard prediction.
  • Current remedies for inconsistent ML models are insufficient and cannot provide exhaustive guarantees.
  • A verify-fix-verify engineering loop is established to progressively improve physical consistency in ML models.

Merits

Innovative Application of Formal Verification

The authors successfully apply formal verification to ensure physical consistency in ML models, addressing a critical limitation in current remedies.

Comprehensive Evaluation of ML Models

The study comprehensively evaluates ML models, including tree ensembles and Explainable Boosting Machines (EBMs), to assess their physical consistency.

Establishment of Verify-Fix-Verify Engineering Loop

The authors establish a verify-fix-verify engineering loop, providing a systematic approach to ensuring physical consistency in ML models.

Demerits

Limited Generalizability to Other Domains

The study focuses on geotechnical hazard prediction and may not be directly generalizable to other domains or applications.

High Computational Complexity

The use of SMT solvers may introduce high computational complexity, limiting the scalability of the approach.

Expert Commentary

The article makes a significant contribution to the field of ML, particularly in the context of geotechnical hazard prediction. The innovative application of formal verification and the establishment of a verify-fix-verify engineering loop provide a systematic approach to ensuring physical consistency in ML models. However, the study's limitations, including limited generalizability and high computational complexity, must be addressed in future research. Moreover, the policy implications of this study are substantial, and further research is needed to explore the feasibility of formal certification processes for physically consistent ML models.

Recommendations

  • Future research should focus on extending the application of formal verification to other domains and ML models.
  • Researchers should explore methods to reduce the computational complexity of SMT solvers and improve scalability.

Sources