Research & Papers

Formal verification of tree-based machine learning models for lateral spreading

New method uses SMT solvers to prove AI models are physically consistent, catching errors SHAP misses.

Deep Dive

A new research paper by Krishna Kumar establishes a rigorous method for formally verifying the physical consistency of machine learning models used in safety-critical geotechnical applications. The work focuses on tree-based ensembles like XGBoost and Explainable Boosting Machines (EBMs) trained to predict lateral spreading during earthquakes. The key innovation is encoding these trained models as logical formulas and using a Satisfiability Modulo Theories (SMT) solver to exhaustively check them against formalized physical rules—such as monotonic relationships between ground shaking and hazard—across the entire input domain, not just sampled data points.
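
The core encoding step is straightforward to sketch. Below is a minimal, hypothetical illustration (not the paper's code) using the Python z3 solver: a toy two-feature tree stands in for the trained ensemble, and the feature names, thresholds, leaf values, and domain bounds are all invented. The solver is asked for any pair of inputs that breaks monotonicity in ground shaking; an unsat answer is a proof over the entire input domain, while a sat answer yields a concrete counterexample.

```python
# Minimal sketch: encode a toy tree as a Z3 formula and check a
# monotonicity spec over the whole input domain. All names, thresholds,
# and bounds are illustrative assumptions, not taken from the paper.
from z3 import Real, If, And, Solver, sat

def tree_output(pga, slope):
    # Toy two-split regression tree standing in for the trained ensemble.
    return If(pga < 0.3,
              If(slope < 2.0, 0.1, 0.2),
              If(slope < 2.0, 0.6, 0.8))

pga1, pga2, slope = Real("pga1"), Real("pga2"), Real("slope")

s = Solver()
# Assumed domain bounds for the two features.
s.add(And(0 <= pga1, pga1 <= 1, 0 <= pga2, pga2 <= 1,
          0 <= slope, slope <= 10))
# Assert the NEGATION of the spec: stronger shaking, strictly lower output.
s.add(pga2 >= pga1, tree_output(pga2, slope) < tree_output(pga1, slope))

if s.check() == sat:
    print("Spec violated; counterexample:", s.model())
else:
    print("Monotonicity in PGA holds for every input in the domain.")
```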

Applied to a dataset from the 2011 Christchurch earthquake (7,291 sites, four features), the verification revealed critical flaws. An unconstrained EBM model with 80.1% accuracy violated all four physical specifications. Iteratively applying constraints guided by the SMT solver's counterexamples produced a fully constrained model that satisfied three of the four specs, though its accuracy dropped to 67.2%. A Pareto analysis of 33 model variants showed a persistent trade-off, with none achieving both >80% accuracy and full compliance.
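
As a hedged sketch of that verify-fix-verify loop (again, not the paper's code): synthetic data stands in for the Christchurch dataset, a sampling-based `find_violation` helper stands in for the exhaustive SMT check, and XGBoost's built-in `monotone_constraints` parameter plays the role of the fix step.

```python
# Sketch of a verify-fix-verify loop. The data, helper, and constraint
# mechanism are stand-ins; the paper verifies with an SMT solver rather
# than by sampling.
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.random((1000, 4))                        # synthetic 4-feature data
y = (X[:, 0] + 0.1 * rng.standard_normal(1000) > 0.5).astype(int)

def find_violation(model, feature, n=2000):
    """Sampling-based stand-in for the exhaustive SMT check: raise one
    feature, hold the rest fixed, look for a predicted-probability drop."""
    a = rng.random((n, 4))
    b = a.copy()
    b[:, feature] += rng.random(n) * (1 - b[:, feature])  # increase feature
    drop = model.predict_proba(b)[:, 1] < model.predict_proba(a)[:, 1] - 1e-9
    return (a[drop][0], b[drop][0]) if drop.any() else None

constraints = [0, 0, 0, 0]                       # 0 = free, 1 = increasing
for _ in range(len(constraints)):                # at most one fix per pass
    model = XGBClassifier(n_estimators=50,
                          monotone_constraints=tuple(constraints))
    model.fit(X, y)
    cex = find_violation(model, feature=0)       # spec: monotone in feature 0
    if cex is None:
        print("No violation found; constraints:", constraints)
        break
    constraints[0] = 1                           # constrain offending feature
```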

The research demonstrates that post-hoc explanation tools like SHAP are insufficient for safety assurance, as they can miss fundamental physical inconsistencies. This verify-fix-verify engineering loop provides a pathway to formally certify ML models before deployment in high-stakes fields like earthquake engineering, moving beyond approximate diagnostics to provide exhaustive guarantees of model behavior.
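
For contrast, here is the kind of post-hoc diagnostic the paper argues falls short, sketched with the `shap` package on a synthetic setup (an assumption, not the paper's experiment). The attributions are averaged over sampled rows only, which is exactly why a low-ranked feature can still carry a domain-wide violation.

```python
# SHAP global importance is computed only at sampled points, so it can
# rank an offending feature last while a violation hides in an unvisited
# region of the input domain. Data and model are illustrative.
import numpy as np
import shap
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.random((1000, 4))
y = (X[:, 0] > 0.5).astype(int)

model = XGBClassifier(n_estimators=50).fit(X, y)
sv = shap.TreeExplainer(model).shap_values(X)   # attributions at samples only
order = np.argsort(np.abs(sv).mean(axis=0))[::-1]
print("SHAP importance, most to least:", order)
```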

Key Points
  • Method uses SMT solvers to exhaustively verify tree-based ML models (XGBoost, EBMs) against formal physical specs, providing guarantees beyond sampling.
  • On Christchurch quake data, an 80.1% accurate model failed all 4 specs; a constrained model (67.2% accuracy) satisfied 3, showing an accuracy-safety trade-off.
  • Shows SHAP/LIME explanations are inadequate for safety assurance, as the offending feature in a counterexample can rank last in importance, and establishes a formal verify-fix-verify loop for critical AI.

Why It Matters

Provides a formal certification framework for deploying physically consistent, trustworthy AI in safety-critical infrastructure and disaster prediction.