NN surrogates miss extreme events: 10x error at distribution tails
Worst-case errors from neural network surrogates can be 10x larger than average errors, and a new study explains why.
Neural network surrogate models have become popular for approximating solutions to expensive boundary value problems, especially in stochastic settings where repeated evaluations are needed for parametric analysis. However, most studies focus on deterministic samples or mean fields, ignoring performance at the tails of the distribution. Wade and Teferra's numerical study tackles this gap head-on, using the heat conduction equation with a highly stochastic source term as a canonical test case. They compare a classic feed-forward fully connected network against a Deep Operator Network architecture, training both with data-driven and physics-informed loss functions.
The results are stark: worst-case prediction errors are an order of magnitude larger than mean field errors, showing how neural network surrogates can fail to capture extreme events. These large errors arise when the networks must extrapolate beyond the bounds of the training data, a common but often overlooked pitfall. Among the models tested, the fully connected neural network trained using a weak form residual loss achieved the highest accuracy on the numerically produced datasets. The authors also propose a method for identifying these problematic samples and discuss potential approaches to mitigate such errors, such as adaptive sampling or hybrid models. For engineers and researchers relying on AI-driven surrogates for uncertainty quantification, this study is a crucial reminder that average performance metrics can mask dangerous blind spots.
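To make the gap between average and worst-case performance concrete, here is a minimal Python sketch of how the two numbers might be computed side by side. It is not taken from the study: the function names, the choice of a per-sample relative L2 error, and the array layout (one row per stochastic sample, one column per grid point) are all assumptions made for illustration.

```python
import numpy as np

def per_sample_relative_error(u_true, u_pred):
    """Relative L2 error for each sample.

    Assumed layout: rows are stochastic samples, columns are grid points.
    """
    num = np.linalg.norm(u_pred - u_true, axis=1)
    den = np.linalg.norm(u_true, axis=1)
    return num / den

def summarize_errors(u_true, u_pred):
    """Report the mean error alongside the worst-case error and its sample index."""
    err = per_sample_relative_error(u_true, u_pred)
    return {
        "mean": float(err.mean()),
        "worst": float(err.max()),
        "worst_index": int(err.argmax()),
    }

if __name__ == "__main__":
    # Synthetic stand-in data: three reference solutions on a four-point grid.
    u_true = np.ones((3, 4))
    # Hypothetical surrogate output: exact, 10% off, and 100% off.
    u_pred = u_true * np.array([[1.0], [1.1], [2.0]])
    print(summarize_errors(u_true, u_pred))
```

Reporting only the `mean` entry would hide the sample at `worst_index`, which is the kind of blind spot the study warns about.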
- Worst-case prediction errors from neural network surrogates are about an order of magnitude (roughly 10x) larger than mean field errors in stochastic heat conduction problems.
- Fully connected networks trained with a weak form residual loss outperformed Deep Operator Networks and other loss formulations.
- Extrapolation beyond training data limits is the main cause of tail errors; the study presents a method to flag these outlier samples.
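One simple way to picture flagging extrapolation samples is a range check against the training inputs. The Python sketch below is an illustrative assumption, not the method proposed in the study: it treats each input as a vector of features, records the per-feature minimum and maximum seen in training, and flags any new sample with at least one feature outside that range. The names `fit_bounds` and `flag_extrapolation` are hypothetical.

```python
import numpy as np

def fit_bounds(train_inputs):
    """Per-feature min and max over the training inputs (rows are samples)."""
    return train_inputs.min(axis=0), train_inputs.max(axis=0)

def flag_extrapolation(inputs, lower, upper):
    """Boolean mask: True where a sample has any feature outside the training range."""
    outside = (inputs < lower) | (inputs > upper)
    return outside.any(axis=1)

if __name__ == "__main__":
    # Toy training set with two input features.
    train = np.array([[0.0, 0.0], [1.0, 2.0]])
    lower, upper = fit_bounds(train)

    # Three new samples: inside the range, above it, and below it.
    new = np.array([[0.5, 1.0], [1.5, 1.0], [0.2, -0.1]])
    print(flag_extrapolation(new, lower, upper))
```

A per-feature box is a coarse test: a sample can sit inside the box and still be far from any training point, so a real screening step would likely need a tighter notion of distance to the training set.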
Why It Matters
For real-world engineering that relies on AI surrogates, missing extreme events can lead to unsafe designs. This study exposes that critical blind spot.