TERA cuts derivative GP compute from O(n³d³) to O(d m² + m⁶) per target
New method keeps runtime and GPU memory essentially flat as dimension grows, making high-dimensional derivative GP surrogates practical.
Derivative Gaussian processes (GPs) are powerful tools for high-dimensional surrogate modeling, but exact inference with n function values and n full gradients in d dimensions means factorizing a dense covariance matrix with n(d+1) rows. That scales as O(n³d³), which is intractable beyond small problems. A new paper from Hyunseok Seung and Matthias Katzfuss introduces TERA (Target-specific Exact Gradient Reduction), which exploits a mathematical insight: for stationary kernels, gradient components orthogonal to the direction connecting a target and a conditioning point are conditionally independent of the target value. As a result, once a conditioning set of size m is chosen, the exact conditional density depends on at most m² directional derivatives rather than on all m·d gradient components.
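The reduction can be checked numerically in the simplest case, a single conditioning point (m = 1). The sketch below is not the paper's code. It assumes an isotropic squared-exponential kernel with unit variance as one example of a stationary kernel, and all function and variable names are illustrative. It compares the posterior for the target value given the full gradient (d + 1 conditioning quantities) against the posterior given only the function value and the one directional derivative along the line to the target (2 quantities).

```python
import numpy as np

def se_kernel(x, y, ell):
    """Squared-exponential kernel with unit variance (an assumed example)."""
    diff = x - y
    return np.exp(-(diff @ diff) / (2.0 * ell**2))

def cov_value_grad(x, y, ell):
    """Cov(f(x), grad f(y)): the gradient of k(x, y) with respect to y."""
    return se_kernel(x, y, ell) * (x - y) / ell**2

def cov_grad_grad(x, y, ell):
    """Cov(grad f(x), grad f(y)): mixed second derivatives of k(x, y)."""
    diff = x - y
    eye = np.eye(x.size)
    return se_kernel(x, y, ell) * (eye / ell**2 - np.outer(diff, diff) / ell**4)

def full_and_reduced_posterior(x_star, x1, z, ell):
    """Posterior (mean, variance) of f(x_star) given z = [f(x1), grad f(x1)],
    first with the full gradient and then with one directional derivative."""
    d = x1.size
    # Joint covariance of the d + 1 conditioning quantities at x1.
    K = np.zeros((d + 1, d + 1))
    K[0, 0] = se_kernel(x1, x1, ell)
    K[0, 1:] = cov_value_grad(x1, x1, ell)
    K[1:, 0] = K[0, 1:]
    K[1:, 1:] = cov_grad_grad(x1, x1, ell)
    # Cross-covariance between the target value and the conditioning quantities.
    c = np.concatenate(([se_kernel(x_star, x1, ell)],
                        cov_value_grad(x_star, x1, ell)))
    prior_var = se_kernel(x_star, x_star, ell)

    w = np.linalg.solve(K, c)
    full = (w @ z, prior_var - w @ c)

    # Reduced set: keep f(x1) and the derivative along the unit vector u
    # pointing from x1 to x_star. P projects the d + 1 quantities onto 2.
    u = (x_star - x1) / np.linalg.norm(x_star - x1)
    P = np.zeros((2, d + 1))
    P[0, 0] = 1.0
    P[1, 1:] = u
    K_r, c_r, z_r = P @ K @ P.T, P @ c, P @ z
    w_r = np.linalg.solve(K_r, c_r)
    reduced = (w_r @ z_r, prior_var - w_r @ c_r)
    return full, reduced

rng = np.random.default_rng(0)
d, ell = 50, 1.0
x1 = rng.normal(size=d)
x_star = x1 + 0.1 * rng.normal(size=d)
z = rng.normal(size=d + 1)          # arbitrary values for [f(x1), grad f(x1)]

full, reduced = full_and_reduced_posterior(x_star, x1, z, ell)
print("full gradient      :", full)
print("one directional der:", reduced)
```

Under these assumptions the two posteriors agree to floating-point precision, because the cross-covariance between the target value and the gradient is parallel to the connecting direction. The general case with m conditioning points and up to m² directional derivatives is what the paper addresses and is not reproduced here.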
By plugging these reduced, dimension-free conditionals into a Vecchia approximation, TERA decouples n and d from dense matrix inversion. The per-target evaluation cost drops to O(d m² + m⁶) time and O(d m² + m⁴) memory, so d enters only linearly and n no longer appears in any dense factorization. Crucially, both runtime and peak GPU memory remain essentially constant as d grows, which makes TERA well suited to high-dimensional problems such as climate modeling, molecular dynamics, or engineering design. The paper's experiments report state-of-the-art predictive accuracy at substantially lower computational cost.
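For readers unfamiliar with the Vecchia approximation, the following is a minimal sketch of the standard version for function values only, without derivatives and without TERA's gradient reduction. It replaces the joint Gaussian density with a product of univariate conditionals, each conditioning on at most m nearest previously ordered points. The kernel, the noise level, and all names are assumptions made for illustration.

```python
import numpy as np

def se_kernel_matrix(A, B, ell=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * ell**2))

def vecchia_loglik(X, y, m, ell=1.0, noise=1e-2):
    """Vecchia log-likelihood of y ~ N(0, K + noise * I), with each point
    conditioned on its m nearest neighbors among earlier points."""
    n = X.shape[0]
    total = 0.0
    for i in range(n):
        if i == 0:
            mean, var = 0.0, 1.0 + noise
        else:
            prev = X[:i]
            dist = np.linalg.norm(prev - X[i], axis=1)
            nbrs = np.argsort(dist)[:m]           # indices into X[:i]
            K_cc = se_kernel_matrix(prev[nbrs], prev[nbrs], ell)
            K_cc += noise * np.eye(len(nbrs))
            k_ic = se_kernel_matrix(X[i:i + 1], prev[nbrs], ell)[0]
            w = np.linalg.solve(K_cc, k_ic)       # small solve, at most m x m
            mean = w @ y[nbrs]
            var = 1.0 + noise - w @ k_ic
        total += -0.5 * (np.log(2.0 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return total

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
y = rng.normal(size=30)
print(vecchia_loglik(X, y, m=5))
```

Each term requires only a small dense solve whose size is set by m, not by n. When m is large enough that every point conditions on all earlier points, the product of conditionals is the exact likelihood by the chain rule.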
- TERA reduces derivative GP inference from O(n³d³) overall to O(d m² + m⁶) time and O(d m² + m⁴) memory per target.
- A mathematical proof shows that, for stationary kernels, gradient components orthogonal to the direction between a target and a conditioning point are conditionally independent of the target value, enabling exact gradient reduction.
- GPU memory and compute stay essentially flat with increasing dimensionality d, enabling scalable high-dimensional surrogate modeling.
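To make the scaling concrete, here is a rough back-of-the-envelope comparison. It simply evaluates the quoted asymptotic expressions with every constant factor set to 1, which is an assumption. It also assumes the total reduced cost is the per-target cost times n targets. The function names are illustrative, and the numbers indicate orders of magnitude only.

```python
def dense_cost(n, d):
    """Cubic cost of a dense factorization over n values and n*d gradient entries."""
    return (n * (d + 1)) ** 3

def reduced_cost_per_target(d, m):
    """The quoted per-target cost d*m^2 + m^6, with constants set to 1."""
    return d * m**2 + m**6

n, m = 1000, 10
for d in (10, 100, 1000):
    dense = dense_cost(n, d)
    reduced = n * reduced_cost_per_target(d, m)
    print(f"d={d:5d}  dense={dense:.2e}  reduced={reduced:.2e}")
```

In this toy accounting the dense cost grows cubically with d, while the reduced cost is dominated by the m⁶ term until d m² becomes comparable to it.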
Why It Matters
Makes derivative GP surrogates feasible for high-dimensional problems that were previously computationally out of reach.