Research & Papers

TERA slashes derivative GP compute from O(n³d³) to O(d m² + m⁶) per target

New method keeps GPU memory essentially flat as dimension grows, making derivative GP surrogates practical in high dimensions.

Deep Dive

Derivative Gaussian processes (GPs) are powerful for high-dimensional surrogate modeling, but exact inference with n function values and n full gradients in d dimensions requires factorizing an n(d + 1) × n(d + 1) covariance matrix. That costs O(n³d³) and quickly becomes intractable as n and d grow. A new paper from Hyunseok Seung and Matthias Katzfuss introduces TERA (Target-specific Exact Gradient Reduction), which exploits a mathematical insight: for stationary kernels, gradient components orthogonal to the direction connecting a target and a conditioning point are conditionally independent of the target value. As a result, once a conditioning set of size m is chosen, the exact conditional density depends on the observed gradients only through at most m² directional derivatives.
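A minimal numerical check of this idea, assuming an isotropic squared-exponential kernel (one stationary kernel; the precise kernel class covered by the paper's result is not restated here). For this kernel, Cov(f(x₀), ∇f(x)) = k(x₀, x)(x₀ − x)/ℓ² is parallel to the direction connecting x and the target x₀, so a gradient component orthogonal to that direction has zero covariance with the target value, and for jointly Gaussian variables zero covariance means independence. The helper `reduce_gradients` is a hypothetical illustration of where an m² count can come from, not the authors' implementation: it projects all m conditioning gradients onto an orthonormal basis of span{xᵢ − x₀}, which costs O(d m²) and leaves m × m numbers whatever d is. The gradients `G` are random stand-ins rather than GP draws, since only shapes and covariances are checked.

```python
import numpy as np

def se_kernel(x, y, ell=1.0):
    # Isotropic squared-exponential kernel: exp(-||x - y||^2 / (2 ell^2))
    return np.exp(-np.sum((x - y) ** 2) / (2 * ell ** 2))

def cov_value_grad(x0, x, ell=1.0):
    # Cov(f(x0), grad f(x)) = d/dx k(x0, x) = k(x0, x) * (x0 - x) / ell^2.
    # The result is parallel to the direction connecting x and the target x0.
    return se_kernel(x0, x, ell) * (x0 - x) / ell ** 2

def reduce_gradients(x0, X, G):
    # Hypothetical reduction (not the paper's code): project the m observed
    # gradients G, shape (m, d), onto an orthonormal basis Q, shape (d, m), of
    # span{X[i] - x0}. Both the QR step and G @ Q cost O(d m^2) and leave
    # m * m directional derivatives, independent of d.
    Q, _ = np.linalg.qr((X - x0).T)
    return G @ Q, Q

rng = np.random.default_rng(0)
d, m = 500, 4
x0 = rng.normal(size=d)                       # target location
X = x0 + 0.05 * rng.normal(size=(m, d))       # m nearby conditioning points
G = rng.normal(size=(m, d))                   # stand-in observed gradients

# Single-point check: remove the component of a random vector along x0 - X[0].
u = (x0 - X[0]) / np.linalg.norm(x0 - X[0])
w = rng.normal(size=d)
v = w - (w @ u) * u
print(cov_value_grad(x0, X[0]) @ v)           # zero up to rounding

# Set-level check: a direction orthogonal to span{X[i] - x0} is uncorrelated
# with the target value at every conditioning point.
R, Q = reduce_gradients(x0, X, G)
v_perp = w - Q @ (Q.T @ w)
print([cov_value_grad(x0, X[i]) @ v_perp for i in range(m)])   # all ~0
print(R.shape)                                # (4, 4), for any d
```

Raising `d` changes only the cost of the projection, not the size of `R`.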

By plugging these reduced, dimension-free conditionals into a Vecchia approximation, TERA avoids factorizing any dense matrix whose size grows with n or d. The per-target evaluation cost drops to O(d m² + m⁶) time and O(d m² + m⁴) memory, orders of magnitude below standard derivative GPs. Crucially, both runtime and peak GPU memory remain essentially constant as d grows, making TERA well suited to high-dimensional problems such as climate modeling, molecular dynamics, or engineering design. The authors report state-of-the-art predictive accuracy at a fraction of the compute.
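The Vecchia idea TERA builds on can be sketched for plain function values. This is a minimal sketch assuming an isotropic squared-exponential kernel, nearest-previous-neighbor conditioning sets, and a small nugget for numerical stability; `vecchia_loglik` and those choices are illustrative, not the paper's. It omits gradients entirely: the point is that each target contributes one small conditional density, so the total cost is n times the per-target cost. With full gradients attached to m conditioning points, each conditional would involve m(d + 1) observations; TERA's reduction keeps that count at O(m²) regardless of d.

```python
import numpy as np

def se_cov(A, B, ell=1.0):
    # Isotropic squared-exponential covariance between the rows of A and B
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * ell**2))

def vecchia_loglik(X, y, m, ell=1.0, nugget=1e-6):
    # Vecchia approximation: log p(y) ~= sum_i log p(y_i | y_c(i)), where c(i)
    # holds the (up to) m nearest points among those earlier in the ordering.
    # Each term factorizes an at-most m x m matrix, so the total is n * O(m^3).
    total = 0.0
    for i in range(len(y)):
        prev = np.arange(i)
        dist = np.linalg.norm(X[prev] - X[i], axis=1)
        c = prev[np.argsort(dist)[:m]]
        k_ii = 1.0 + nugget                   # k(x, x) = 1 for this kernel
        if len(c) == 0:
            mu, var = 0.0, k_ii
        else:
            K_cc = se_cov(X[c], X[c], ell) + nugget * np.eye(len(c))
            k_ci = se_cov(X[c], X[i:i + 1], ell)[:, 0]
            L = np.linalg.cholesky(K_cc)
            a = np.linalg.solve(L, k_ci)      # L^{-1} k_ci
            b = np.linalg.solve(L, y[c])      # L^{-1} y_c
            mu = a @ b                        # conditional mean k^T K^{-1} y_c
            var = k_ii - a @ a                # conditional variance
        total += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mu) ** 2 / var)
    return total

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = np.sin(X[:, 0]) + np.cos(X[:, 1])         # toy responses, not real data
print(vecchia_loglik(X, y, m=10))
```

Setting m = n − 1 makes every point condition on all of its predecessors, in which case the sum equals the exact Gaussian log-likelihood by the chain rule; this is a handy correctness check before shrinking m.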

Key Points
  • TERA replaces the O(n³d³) cost of exact derivative GP inference with O(d m² + m⁶) time and O(d m² + m⁴) memory per target.
  • A mathematical proof shows that gradient components orthogonal to the conditioning directions are conditionally independent of the target value, enabling exact gradient reduction.
  • GPU memory and compute stay essentially flat with increasing dimensionality d, enabling scalable high-dimensional surrogate modeling.
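A back-of-the-envelope comparison of the cost formulas above, with all constants dropped and an illustrative choice of n, d, and m (not values from the paper). The dense count assumes a Cholesky factorization of the n(d + 1) × n(d + 1) joint covariance; the TERA count multiplies the stated per-target time by n targets, which is how a Vecchia approximation accumulates cost. The function names are hypothetical.

```python
def dense_cost(n, d):
    # Exact derivative GP: factorize an n(d+1) x n(d+1) covariance, ~N^3 flops
    return (n * (d + 1)) ** 3

def per_target_time(d, m):
    # Stated per-target TERA time O(d m^2 + m^6), constants dropped
    return d * m**2 + m**6

def tera_cost(n, d, m):
    # Vecchia-style total: one reduced conditional per target
    return n * per_target_time(d, m)

n, d, m = 1_000, 1_000, 5
print(f"dense: {dense_cost(n, d):,}")     # dense: 1,003,003,001,000,000,000
print(f"TERA:  {tera_cost(n, d, m):,}")   # TERA:  40,625,000
print([per_target_time(dd, m) for dd in (10, 100, 1_000)])   # [15875, 18125, 40625]
```

With m = 5, the m⁶ term dominates the per-target count until d exceeds m⁴ = 625, so the count grows only slowly over that range of d.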

Why It Matters

Makes derivative GP surrogates practical for high-dimensional problems that were previously computationally infeasible.