TERA slashes derivative GP compute from O(n³d³) to O(d m² + m⁶)

arXiv stat.ML June 03, 2026

⚡ New method keeps GPU memory essentially flat as dimensionality grows, enabling scalable high-dimensional surrogates.

Deep Dive

Derivative Gaussian processes (GPs) are powerful tools for high-dimensional surrogate modeling, but exact inference with n function values and n full gradients in d dimensions scales as O(n³d³), which is intractable beyond toy problems. A new paper from Hyunseok Seung and Matthias Katzfuss introduces TERA (Target-specific Exact Gradient Reduction), which exploits a mathematical insight: for stationary kernels, gradient components orthogonal to the direction connecting a target and a conditioning point are conditionally independent of the target value. As a result, once a conditioning set of size m is chosen, the exact conditional density depends on at most m² directional derivatives rather than on all d gradient components at each conditioning point.
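The geometric idea can be checked numerically in a simplified setting. The sketch below assumes an isotropic squared-exponential kernel (an assumption made here for illustration; the article only says "stationary kernels") and verifies just one ingredient: the prior covariance between a function value and a gradient component orthogonal to the connecting direction is zero. It does not reproduce the paper's full conditional-independence result, and the helper names `se_kernel`, `cov_value_grad`, and `dot` are hypothetical.

```python
import math

def se_kernel(x, y, lengthscale=1.0):
    """Isotropic squared-exponential kernel k(x, y) (assumed for this sketch)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * sq_dist / lengthscale ** 2)

def cov_value_grad(x, y, lengthscale=1.0):
    """Cov(f(x), grad f(y)) = d k(x, y) / d y, one entry per coordinate of y."""
    k = se_kernel(x, y, lengthscale)
    return [k * (a - b) / lengthscale ** 2 for a, b in zip(x, y)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x_target = [0.0, 0.0, 0.0]
x_cond = [1.0, 2.0, 2.0]

# Unit vector along the line joining the two points (the distance is 3).
diff = [a - b for a, b in zip(x_target, x_cond)]
along = [c / 3.0 for c in diff]
# A unit vector orthogonal to that line.
orth = [2.0 / 3.0, -2.0 / 3.0, 1.0 / 3.0]

grad_cov = cov_value_grad(x_target, x_cond)
cov_along = dot(grad_cov, along)  # directional derivative along the line
cov_orth = dot(grad_cov, orth)    # directional derivative orthogonal to it

print(f"covariance with along-line derivative:  {cov_along:.6f}")
print(f"covariance with orthogonal derivative: {cov_orth:.2e}")
```

Only the derivative along the connecting direction carries covariance with the target value here, which is why one directional derivative per pair of points can stand in for a full d-dimensional gradient in this toy case.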

By plugging these reduced, dimension-free conditionals into a Vecchia approximation, TERA decouples n and d from dense matrix inversion. The per-target evaluation cost drops to O(d m² + m⁶) time and O(d m² + m⁴) memory, orders of magnitude below the O(n³d³) cost of a standard derivative GP when m is much smaller than n and d. Crucially, both runtime and peak GPU memory remain essentially constant as d grows, making TERA well suited to high-dimensional problems such as climate modeling, molecular dynamics, or engineering design. In the paper's experiments, TERA reaches state-of-the-art predictive accuracy at much lower compute.
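To get a feel for the scalings quoted above, the following sketch plugs numbers into the leading-order terms. It drops all constants, so the outputs are rough operation counts rather than runtimes, and the function names and the example values of n, d, and m are illustrative assumptions, not figures from the paper.

```python
def dense_derivative_gp_time(n: int, d: int) -> int:
    """Leading-order time of exact derivative GP inference, O(n^3 d^3)."""
    return n ** 3 * d ** 3

def tera_per_target_time(d: int, m: int) -> int:
    """Leading-order per-target time quoted for TERA, O(d m^2 + m^6)."""
    return d * m ** 2 + m ** 6

def tera_per_target_memory(d: int, m: int) -> int:
    """Leading-order per-target memory quoted for TERA, O(d m^2 + m^4)."""
    return d * m ** 2 + m ** 4

n, m = 1000, 10  # hypothetical number of points and conditioning-set size
for d in (100, 1_000, 10_000):
    dense = dense_derivative_gp_time(n, d)
    per_target = tera_per_target_time(d, m)
    memory = tera_per_target_memory(d, m)
    print(f"d={d:>6}: dense ~ {dense:.2e}, "
          f"TERA per target ~ {per_target:.2e} time, {memory:.2e} memory")
```

With m = 10, the m⁶ term dominates the time count until d approaches m⁴, so increasing d a hundredfold only roughly doubles the per-target count, while the dense count grows by a factor of a million. That is one way to read the "essentially constant in d" claim; the memory count has a smaller m⁴ term, so it picks up the linear d m² contribution sooner.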

Key Points

TERA reduces derivative GP inference from O(n³d³) to O(d m² + m⁶) time and O(d m² + m⁴) memory.
Mathematical proof shows gradient components orthogonal to conditioning directions are conditionally independent, enabling exact gradient reduction.
GPU memory and compute stay essentially flat with increasing dimensionality d, enabling scalable high-dimensional surrogate modeling.

Why It Matters

Enables real-time derivative GP surrogates for high-dimensional problems that were previously computationally infeasible.
