StAD method speeds diffusion model likelihoods without Jacobian computation
New distillation technique estimates likelihoods at O(D) cost, rather than the O(D²) of an exact divergence, and with less noise than stochastic trace estimators.
A new paper by Gurjeet Jagwani, Stephen Thorp, Sinan Deger, and Hiranya Peiris introduces StAD (Stein Amortized Divergence), a distillation method that accelerates likelihood computation for diffusion and flow-based generative models. These models, widely used for density estimation and Bayesian analysis, rely on a probability flow ODE (PF-ODE) to transport probability mass. Computing likelihoods requires the divergence of the learned vector field, that is, the trace of its Jacobian. That trace costs O(D²) to compute exactly, or O(D) when approximated with a noisy stochastic estimate such as Hutchinson's trace estimator. StAD avoids computing the Jacobian entirely by training a neural network to predict the divergence using the Langevin-Stein operator, which makes amortized likelihood evaluation both faster and less noisy.
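To make the cost trade-off concrete, here is a minimal sketch of the two baselines described above: an exact divergence that needs D Jacobian-vector products, and Hutchinson's estimator that needs one per random probe. It is not the StAD method. The toy `vector_field`, the finite-difference `jvp` helper, and all function names are assumptions made for illustration; a real pipeline would use a learned network and automatic differentiation.

```python
import math
import random


def vector_field(x):
    """Hypothetical stand-in for a learned PF-ODE vector field (needs len(x) >= 2)."""
    d = len(x)
    return [math.sin(x[i]) + 0.1 * x[i] * x[(i + 1) % d] for i in range(d)]


def analytic_divergence(x):
    """Closed-form trace of the Jacobian of the toy field, used as ground truth."""
    d = len(x)
    return sum(math.cos(x[i]) + 0.1 * x[(i + 1) % d] for i in range(d))


def jvp(f, x, v, h=1e-5):
    """Central finite-difference approximation of the Jacobian-vector product J(x) v."""
    plus = f([xi + h * vi for xi, vi in zip(x, v)])
    minus = f([xi - h * vi for xi, vi in zip(x, v)])
    return [(p - m) / (2 * h) for p, m in zip(plus, minus)]


def exact_divergence(f, x):
    """Trace of the Jacobian from D Jacobian-vector products, one per basis vector."""
    d = len(x)
    total = 0.0
    for i in range(d):
        basis = [0.0] * d
        basis[i] = 1.0
        total += jvp(f, x, basis)[i]
    return total


def hutchinson_divergence(f, x, n_probes, rng):
    """Hutchinson estimate: average of eps^T J eps over random Rademacher probes."""
    d = len(x)
    total = 0.0
    for _ in range(n_probes):
        eps = [rng.choice((-1.0, 1.0)) for _ in range(d)]
        j_eps = jvp(f, x, eps)  # one Jacobian-vector product per probe
        total += sum(e * j for e, j in zip(eps, j_eps))
    return total / n_probes


if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.3 * i - 1.0 for i in range(8)]
    print("analytic  :", analytic_divergence(x))
    print("exact     :", exact_divergence(vector_field, x))
    print("hutchinson:", hutchinson_divergence(vector_field, x, 10, rng))
```

With only a few probes the Hutchinson value fluctuates around the true divergence from run to run; that per-evaluation noise is what an amortized divergence predictor is meant to remove.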
Experiments on CIFAR-10 and ImageNet show that StAD consistently achieves lower variance and faster evaluation than the Hutchinson estimator, and is competitive with the more advanced Hutch++ method. The authors also prove that, under regularity conditions, the learned vector fields belong to the Stein class, which gives the approach a sound theoretical footing. The method generalizes across several families of generative models (diffusion models, flow-based models, continuous normalizing flows) and is particularly valuable for workflows that require repeated likelihood evaluations, such as Bayesian posterior sampling or model comparison. StAD's efficiency could make diffusion-based density estimation more practical for high-dimensional scientific applications, from astrophysics to machine learning.
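For reference, the Langevin-Stein operator and the Stein identity in their standard textbook form are given below. The symbols p (target density) and f (test vector field) are notation assumed here, not taken from the paper, and the exact way StAD parameterizes and trains against the operator is not reproduced.

```latex
(\mathcal{A}_p f)(x) = \nabla \cdot f(x) + f(x)^\top \nabla_x \log p(x),
\qquad
\mathbb{E}_{x \sim p}\big[(\mathcal{A}_p f)(x)\big] = 0
\quad \text{for all } f \text{ in the Stein class of } p .
```

The operator ties the divergence of f to the score of p, so membership in the Stein class is the condition under which the zero-expectation identity holds.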
- StAD uses the Langevin-Stein operator to train a network that predicts the divergence of the PF-ODE vector field, with no Jacobian computation.
- Reduces likelihood estimation cost to O(D) with lower variance than the standard Hutchinson estimator.
- Tested on CIFAR-10 and ImageNet, where it beats the Hutchinson estimator on variance and speed and is competitive with Hutch++.
Why It Matters
Enables fast, accurate likelihoods from diffusion models, boosting Bayesian analysis and density estimation for high-dimensional data.