Research & Papers

A unified perspective on fine-tuning and sampling with diffusion and flow models

Bias-variance analysis reveals why Adjoint Matching beats standard score matching for reward fine-tuning.

Deep Dive

A new paper by Carles Domingo-Enrich, Yuanqi Du, and Michael S. Albergo (arXiv:2605.00229) tackles a core problem in generative AI: how to efficiently fine-tune diffusion and flow models to sample from exponentially tilted distributions, i.e., a base density reweighted by the exponential of a reward. This single setting covers both sampling from unnormalized densities and reward fine-tuning of pre-trained models. The authors unify the stochastic optimal control (SOC) and non-equilibrium thermodynamics perspectives on the problem, then derive three major contributions.
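To make the setting concrete, here is a minimal toy sketch of exponential tilting: the target is p*(x) ∝ p(x)·exp(r(x)/λ) for a base model p and reward r. The reward function, temperature λ, and the self-normalized importance-weighting approach below are all illustrative assumptions for this one-dimensional example; the paper fine-tunes the sampler itself rather than reweighting samples.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(x):
    # Hypothetical reward preferring x near 2 (not from the paper).
    return -(x - 2.0) ** 2

lam = 0.5                                    # tilt temperature (assumed)
base = rng.normal(0.0, 1.0, size=100_000)    # base model p = N(0, 1)

# Tilted target p*(x) ∝ p(x) * exp(r(x) / lam), approximated here by
# self-normalized importance weights on base samples.
w = np.exp(reward(base) / lam)
w /= w.sum()

tilted_mean = float(np.dot(w, base))
print(round(tilted_mean, 2))   # ≈ 1.6 (the analytic mean of this tilted Gaussian)
```

For this Gaussian base and quadratic reward the tilted density is itself Gaussian with mean 1.6, so the weighted estimate can be checked against the closed form.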

The key theoretical insight is a bias-variance decomposition of gradient estimators: Adjoint Matching / Sampling and Novel Score Matching have finite gradient variance, while the commonly used Target and Conditional Score Matching do not, which explains the training instability observed in practice. The paper also proves norm bounds on the lean adjoint ODE that theoretically justify the effectiveness of adjoint-based methods. On the algorithmic side, it introduces practical adaptations of the CMCD and NETS loss functions alongside novel Crooks and Jarzynski identities for exponential tilting. Experiments on reward fine-tuning of Stable Diffusion 1.5 and 3 confirm the theory: the finite-variance methods converge faster and more stably.
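The finite- vs. infinite-variance dichotomy can be felt in a classic toy case that is only an analogy, not the paper's estimators: an importance-weighted estimator whose weights have an infinite second moment. Below, samples from N(0, 1) are reweighted toward N(0, 1.5²); since 1.5² > 2·1², the weight variance diverges and the estimator stays erratic no matter how large n gets. All names and distributions here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def iw_estimate(n):
    # Estimate E_q[x^2] for q = N(0, 1.5^2) using samples from
    # p = N(0, 1). The weights w = q(x)/p(x) have infinite variance
    # because the target's variance exceeds twice the proposal's,
    # loosely analogous to the infinite-variance gradient estimators
    # the paper identifies.
    x = rng.normal(0.0, 1.0, size=n)
    logw = (x ** 2) * (0.5 - 0.5 / 1.5 ** 2)   # log[q(x)/p(x)] up to a constant
    w = np.exp(logw)
    return float(np.sum(w * x ** 2) / np.sum(w))

for n in (10 ** 3, 10 ** 5):
    reps = [iw_estimate(n) for _ in range(50)]
    print(n, round(float(np.std(reps)), 3))
```

Repeated runs show the spread across replicates does not shrink at the usual 1/√n rate, the empirical signature of an infinite-variance estimator.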

Key Points
  • Adjoint Matching and Novel Score Matching exhibit finite gradient variance, while Target and Conditional Score Matching do not, explaining instability in diffusion fine-tuning.
  • New norm bounds on the lean adjoint ODE provide theoretical support for adjoint-based sampling methods.
  • Adapted CMCD and NETS loss functions, plus novel Crooks/Jarzynski identities, are validated on Stable Diffusion 1.5 and 3 reward fine-tuning tasks.
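For intuition on the last point: the paper's identities generalize the classical Jarzynski equality E[exp(−W)] = exp(−ΔF) (in units where kT = 1), which can be checked numerically in a standard textbook Gaussian-work example. This is the classical identity only, not the paper's tilted version.

```python
import numpy as np

rng = np.random.default_rng(2)

# Classical Jarzynski equality: E[exp(-W)] = exp(-dF), with kT = 1.
# For Gaussian work W ~ N(mu, sigma^2), dF = mu - sigma^2 / 2, so the
# exponential average recovers the free-energy difference even though
# the average work is mu.
mu, sigma = 2.0, 1.0
W = rng.normal(mu, sigma, size=1_000_000)
dF_est = float(-np.log(np.mean(np.exp(-W))))
print(round(dF_est, 2))   # ≈ mu - sigma^2 / 2 = 1.5
```

The exponential average is dominated by rare low-work samples, which is exactly why variance control matters for estimators built on such identities.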

Why It Matters

This unified framework could stabilize and accelerate reward fine-tuning for diffusion models — crucial for aligning text-to-image and video models.