Q-Drift: Quantization-Aware Drift Correction for Diffusion Model Sampling
A new sampler-side correction reduces the image-quality degradation caused by model compression, improving FID by up to 4.59 points.
A team of researchers has introduced Q-Drift, a solution to a critical problem in deploying large AI image generators with post-training quantization (PTQ). PTQ compresses models such as Stable Diffusion or PixArt-Sigma for efficient deployment on consumer hardware, but the process introduces quantization noise. This noise accumulates over the many denoising steps of diffusion sampling, yielding degraded, blurry, or artifact-ridden final images. Q-Drift tackles the problem by mathematically modeling the quantization error as an implicit stochastic perturbation and deriving a principled, sampler-side correction that preserves the target image distribution.
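To see why per-step quantization noise matters, consider a toy model (an illustrative assumption, not the paper's formulation) in which each of the sampler's steps adds an independent zero-mean Gaussian perturbation of standard deviation `sigma_q`. The error in the final sample then grows with the square root of the step count, which is the accumulation effect Q-Drift is designed to counteract:

```python
import numpy as np

# Toy illustration only: model the per-step quantization error as an
# i.i.d. zero-mean Gaussian perturbation eps_t ~ N(0, sigma_q^2) added at
# each of num_steps sampling steps. Independent variances add, so the
# standard deviation of the accumulated error scales as sqrt(num_steps).

def accumulated_error_std(sigma_q: float, num_steps: int) -> float:
    """Std of the summed perturbation after num_steps independent steps."""
    return sigma_q * np.sqrt(num_steps)

# A small per-step error compounds over a typical 50-step trajectory.
per_step = 0.02
single = accumulated_error_std(per_step, 1)
full = accumulated_error_std(per_step, 50)
print(single, full)
```

Under this simplification, a 50-step sampler ends up with roughly seven times the single-step error, which is why a correction applied inside the sampler, rather than to the weights alone, can pay off.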
Q-Drift's key advantage is its practicality. It requires minimal calibration (just 5 paired runs comparing full-precision and quantized model outputs) to estimate a timestep-wise variance statistic, from which it derives a marginal-distribution-preserving drift adjustment. The correction is plug-and-play: it is compatible with common samplers (Euler, DPM-Solver++), architectures (U-Net, DiT), and PTQ methods such as SVDQuant and MixDQ. In tests across six text-to-image models, Q-Drift improved the Fréchet Inception Distance (FID), a standard quality metric where lower is better, by up to 4.59 points on PixArt-Sigma quantized to 3-bit weights and 4-bit activations (W3A4), while maintaining semantic accuracy as measured by CLIP score and adding negligible computational cost at inference.
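The calibrate-then-correct loop might be sketched as follows. This is a hedged illustration under stated assumptions: the function names, the per-timestep mean/variance statistic, and the simple drift-subtraction inside an Euler step are hypothetical stand-ins, not the paper's exact derivation.

```python
import numpy as np

# Hedged sketch: estimate per-timestep error statistics from a handful of
# paired full-precision vs. quantized denoiser outputs, then fold the
# estimated drift into a plain Euler update. All names are illustrative.

def calibrate_stats(fp_outputs, q_outputs):
    """Per-timestep mean (drift) and variance of the quantization error.

    fp_outputs, q_outputs: arrays of shape (runs, timesteps, dim) holding
    paired denoiser outputs from a few calibration prompts.
    """
    err = np.asarray(q_outputs) - np.asarray(fp_outputs)
    mu = err.mean(axis=(0, 2))   # timestep-wise drift of the error
    var = err.var(axis=(0, 2))   # timestep-wise variance of the error
    return mu, var

def corrected_euler_step(x, eps_q, t_idx, dt, mu):
    """Euler step using the quantized prediction with drift subtracted."""
    return x + dt * (eps_q - mu[t_idx])

# Usage: 5 paired calibration runs, 50 timesteps, toy 16-dim states,
# with the "quantized" outputs given a synthetic bias of 0.1.
rng = np.random.default_rng(0)
fp = rng.normal(size=(5, 50, 16))
q = fp + 0.1 + 0.05 * rng.normal(size=(5, 50, 16))
mu, var = calibrate_stats(fp, q)
x = corrected_euler_step(np.zeros(16), q[0, 0], t_idx=0, dt=0.02, mu=mu)
```

Because the statistic is just a small per-timestep table, the correction adds only a vector subtraction per step, consistent with the negligible inference overhead the authors report.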
- Corrects image degradation from model quantization with a plug-and-play sampler adjustment derived from just 5 calibration runs.
- Improved FID scores by up to 4.59 points on PixArt-Sigma models compressed with SVDQuant (W3A4), preserving CLIP accuracy.
- Works with major diffusion model families (DiT, U-Net), samplers (Euler, DPM-Solver++), and PTQ methods, adding negligible inference overhead.
Why It Matters
Enables high-quality, efficient deployment of billion-parameter image AI on consumer devices and edge hardware without sacrificing output fidelity.