How much can a video generated by the same diffusion model differ across GPU architectures if the initial noise latent is fixed? [D]
Even with identical weights and noise latents, different GPUs may produce noticeably different videos.
Deep Dive
A Reddit user asks whether running the same video diffusion model with identical weights, prompt, parameters, deterministic sampler, and starting noise on two different GPU architectures would produce nearly identical videos. They note that bitwise-identical outputs are unlikely due to floating-point math differences, but wonder whether the differences would be immediately noticeable to the human eye or only amount to tiny pixel-level differences.
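A minimal sketch of how one might pin the controllable sources of variation for such an experiment, assuming a PyTorch-based pipeline; the latent shape and file name below are illustrative assumptions, not details from the post:

```python
import torch

torch.manual_seed(0)
torch.use_deterministic_algorithms(True)  # error on nondeterministic kernels
torch.backends.cudnn.benchmark = False    # no autotuning (run-dependent kernel picks)
# On CUDA, deterministic cuBLAS may additionally require the env var
# CUBLAS_WORKSPACE_CONFIG=:4096:8 to be set before launch.

# Draw the starting latent on CPU so both machines begin from identical
# bits, independent of each GPU's RNG implementation, then share the file.
gen = torch.Generator(device="cpu").manual_seed(1234)
latents = torch.randn((1, 4, 16, 64, 64), generator=gen)  # (B, C, frames, H, W)
torch.save(latents, "shared_latents.pt")  # copy this file to both machines
```

Even with everything above pinned, the two machines can still diverge: the same operation may be compiled to different kernels, with different reduction orders, on different architectures.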
Key Points
- Floating-point math differs across GPU architectures (kernel selection, reduction order, fused operations), so outputs are not bitwise identical even with fixed weights and seeds; see the non-associativity demo after this list.
- For video diffusion, errors can accumulate across denoising steps and frames, so discrepancies may surface as perceptible flickering or motion artifacts, unlike static images, where they are often negligible; a toy error-growth simulation follows below.
- This raises reproducibility concerns for research and deployment, particularly when comparing results from different cloud GPU providers or hardware generations; the comparison sketch after this list shows one way to quantify the drift.
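The first point rests on a basic property of floating-point arithmetic: addition is not associative, so two GPUs that sum the same values in a different order can legitimately produce different bits. A short illustration in plain Python:

```python
# Floating-point addition is not associative: the grouping changes the result.
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)  # 1.0
print(a + (b + c))  # 0.0 -- the 1.0 is absorbed by rounding before it can survive
```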
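The second point, that per-step discrepancies can compound over an iterative process, can be illustrated with a deliberately simplified toy model. This is not a diffusion sampler; the per-step error size and amplification factor are assumptions chosen for illustration, not measurements:

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 1e-6    # assumed per-step float discrepancy between two GPUs
gain = 1.05   # assumed mild amplification per denoising/frame step
drift = 0.0
for step in range(50):
    drift = gain * drift + eps * rng.standard_normal()
    if step % 10 == 9:
        print(f"step {step + 1:2d}: |drift| ~ {abs(drift):.2e}")
# If per-step errors are amplified even slightly, the gap grows
# geometrically rather than staying at the ~1e-6 level.
```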
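For the deployment concern, a straightforward way to check how far two runs have actually drifted is to compare decoded frames directly. A sketch assuming both machines saved their output as uint8 frame arrays (the file names and format are hypothetical):

```python
import numpy as np

frames_a = np.load("frames_gpu_a.npy")  # (T, H, W, 3), uint8, from machine A
frames_b = np.load("frames_gpu_b.npy")  # same shape, from machine B

for t, (fa, fb) in enumerate(zip(frames_a, frames_b)):
    diff = fa.astype(np.float64) - fb.astype(np.float64)
    mse = np.mean(diff ** 2)
    psnr = 10 * np.log10(255.0 ** 2 / mse) if mse > 0 else float("inf")
    print(f"frame {t}: max|diff|={np.abs(diff).max():.0f}, PSNR={psnr:.1f} dB")
# Rough rule of thumb: PSNR above ~50 dB is imperceptible; large
# frame-to-frame swings in PSNR are the signature of the flicker above.
```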
Why It Matters
Reproducibility in AI video generation is hardware-dependent, affecting research validity and production consistency for developers.