Testing Torch 2.9 vs 2.10 vs 2.11 with FLUX.2 Dev on RTX 5060 Ti
Benchmark shows PyTorch 2.11 cuts FLUX.2 image generation time to 4.5 seconds per iteration.
A detailed benchmark by a Reddit user reveals significant performance gains for AI image generation from simply upgrading PyTorch. Testing the FLUX.2 Dev model on an NVIDIA RTX 5060 Ti GPU, the user compared PyTorch versions 2.9.0, 2.10.0, and 2.11.0 within the ComfyUI v0.18.1 environment. The standard test used a 20-step Euler sampler workflow. PyTorch 2.9, built against the older CUDA 12.8, was the slowest, averaging 5.35 seconds per iteration (s/it) and triggering warnings about suboptimal CUDA operations.
Upgrading to PyTorch 2.10 with CUDA 13.0 provided an immediate 7.3% speed boost, lowering the average time to 4.96 s/it. PyTorch 2.11 delivered the best performance, achieving 4.92 s/it in normal mode—an 8% improvement over version 2.9. The performance leap was even more pronounced when using the SAGE-Attn 2.2.0 optimization. In this 'FAST' mode, PyTorch 2.11 hit a peak speed of 4.50 s/it, completing a full image generation run in just 99.3 seconds on average.
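The reported percentages follow directly from the per-iteration times; a quick sanity check using only the figures quoted above:

```python
def speedup_pct(old_s_it: float, new_s_it: float) -> float:
    """Percent reduction in seconds-per-iteration after an upgrade."""
    return (old_s_it - new_s_it) / old_s_it * 100

# Benchmark figures: PyTorch 2.9 = 5.35 s/it, 2.10 = 4.96 s/it, 2.11 = 4.92 s/it
print(round(speedup_pct(5.35, 4.96), 1))  # 7.3  (2.9 -> 2.10)
print(round(speedup_pct(5.35, 4.92), 1))  # 8.0  (2.9 -> 2.11)
```

The same arithmetic puts the SAGE-Attn 'FAST' mode (4.50 s/it) at roughly a 16% reduction versus the PyTorch 2.9 baseline.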
The results provide a clear upgrade path for AI artists and developers. The benchmark demonstrates that simply updating PyTorch and ensuring a modern CUDA build (13.0+) can yield substantial workflow efficiency gains without changing hardware. For users of models like FLUX.2 in ComfyUI, this translates to faster iteration times and higher productivity.
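The upgrade itself is typically a single pip command run inside the ComfyUI environment; a sketch assuming PyTorch's usual `cuXYZ` wheel-index naming (here `cu130` for a CUDA 13.0 build):

```shell
# Inside the ComfyUI virtual environment: move to a PyTorch build
# targeting CUDA 13.0 (the cu130 index name follows PyTorch's
# standard convention and should be checked against the official
# install selector for your platform).
pip install --upgrade torch torchvision --index-url https://download.pytorch.org/whl/cu130

# Confirm the installed PyTorch version and the CUDA build it ships with
python -c "import torch; print(torch.__version__, torch.version.cuda)"
```

Users should match the wheel index to their driver's supported CUDA version before upgrading.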
- PyTorch 2.11 with CUDA 13.0 achieved the fastest speed at 4.50 seconds per iteration (s/it) using SAGE-Attn optimization.
- Upgrading from PyTorch 2.9 to 2.11 resulted in an 8% performance increase for standard FLUX.2 image generation.
- The older PyTorch 2.9 + CUDA 12.8 build was the slowest configuration and issued warnings about unoptimized CUDA operations.
Why It Matters
For AI artists and developers, a simple software update can significantly speed up image generation workflows without costly hardware upgrades.