Benchmark for SageAttention kernels using real attention shapes logged from ComfyUI models (image / video / audio)
Attention is often the dominant cost in diffusion inference; this tool measures it in isolation, using shapes logged from actual models rather than synthetic inputs.
Deep Dive
A new benchmark runs on real attention shapes logged from ComfyUI models (SDXL, SD3.5-Large, Flux). It measures only the attention operation, not full inference, and compares the SA2, SA2-FP8, SA3-FP4, and PyTorch SDPA kernels, outputting a JSON file with median timing, VRAM usage, and TFLOPS. A free viewer lets you compare results across GPUs side-by-side.
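The per-shape measurement described above can be sketched in plain Python. The function names `bench_median` and `attention_tflops` are illustrative, not the tool's actual API; the FLOP count is the standard estimate for non-causal attention:

```python
import statistics
import time

def attention_tflops(batch, heads, seq_len, head_dim, seconds):
    """Convert a measured runtime into TFLOPS for one attention call.

    QK^T and (softmax @ V) each cost 2 * seq_len^2 * head_dim
    multiply-adds per head, giving 4 * b * h * s^2 * d FLOPs total.
    """
    flops = 4 * batch * heads * seq_len * seq_len * head_dim
    return flops / seconds / 1e12

def bench_median(fn, warmup=3, iters=20):
    """Median wall-clock time of fn() after a few warmup runs."""
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return statistics.median(times)

# Usage on a hypothetical logged shape (batch=2, heads=24, seq=4096, dim=64):
#   t = bench_median(lambda: sdpa(q, k, v))  # sdpa/q/k/v: the kernel under test
#   print(attention_tflops(2, 24, 4096, 64, t))
```

Reporting the median rather than the mean keeps a single slow outlier (e.g. a stray kernel launch or clock ramp) from skewing the result.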
Key Points
- Logs real attention shapes from ComfyUI models (image, video, audio) rather than using synthetic inputs.
- Benchmarks four kernels: SA2 (INT8), SA2-FP8, SA3-FP4 (FP4 block-scaled), and a PyTorch SDPA baseline.
- Outputs a GPU-named JSON file with median timing, VRAM usage, and TFLOPS; viewer enables multi-GPU comparison.
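The shape-logging step in the first key point could be implemented as a thin wrapper around the model's attention call that records every real query shape it sees during an inference pass. This is a sketch of the idea, not the tool's actual hook:

```python
logged_shapes = []

def with_shape_logging(attn_fn):
    """Wrap an attention function so every real Q shape is recorded.

    Assumes q carries a .shape of (batch, heads, seq_len, head_dim),
    as PyTorch's scaled_dot_product_attention expects.
    """
    def wrapper(q, k, v, *args, **kwargs):
        logged_shapes.append(tuple(q.shape))
        return attn_fn(q, k, v, *args, **kwargs)
    return wrapper

# Patching a model's attention entry point with this wrapper during one
# generation run yields the real shapes the benchmark later replays.
```

Logging during a real run, rather than guessing shapes from model configs, captures effects like classifier-free-guidance batch doubling and latent-resolution changes that synthetic inputs miss.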
Why It Matters
Isolating attention lets developers target the true hotspot of image and video generation workflows, directly reducing generation latency.