Benchmark for SageAttention kernels using real attention shapes logged from ComfyUI models (image / video / audio)
Attention is often the dominant cost in diffusion inference; this tool measures it in isolation, using shapes logged from actual models rather than synthetic inputs.
Deep Dive
A new benchmark runs on real attention shapes logged from ComfyUI models (SDXL, SD3.5-Large, Flux). It measures only the attention operation, not full inference, and compares the SA2, SA2-FP8, SA3-FP4, and PyTorch SDPA kernels, outputting a JSON file with median timing, VRAM usage, and TFLOPS. A free viewer lets you compare results across GPUs side-by-side.
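The per-shape measurement described above can be sketched in plain Python. The function names `bench_median` and `attention_tflops` are illustrative, not the tool's actual API; the FLOP count is the standard estimate for non-causal attention:

```python
import statistics
import time

def attention_tflops(batch, heads, seq_len, head_dim, seconds):
    """Convert a measured runtime into TFLOPS for one attention call.

    QK^T and (softmax @ V) each cost 2 * seq_len^2 * head_dim
    multiply-adds per head, giving 4 * b * h * s^2 * d FLOPs total.
    """
    flops = 4 * batch * heads * seq_len * seq_len * head_dim
    return flops / seconds / 1e12

def bench_median(fn, warmup=3, iters=20):
    """Median wall-clock time of fn() after a few warmup runs."""
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return statistics.median(times)

# Usage on a hypothetical logged shape (batch=2, heads=24, seq=4096, dim=64):
#   t = bench_median(lambda: sdpa(q, k, v))  # sdpa/q/k/v: the kernel under test
#   print(attention_tflops(2, 24, 4096, 64, t))
```

Reporting the median rather than the mean keeps a single slow outlier (e.g. a stray kernel launch or clock ramp) from skewing the result.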
Key Points
- Logs real attention shapes from ComfyUI models (image, video, audio) rather than using synthetic inputs.
- Benchmarks four kernels: SA2 (INT8), SA2-FP8, SA3-FP4 (FP4 block-scaled), and a PyTorch SDPA baseline.
- Outputs a GPU-named JSON file with median timing, VRAM usage, and TFLOPS; viewer enables multi-GPU comparison.
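The shape-logging step in the first key point could be implemented as a thin wrapper around the model's attention call that records every real query shape it sees during an inference pass. This is a sketch of the idea, not the tool's actual hook:

```python
logged_shapes = []

def with_shape_logging(attn_fn):
    """Wrap an attention function so every real Q shape is recorded.

    Assumes q carries a .shape of (batch, heads, seq_len, head_dim),
    as PyTorch's scaled_dot_product_attention expects.
    """
    def wrapper(q, k, v, *args, **kwargs):
        logged_shapes.append(tuple(q.shape))
        return attn_fn(q, k, v, *args, **kwargs)
    return wrapper

# Patching a model's attention entry point with this wrapper during one
# generation run yields the real shapes the benchmark later replays.
```

Logging during a real run, rather than guessing shapes from model configs, captures effects like classifier-free-guidance batch doubling and latent-resolution changes that synthetic inputs miss.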
Why It Matters
Isolating attention lets developers target the true hotspot of image and video generation workflows, directly reducing generation latency.