SmartAttentionDispatcher brings SageAttention speed to ComfyUI without restart
ComfyUI node swaps PyTorch attention for SageAttention kernels, auto-detects GPU and model
SmartAttentionDispatcher is a new ComfyUI node that replaces PyTorch's scaled dot-product attention (SDPA) with SageAttention kernels (SA2 and SA3) to speed up attention computation during image generation. It is applied as a plug-and-play patch after model loading and LoRA application, and needs neither a ComfyUI restart nor the --use-sage-attention launch flag. The node automatically detects the GPU architecture (e.g., RTX 50xx for SA3), the installed libraries (sageattn, sageattn3), and the model architecture, then selects the best available kernel. The node status panel shows the active mode, GPU tier, and kernel availability.
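The detection step described above can be sketched roughly as follows. This is an illustrative sketch, not the node's actual code: the function names `library_available` and `choose_mode` are hypothetical, and the rule that a CUDA compute capability major version of 10 or higher means Blackwell is an assumption made for the example. Only the package names sageattn and sageattn3 come from the description above.

```python
import importlib.util


def library_available(name: str) -> bool:
    """Return True if a top-level package is installed, without importing it."""
    return importlib.util.find_spec(name) is not None


def choose_mode(capability, has_sa2, has_sa3, blackwell_major=10):
    """Pick an attention mode from GPU capability and installed libraries.

    capability      -- (major, minor) CUDA compute capability tuple
    blackwell_major -- assumed minimum major version for Blackwell GPUs
    """
    major, _minor = capability
    if has_sa3 and major >= blackwell_major:
        return "SA3"   # Blackwell GPU with SageAttention3 installed
    if has_sa2:
        return "SA2"   # any other GPU with SageAttention2 installed
    return "SDPA"      # nothing usable: leave PyTorch attention unchanged


if __name__ == "__main__":
    has_sa2 = library_available("sageattn")
    has_sa3 = library_available("sageattn3")
    # In a real node the tuple would come from
    # torch.cuda.get_device_capability(); it is hard-coded here so the
    # sketch runs without a GPU.
    print(choose_mode((12, 0), has_sa2, has_sa3))
```

Keeping the decision in a pure function that takes the capability tuple and availability flags as arguments makes the fallback order easy to test without a GPU present.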
The node offers four modes: standard SDPA (no change), SA2 (SageAttention2, with fp16, fp8, and triton kernels), SA3 (SageAttention3, for Blackwell GPUs with CUDA 12.8+), and a dynamic Combine mode that uses SA2 for the first and last steps and SA3 for the middle steps. It patches most DiT models (Flux, SD3.5, Z-Image, LTX, Wan) via transformer_options, and also scans sys.modules to reach models that import attention locally (Qwen, ErnieImage, ACE-Step). The tested models are compatible, with one exception: Qwen in SA3 mode produces unstable outputs for sequences over 7000 tokens, while SA2 works correctly. SDXL is supported, but the gains are minimal because its sequences are short.
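The Combine mode's per-step switching can be illustrated with a short sketch. The function name `kernel_for_step` and the `edge_steps` parameter are hypothetical. The description above does not say how many steps count as "first" and "last", so the default of 2 is an assumption for illustration only.

```python
def kernel_for_step(step, total_steps, edge_steps=2):
    """Return the kernel family for one diffusion step in Combine mode.

    step        -- zero-based index of the current step
    total_steps -- number of steps in the sampling run
    edge_steps  -- assumed number of steps at each end that use SA2
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not 0 <= step < total_steps:
        raise ValueError("step out of range")
    in_head = step < edge_steps                  # first few steps
    in_tail = step >= total_steps - edge_steps   # last few steps
    return "SA2" if in_head or in_tail else "SA3"


if __name__ == "__main__":
    schedule = [kernel_for_step(i, 8) for i in range(8)]
    print(schedule)
```

With 8 steps and the assumed edge width of 2, steps 0, 1, 6, and 7 use SA2 and steps 2 through 5 use SA3.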
- Auto-detects GPU (Blackwell for SA3, others for SA2) and selects best kernel (fp16, fp8, fp8++, triton) without user config
- Three SageAttention modes alongside standard SDPA: SA2 for speed, SA3 for Blackwell, and a dynamic Combine mode that uses SA2 for the first and last diffusion steps and SA3 for the middle steps
- Supports models with local attention imports (Qwen, ErnieImage, ACE-Step) via sys.modules patching; Qwen in SA3 mode is unstable above 7000 tokens
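The sys.modules approach mentioned above addresses a general Python behavior: a module that runs `from x import attention` keeps its own reference to the function, so replacing the function in its home module does not affect that copy. A minimal sketch of the general technique follows. It is not the node's actual code, and `patch_local_imports`, `fake_model`, and the attribute name `optimized_attention` are hypothetical names used only for the demonstration.

```python
import sys
import types


def patch_local_imports(original, replacement, attr_name):
    """Rebind attr_name in every loaded module that still holds `original`.

    Returns the names of the patched modules so the change can be undone.
    """
    patched = []
    # Copy the items first: sys.modules can change while it is iterated.
    for mod_name, module in list(sys.modules.items()):
        namespace = getattr(module, "__dict__", None)
        if not isinstance(namespace, dict):
            continue  # skip None entries and unusual module objects
        # Identity check, so unrelated functions with the same name are untouched.
        if namespace.get(attr_name) is original:
            setattr(module, attr_name, replacement)
            patched.append(mod_name)
    return patched


def original_attention(q, k, v):
    return "sdpa"


def fast_attention(q, k, v):
    return "sage"


# Simulate a model module that did a local "from ... import" of attention.
fake_model = types.ModuleType("fake_model")
fake_model.optimized_attention = original_attention
sys.modules["fake_model"] = fake_model

patched_modules = patch_local_imports(
    original_attention, fast_attention, "optimized_attention"
)
print(patched_modules)
```

Reading each module's `__dict__` directly, rather than calling `getattr` with the attribute name, avoids triggering lazy-loading hooks in modules that define a module-level `__getattr__`.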
Why It Matters
ComfyUI users get faster image generation from automatically selected attention kernels, with no manual tuning required.