Hiroki Abe's ComfyUI-Win-Blackwell offers one-click setup for RTX 50-series on Windows

r/StableDiffusion March 03, 2026

⚡Open-source tool bypasses WSL2/Docker, fixes sm_120 error, and unlocks NVFP4 2x speedup for new GPUs.

Deep Dive

A developer has addressed a major pain point for early adopters of NVIDIA's Blackwell-architecture GPUs. Hiroki Abe has open-sourced 'ComfyUI-Win-Blackwell', a one-click, Windows-native setup that runs ComfyUI, the popular node-based AI workflow tool, on RTX 50-series cards (5090/5080/5070) without a Linux virtualization or container layer. The tool targets the 'sm_120' CUDA error that Windows users have been hitting and offers a streamlined alternative to the usual WSL2 or Docker workarounds, which add significant NTFS overhead when loading large model files such as safetensors checkpoints. The release lets professionals and enthusiasts put the new hardware to work immediately for Stable Diffusion and AI video generation.
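An 'sm_120' error generally indicates that the installed PyTorch binary was not compiled with kernels for the GPU's compute capability, which is 12.0 on RTX 50-series cards. The following is a minimal sketch of how one might check for that mismatch; it is not taken from the ComfyUI-Win-Blackwell scripts. The helper names `arch_tag`, `build_supports`, and `check_local_gpu` are hypothetical, while `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()` are standard PyTorch calls.

```python
def arch_tag(capability):
    """Turn a (major, minor) compute capability into an 'sm_XY' tag."""
    major, minor = capability
    return f"sm_{major}{minor}"


def build_supports(arch_list, capability):
    """True if the build lists kernels for this exact capability.

    Simplification: this ignores PTX forward compatibility and only
    looks for an exact 'sm_XY' match.
    """
    return arch_tag(capability) in arch_list


def check_local_gpu():
    """Report whether the installed PyTorch build targets the local GPU."""
    import torch  # imported lazily so the helpers above work without PyTorch

    if not torch.cuda.is_available():
        return "CUDA is not available in this PyTorch build"
    capability = torch.cuda.get_device_capability(0)
    archs = torch.cuda.get_arch_list()
    if build_supports(archs, capability):
        return f"{arch_tag(capability)} is supported"
    return f"{arch_tag(capability)} missing from build (has: {archs})"
```

On an affected machine, calling `check_local_gpu()` would be expected to report that `sm_120` is missing from the build's architecture list.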

The implementation is the product of three days of troubleshooting. The setup script installs the PyTorch nightly cu130 build, which is required to unlock the 2x NVFP4 inference speedup specific to Blackwell GPUs; the older cu128 build can actually be slower. A key finding was that the xformers optimization library must be deliberately excluded: installing it silently downgrades PyTorch from the required nightly build to a stable release, causing crashes mid-inference. The package also includes tools to convert Linux-based ComfyUI workflows to Windows format, and it has been verified with 28 custom nodes and 5 image-to-video (I2V) pipelines on systems with 32GB of VRAM. Released under the MIT license, the tool significantly lowers the barrier to using cutting-edge AI hardware on the world's most common desktop operating system.
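Because the downgrade described above is silent, a quick sanity check after installing any custom node can be useful. The sketch below is an illustration under stated assumptions, not part of the released tool: the function names are hypothetical, the version string in the docstring is a made-up example of the usual nightly format, and the check assumes nightly wheels carry a `.dev` marker and a `+cu130` local tag in `torch.__version__`.

```python
import importlib.util


def describe_build(version):
    """Split a PyTorch version string such as '2.10.0.dev20260301+cu130'."""
    public, _, local = version.partition("+")
    return {"nightly": ".dev" in public, "cuda_tag": local or None}


def find_problems(version, xformers_installed):
    """Return a list of human-readable problems; empty means all checks pass."""
    build = describe_build(version)
    problems = []
    if not build["nightly"]:
        problems.append("PyTorch is not a nightly build (possibly downgraded)")
    if build["cuda_tag"] != "cu130":
        problems.append(f"expected cu130 wheels, found {build['cuda_tag']}")
    if xformers_installed:
        problems.append("xformers is installed and may pin a stable PyTorch")
    return problems


def check_installed():
    """Run the checks against the current Python environment."""
    import torch  # lazy import: the helpers above do not need PyTorch

    has_xformers = importlib.util.find_spec("xformers") is not None
    return find_problems(torch.__version__, has_xformers)
```

Keeping the string parsing separate from the environment lookup means the logic can be exercised without a GPU or a PyTorch install, and `check_installed()` can be rerun after each node installation to catch a downgrade early.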

Key Points

Solves 'sm_120' error for RTX 50-series GPUs on Windows, eliminating need for WSL2/Docker and associated NTFS overhead.
Uses PyTorch nightly cu130 for a critical 2x NVFP4 speedup on Blackwell architecture, with xformers excluded to prevent silent crashes.
One-click 20-minute setup includes workflow conversion tools and supports 28 custom nodes and I2V pipelines on 32GB VRAM systems.

Why It Matters

Unlocks full performance of new $2K+ RTX 50-series GPUs for AI work on Windows, saving professionals days of setup frustration.
