PyTorch patches segfault bug in AMD GPU fast launcher test

PyTorch Releases May 11, 2026

⚡ A buggy shim silently dropped tests; once it was fixed, the tests exposed a segfault on ROCm.

Deep Dive

PyTorch’s latest commit addresses a subtle testing bug that surfaced on AMD GPUs (ROCm). A class-level @skipIfXpu shim was erroneously dropping TestFastCudaLauncher and TestFastCudaLauncherCompileResult from test discovery on every platform, so the tests never ran and any underlying issues stayed hidden. Once fixes #182295 and #182439 corrected the shim, the tests began executing on ROCm, and the _FastCudaLauncher’s hipModuleLaunchKernel path segfaulted on the very first kernel launch.
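The source does not say exactly how the shim failed. As a hypothetical illustration only, the sketch below shows one common way a class-level skip decorator can make a test class vanish from discovery. Here a shim written for test methods returns a plain function, so `unittest`'s loader no longer sees a `TestCase` subclass. The names `buggy_skip_if_xpu`, `fixed_skip_if_xpu`, `IS_XPU`, and both test classes are invented for this example. The corrected shim simply delegates to the standard `unittest.skipIf`, which works on both classes and methods.

```python
import types
import unittest

IS_XPU = False  # hypothetical flag: pretend this is not an Intel XPU machine


def buggy_skip_if_xpu(obj):
    """Hypothetical shim written for test methods: it always returns a plain function."""
    def wrapper(*args, **kwargs):
        if IS_XPU:
            raise unittest.SkipTest("skipped on XPU")
        return obj(*args, **kwargs)

    wrapper.__name__ = obj.__name__
    return wrapper  # applied to a class, this replaces the class with a function


def fixed_skip_if_xpu(obj):
    """Delegate to unittest.skipIf, which handles both classes and methods."""
    return unittest.skipIf(IS_XPU, "skipped on XPU")(obj)


@buggy_skip_if_xpu
class TestDroppedLauncher(unittest.TestCase):
    def test_launch(self):
        self.assertTrue(True)


@fixed_skip_if_xpu
class TestKeptLauncher(unittest.TestCase):
    def test_launch(self):
        self.assertTrue(True)


def discovered_test_ids(*objects):
    """Put objects into a throwaway module and return the test IDs unittest discovers."""
    module = types.ModuleType("fake_tests")
    for obj in objects:
        setattr(module, obj.__name__, obj)
    ids = []

    def walk(suite):
        # Flatten nested TestSuites into individual test IDs.
        for item in suite:
            if isinstance(item, unittest.TestSuite):
                walk(item)
            else:
                ids.append(item.id())

    walk(unittest.defaultTestLoader.loadTestsFromModule(module))
    return ids


if __name__ == "__main__":
    # Only TestKeptLauncher.test_launch is found; TestDroppedLauncher is silently gone.
    print(discovered_test_ids(TestDroppedLauncher, TestKeptLauncher))
```

Neither path raises an error. That is why this kind of bug can go unnoticed: the dropped class never appears in the run at all.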

The fix, authored by Claude and reviewed by Jeff Daily (AMD), temporarily skips both test classes on ROCm. The crash originates in torch/csrc/inductor/static_launcher/cuda.cpp, whose kernel-launch path does not yet work correctly on AMD hardware; the skip is a stop-gap while a permanent fix is developed. For developers building AI models with PyTorch on AMD GPUs, this keeps ROCm CI from breaking, but the inductor’s static launcher may remain unstable on ROCm until the underlying fix lands.
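A class-level skip like the one described can be sketched with the standard library's `unittest.skipIf`. PyTorch's internal test utilities provide their own helpers (for example `skipIfRocm` in `torch.testing._internal.common_utils`), and the source does not say which one the commit uses, so treat this as a sketch under stated assumptions. The test classes are empty stand-ins, `ROCM_REASON` and the helper functions are hypothetical, and ROCm detection assumes `torch.version.hip`, which is a version string on ROCm builds and `None` otherwise.

```python
import unittest

# Hypothetical skip reason, paraphrasing the article.
ROCM_REASON = "hipModuleLaunchKernel segfaults; pending fix in static_launcher/cuda.cpp"


def running_on_rocm() -> bool:
    """True on a ROCm (HIP) build of PyTorch; False if torch is missing or not a HIP build."""
    try:
        import torch
    except ImportError:
        return False
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    return getattr(torch.version, "hip", None) is not None


def build_launcher_test_classes(is_rocm: bool):
    """Create stand-ins for the two affected classes, both skipped when is_rocm is True."""
    skip_on_rocm = unittest.skipIf(is_rocm, ROCM_REASON)

    @skip_on_rocm
    class TestFastCudaLauncher(unittest.TestCase):
        def test_first_launch(self):
            pass  # placeholder for a real kernel launch

    @skip_on_rocm
    class TestFastCudaLauncherCompileResult(unittest.TestCase):
        def test_compile_result(self):
            pass  # placeholder for a real compile-result check

    return [TestFastCudaLauncher, TestFastCudaLauncherCompileResult]


def run_launcher_tests(is_rocm: bool) -> unittest.TestResult:
    """Load and run both stand-in classes, returning the collected result."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite(
        loader.loadTestsFromTestCase(cls) for cls in build_launcher_test_classes(is_rocm)
    )
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    res = run_launcher_tests(running_on_rocm())
    print(f"ran={res.testsRun} skipped={len(res.skipped)} ok={res.wasSuccessful()}")
```

Skipping at class level keeps both classes visible in test reports as skipped rather than silently absent, which is the opposite of the failure mode the original shim bug created.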

Key Points

A faulty @skipIfXpu shim was silently skipping TestFastCudaLauncher on all platforms
After fixes #182295 and #182439, the test started running on ROCm and crashed on hipModuleLaunchKernel
Both test classes are now skipped on ROCm, pending a fix in torch/csrc/inductor/static_launcher/cuda.cpp

Why It Matters

AMD GPU users get stable PyTorch CI on ROCm while the static launcher crash is fixed, which matters for AI workloads on AMD hardware.
