PyTorch patches segfault bug in AMD GPU fast launcher test
A buggy shim silently dropped tests; once it was fixed, those tests segfaulted on ROCm.
PyTorch’s latest commit works around a subtle bug affecting AMD GPU (ROCm) users. A class-level @skipIfXpu shim was erroneously dropping TestFastCudaLauncher and TestFastCudaLauncherCompileResult from test discovery on every platform, so the tests never ran anywhere and any failures in them went unnoticed. Once commits #182295 and #182439 corrected the shim, the tests started executing on ROCm, and _FastCudaLauncher’s hipModuleLaunchKernel path segfaulted on the very first kernel launch.
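The article does not say exactly how the shim removed the classes. One common way a class decorator can do this is by returning something other than a class, because unittest’s loader only collects TestCase subclasses. The sketch below assumes that failure mode; ON_XPU, buggy_skip_if_xpu, and fixed_skip_if_xpu are hypothetical stand-ins, not PyTorch’s actual @skipIfXpu code.

```python
import types
import unittest

# Hypothetical flag standing in for "are we running on an Intel XPU?".
ON_XPU = False


def buggy_skip_if_xpu(cls):
    """Hypothetical broken shim: replaces the class with a plain function."""
    def wrapper(*args, **kwargs):
        if ON_XPU:
            raise unittest.SkipTest("skipped on XPU")
        return cls(*args, **kwargs)
    # The module now holds a function, not a TestCase subclass, so the
    # loader ignores it on every platform, not just on XPU.
    return wrapper


def fixed_skip_if_xpu(cls):
    """Hypothetical correct shim: returns the class itself, marked skipped only on XPU."""
    return unittest.skipIf(ON_XPU, "skipped on XPU")(cls)


def count_discovered(decorator):
    """Build a throwaway module with one decorated test class and count what the loader finds."""
    module = types.ModuleType("fake_launcher_tests")

    class TestLauncher(unittest.TestCase):
        def test_launch(self):
            self.assertTrue(True)

    module.TestLauncher = decorator(TestLauncher)
    suite = unittest.TestLoader().loadTestsFromModule(module)
    return suite.countTestCases()


if __name__ == "__main__":
    print("buggy shim:", count_discovered(buggy_skip_if_xpu))  # 0 tests found
    print("fixed shim:", count_discovered(fixed_skip_if_xpu))  # 1 test found
```

The buggy version loses the test regardless of ON_XPU, which matches the "on every platform" symptom: nothing fails, the test simply never appears in the run.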
The fix, authored by Claude and reviewed by Jeff Daily (AMD), temporarily skips both test classes on ROCm. The crash originates in torch/csrc/inductor/static_launcher/cuda.cpp, where the kernel-launch path used on AMD hardware still needs a proper fix. The skip is a stop-gap while a permanent solution is developed. For developers building AI models with PyTorch on AMD GPUs, it keeps ROCm CI from crashing, but Inductor’s static launcher may be unstable on ROCm until the underlying fix lands.
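The article does not show how the ROCm skip is implemented. PyTorch’s own test suite has ROCm helpers (for example skipIfRocm in torch.testing._internal.common_utils), and the commit may use those or something else. As a hedged, self-contained sketch of the idea, the hypothetical skip_on_rocm decorator below wraps the standard unittest.skipIf and treats a non-None torch.version.hip as the ROCm signal. The placeholder test classes are illustrative only, not the real TestFastCudaLauncher tests.

```python
import unittest


def running_on_rocm() -> bool:
    """Return True on a ROCm (HIP) build of PyTorch; False on other builds or if torch is missing."""
    try:
        import torch
    except ImportError:
        return False
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    return getattr(torch.version, "hip", None) is not None


def skip_on_rocm(reason, is_rocm=None):
    """Hypothetical class decorator: skip every test in the class on ROCm only.

    The class stays a TestCase subclass, so it is still discovered
    (and reported as skipped) instead of silently disappearing.
    """
    if is_rocm is None:
        is_rocm = running_on_rocm()
    return unittest.skipIf(is_rocm, reason)


# is_rocm is forced here so the demo behaves the same on any machine.
@skip_on_rocm("launcher segfaults on ROCm; pending fix", is_rocm=True)
class TestPlaceholderOnRocm(unittest.TestCase):
    def test_launch(self):
        self.fail("must not run on ROCm")


@skip_on_rocm("launcher segfaults on ROCm; pending fix", is_rocm=False)
class TestPlaceholderOnCuda(unittest.TestCase):
    def test_launch(self):
        self.assertEqual(1 + 1, 2)


def run_class(cls):
    """Run one TestCase class and return its unittest.TestResult."""
    result = unittest.TestResult()
    unittest.TestLoader().loadTestsFromTestCase(cls).run(result)
    return result


if __name__ == "__main__":
    for cls in (TestPlaceholderOnRocm, TestPlaceholderOnCuda):
        r = run_class(cls)
        print(cls.__name__, "run:", r.testsRun,
              "skipped:", len(r.skipped), "failed:", len(r.failures))
```

Unlike the broken shim, a class-level skipIf keeps the classes visible to the runner, so ROCm CI reports them as skipped rather than omitting them without a trace.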
- A faulty @skipIfXpu shim was silently dropping TestFastCudaLauncher and TestFastCudaLauncherCompileResult from test discovery on all platforms
- After fixes #182295 and #182439, the tests started running on ROCm and segfaulted in the hipModuleLaunchKernel path
- Both test classes are now skipped on ROCm pending a fix in torch/csrc/inductor/static_launcher/cuda.cpp
Why It Matters
ROCm CI stays stable while the launcher crash is fixed, which matters for teams running AI workloads on AMD GPUs.