trunk/f91370d8694c1d2b36592430c3f250e66948c74b: [ROCm] Fix MIOpen CTC loss crash on Windows (#179264)
Passing host pointers instead of device pointers to MIOpen resolves CTC loss crashes on Windows dGPUs.
A bug in PyTorch's MIOpen integration caused CTC loss computation to crash with a fatal access violation on Windows systems with discrete AMD GPUs. The issue surfaced in the unit test test_CTCLoss_no_batch_dim on gfx1100 hardware, in the calls to miopenGetCTCLossWorkspaceSize and miopenCTCLoss. These MIOpen functions read the labels, label_lengths, and input_lengths arrays on the host to plan the computation and calculate the workspace size, but the PyTorch code was passing them device pointers (from hipMalloc) rather than host-accessible pointers. Dereferencing a device pointer on the host happens to work on Linux, because of HSA unified memory mapping, and on Windows APUs, because the GPU shares system RAM. It fails on Windows discrete GPUs, where VRAM is not host-readable.
The fix, contributed by mstankov-amd and Jeff Daily in PR #179264, switches to host pointers for the arrays passed to MIOpen's CTC loss functions. Verification on gfx1201 confirmed that hipDeviceAttributeIntegrated returns 0 (discrete GPU), that unified addressing is unsupported, and that passing CPU pointers resolves the crash. With this patch, CTC loss, a key component for training speech recognition and sequence-to-sequence models, works reliably on Windows with discrete AMD GPUs, closing a compatibility gap that previously forced users onto Linux or APU-based systems.
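The failure mode and the fix can be illustrated with a toy model in plain Python. This is a sketch under stated assumptions, not PyTorch's or MIOpen's code: `Platform`, `Buffer`, `host_read`, `plan_ctc_workspace`, and `to_host` are hypothetical names invented for illustration, and the workspace formula is a placeholder rather than MIOpen's actual calculation. The only behavior taken from the description above is that the planner reads all three arrays on the host, and that device memory is host-readable on some platforms but not on Windows discrete GPUs.

```python
from dataclasses import dataclass
from enum import Enum


class MemorySpace(Enum):
    HOST = "host"
    DEVICE = "device"


@dataclass(frozen=True)
class Platform:
    """Hypothetical system description; not a HIP or MIOpen type."""
    name: str
    device_memory_host_readable: bool


@dataclass(frozen=True)
class Buffer:
    """Toy stand-in for a pointer plus the memory space it points into."""
    values: tuple
    space: MemorySpace


class AccessViolation(RuntimeError):
    """Models the crash seen when the host dereferences unreadable VRAM."""


def host_read(buf: Buffer, platform: Platform) -> tuple:
    """Simulate a CPU-side read of a buffer on the given platform."""
    if buf.space is MemorySpace.DEVICE and not platform.device_memory_host_readable:
        raise AccessViolation(f"host cannot read device memory on {platform.name}")
    return buf.values


def plan_ctc_workspace(labels, label_lengths, input_lengths, platform) -> int:
    """Hypothetical host-side planner that reads all three arrays on the CPU."""
    flat_labels = host_read(labels, platform)
    lab_lens = host_read(label_lengths, platform)
    in_lens = host_read(input_lengths, platform)
    if sum(lab_lens) != len(flat_labels):
        raise ValueError("label_lengths does not match the flattened labels")
    # Placeholder size (assumption): one T x (2L + 1) table per sequence.
    return sum(t * (2 * l + 1) for t, l in zip(in_lens, lab_lens))


def to_host(buf: Buffer) -> Buffer:
    """Model of the fix: hand the planner a host-resident copy."""
    return Buffer(buf.values, MemorySpace.HOST)


LINUX_DGPU = Platform("Linux dGPU", device_memory_host_readable=True)
WINDOWS_APU = Platform("Windows APU", device_memory_host_readable=True)
WINDOWS_DGPU = Platform("Windows dGPU", device_memory_host_readable=False)

if __name__ == "__main__":
    labels = Buffer((1, 2, 3, 1, 2), MemorySpace.DEVICE)
    label_lengths = Buffer((3, 2), MemorySpace.DEVICE)
    input_lengths = Buffer((10, 8), MemorySpace.DEVICE)
    args = (labels, label_lengths, input_lengths)

    for platform in (LINUX_DGPU, WINDOWS_APU, WINDOWS_DGPU):
        try:
            size = plan_ctc_workspace(*args, platform)
            print(f"{platform.name}: device pointers ok, size={size}")
        except AccessViolation as exc:
            print(f"{platform.name}: crash ({exc})")

    host_args = tuple(to_host(b) for b in args)
    print("Windows dGPU with host pointers:",
          plan_ctc_workspace(*host_args, WINDOWS_DGPU))
```

In this model the device-pointer call succeeds on the two platforms where VRAM happens to be host-readable, which is why the bug could go unnoticed there, while the host-pointer call behaves the same on all three.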
- Fix resolves access violation (0xC0000005) in miopenGetCTCLossWorkspaceSize on Windows dGPUs
- Root cause: device pointers passed to host-side MIOpen functions; fix uses host pointers
- Patch verified on gfx1201 discrete GPU; all existing CTCLoss unit tests pass
Why It Matters
Enables stable CTC loss for speech/sequence models on Windows AMD dGPUs, removing a platform-specific barrier.