trunk/980f368fd386fba5a83fdda2a892af23a58f9513: [ROCm] Reland SDPA dropout fix (#174708) (#178713)
A critical fix for SDPA dropout ensures consistent, reproducible AI model training on AMD hardware.
The AMD ROCm engineering team, led by Andy Lugo Reyes, has successfully relanded a critical fix to PyTorch's core framework, resolving a dropout malfunction in the Scaled Dot-Product Attention (SDPA) pathway for AMD GPUs. The fix, relanded as PR #178713, specifically addresses how the Composable Kernel (CK) library handles random number generation for dropout during the attention operation. The original fix (PR #174708) had been auto-reverted, reintroducing a bug in seed and offset management that could produce non-reproducible training runs and unstable model convergence on ROCm-powered systems.
This technical correction ensures that the forward and backward passes for attention layers with dropout now parse the Philox RNG state inside the kernel (`ParsePhiloxCudaState`) instead of relying on flawed host-side unpacking. It also reinstates the CK-specific dropout mask logic and the parametrized tests that cover it. For developers and researchers training large language models (LLMs) or vision transformers on AMD Instinct GPUs (such as the MI250X or MI300A/X), this patch is essential for achieving reliable, deterministic results that match the training stability previously associated only with NVIDIA's CUDA ecosystem.
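To illustrate the behavior the fix restores, here is a minimal reproducibility check. This is a sketch, not code from the patch itself: it assumes a patched PyTorch ROCm build and an AMD GPU exposed as the `cuda` device. With correct seed/offset handling, two identically seeded runs should consume the same Philox state and therefore produce identical dropout masks and identical attention outputs.

```python
# Hypothetical reproducibility check for SDPA dropout on ROCm.
import torch
import torch.nn.functional as F

def sdpa_with_dropout(seed: int) -> torch.Tensor:
    # Resets the Philox seed/offset that the attention kernel consumes.
    torch.manual_seed(seed)
    q = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
    k = torch.randn_like(q)
    v = torch.randn_like(q)
    # dropout_p > 0 exercises the CK dropout path addressed by this PR.
    return F.scaled_dot_product_attention(q, k, v, dropout_p=0.1)

out_a = sdpa_with_dropout(seed=42)
out_b = sdpa_with_dropout(seed=42)
# With correct seed/offset handling, identical seeds should give
# identical dropout masks, hence bitwise-identical outputs.
assert torch.equal(out_a, out_b), "SDPA dropout is not reproducible"
```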
- Fixes a dropout seed/offset bug in PyTorch's ROCm SDPA kernel, restoring training reproducibility.
- Re-applies the original PR #174708 fix for AMD's Composable Kernel (CK) library after a faulty auto-revert.
- Ensures correct random state handling in both forward and backward passes for attention layers on AMD GPUs (see the gradient check sketched below).
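The backward pass can be checked the same way, since the backward kernel must re-derive the exact dropout mask the forward pass used; a seed/offset mismatch would show up as differing gradients between seeded runs. A minimal sketch, under the same assumptions as above (patched build, ROCm device visible as `cuda`):

```python
# Illustrative gradient-level determinism check, not code from the patch.
import torch
import torch.nn.functional as F

def sdpa_grads(seed: int) -> torch.Tensor:
    torch.manual_seed(seed)
    q = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16,
                    requires_grad=True)
    k = torch.randn_like(q, requires_grad=True)
    v = torch.randn_like(q, requires_grad=True)
    out = F.scaled_dot_product_attention(q, k, v, dropout_p=0.1)
    # Backward must regenerate the forward pass's dropout mask from the
    # same Philox seed/offset, or the gradients below will diverge.
    out.sum().backward()
    return q.grad.clone()

assert torch.equal(sdpa_grads(0), sdpa_grads(0)), "backward mask mismatch"
```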
Why It Matters
Enables stable, production-ready AI model training on AMD hardware, challenging NVIDIA's CUDA dominance.