Developer Tools

PyTorch fixes fork_rng bug for non-CUDA devices in PR #180512

Inferring the device type from the device list prevents `torch.random.fork_rng` errors on Intel GPUs.

Deep Dive

PyTorch’s latest merge (PR #180512), authored by frost-intel and approved by guangyey and albanD, fixes a bug in `torch.random.fork_rng` that kept the function from working with non-CUDA device types. The regression was introduced by PR #177728, which updated RNG handling but omitted the `device_type` argument when calling `fork_rng` in `torch.distributed.pipelining`’s `schedules.py`. Without that argument, the call always used the default device type, `'cuda'`, which broke reproducibility on Intel GPUs (XPUs) and potentially on other non-CUDA backends.

The fix changes the `fork_rng` signature so that the `device_type` argument defaults to `None`. The implementation then infers the device type from the `devices` iterable the caller passes. If inference fails (e.g., devices are passed as bare integers, which carry no backend information), it falls back to `'cuda'`. This keeps existing CUDA callers working unchanged while enabling broader hardware support. The change is small but matters for the growing ecosystem of alternative accelerators PyTorch supports, including Intel, AMD, and Apple Silicon GPUs.
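The inference-with-fallback behavior described above can be sketched in plain Python. This is an illustration only, not the code from PR #180512: the helper name `infer_device_type`, its `fallback` parameter, and the choice to fall back when the list names more than one backend are all assumptions made for the example.

```python
from typing import Iterable, Optional


def infer_device_type(
    devices: Optional[Iterable[object]],
    device_type: Optional[str] = None,
    fallback: str = "cuda",
) -> str:
    """Hypothetical helper: pick a backend name for a list of devices."""
    # An explicitly passed device_type always wins.
    if device_type is not None:
        return device_type
    if devices is None:
        return fallback

    inferred = set()
    for dev in devices:
        if isinstance(dev, str):
            # "xpu:0" -> "xpu", "mps" -> "mps"
            inferred.add(dev.split(":", 1)[0])
        elif isinstance(getattr(dev, "type", None), str):
            # Duck-typed stand-in for torch.device, which exposes `.type`.
            inferred.add(dev.type)
        # Bare integers carry no backend information and are skipped.

    # Assumption: use the inferred type only when it is unambiguous.
    if len(inferred) == 1:
        return inferred.pop()
    return fallback
```

Under these assumptions, `infer_device_type(["xpu:0", "xpu:1"])` resolves to the XPU backend, while `infer_device_type([0, 1])` cannot infer anything and uses the fallback. Callers who index devices by integer can still name the backend explicitly through the existing `device_type` parameter, for example `torch.random.fork_rng(devices=[0], device_type="xpu")`.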

Key Points
  • Bug introduced by PR #177728 broke `fork_rng` for non-CUDA devices (e.g., Intel XPUs).
  • Fix changes the `device_type` default to `None` and infers it from the `devices` argument; falls back to `'cuda'` if inference fails.
  • Approved by PyTorch core contributors (guangyey, albanD); the `'cuda'` fallback preserves backward compatibility.

Why It Matters

Lets `fork_rng` save and restore RNG state on non-CUDA backends without extra arguments, which reproducible ML workflows on non-NVIDIA hardware depend on.