PyTorch PR tests CUDA symmetric memory without ProcessGroupNCCL

PyTorch Releases May 28, 2026

⚡Torchcomms shim adds test coverage for rendezvous bypassing ProcessGroupNCCL...

Deep Dive

PyTorch merged PR #184523, which adds test coverage for CUDA symmetric memory (symm_mem) when backed by torchcomms through the new _BackendWrapper shim. The PR introduces a test class `TorchCommsCudaSymmMemTest` that validates two rendezvous strategies: one where metadata allgather flows through the torchcomms-backed ProcessGroup, and another where metadata exchange falls back to the default TCPStore. In both cases, the test allocates a symmetric memory buffer, rendezvouses on the ProcessGroup, and verifies each rank can read its peers' buffers.
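The two rendezvous strategies can be illustrated with a toy, single-process model. This is only a sketch of the idea, not PyTorch's implementation: `ToyStore`, `group_allgather`, `store_allgather`, `rendezvous`, and `demo` are hypothetical names invented for this example, and the byte strings stand in for whatever per-rank metadata the real rendezvous exchanges.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical type: takes one payload per rank, returns each rank's view of all payloads.
Allgather = Callable[[List[bytes]], List[List[bytes]]]


class ToyStore:
    """In-process stand-in for a key-value store such as TCPStore (illustrative only)."""

    def __init__(self) -> None:
        self._kv: Dict[str, bytes] = {}

    def set(self, key: str, value: bytes) -> None:
        self._kv[key] = value

    def get(self, key: str) -> bytes:
        return self._kv[key]


def group_allgather(per_rank: List[bytes]) -> List[List[bytes]]:
    """Toy collective: every rank receives every rank's payload."""
    return [list(per_rank) for _ in per_rank]


def store_allgather(store: ToyStore, per_rank: List[bytes]) -> List[List[bytes]]:
    """Fallback path: each rank publishes under its own key, then reads every key."""
    for rank, payload in enumerate(per_rank):
        store.set(f"meta/{rank}", payload)
    world = len(per_rank)
    return [[store.get(f"meta/{peer}") for peer in range(world)] for _ in range(world)]


def rendezvous(
    per_rank_meta: List[bytes],
    store: ToyStore,
    pg_allgather: Optional[Allgather] = None,
) -> Tuple[str, List[List[bytes]]]:
    """Use the group's allgather when one is supplied, otherwise fall back to the store."""
    if pg_allgather is not None:
        return "process_group", pg_allgather(per_rank_meta)
    return "store", store_allgather(store, per_rank_meta)


def demo(world_size: int = 4):
    """Run both strategies on the same metadata and return both results."""
    meta = [f"handle-for-rank-{r}".encode() for r in range(world_size)]
    via_pg = rendezvous(meta, ToyStore(), pg_allgather=group_allgather)
    via_store = rendezvous(meta, ToyStore())
    return via_pg, via_store
```

In this model both paths give every rank the same view of its peers' metadata; only the transport differs. That mirrors why a test would cover both variants and check that each rank can reach its peers' buffers either way.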

The test serves as both a regression guard and a worked example of using the _BackendWrapper shim with CUDA symmetric memory. Because the shim path lets symm_mem run without ProcessGroupNCCL, it allows more flexible distributed training setups, especially in heterogeneous environments or with custom communication backends. The PR was approved by ngimel and fduwjj, key maintainers of PyTorch's distributed module.

Key Points

PR #184523 adds tests for CUDASymmetricMemory rendezvous with a torchcomms-backed ProcessGroup
Two variants: metadata allgather via PG or fallback to TCPStore
Validates the _BackendWrapper shim path, enabling symm_mem without ProcessGroupNCCL

Why It Matters

Symmetric memory becomes usable with custom communication backends rather than only ProcessGroupNCCL, which allows more flexible distributed training setups.
