trunk/4c6817d529a0339fde61a62586462a5479d7b2a1
A fix prevents spurious test failures when a multi-GPU matrix multiplication test runs on systems with fewer than four GPUs.
Deep Dive
The PyTorch team has patched a bug in the main development branch (trunk/4c6817d). The commit adds a conditional skip decorator (`skip_if_lt_x_gpu(4)`) to the `test_mm_with_strided_...` function in `test_matrix_ops.py`. The test exercises a tensor operation that requires at least four GPUs; on machines with fewer, it previously ran anyway and failed with a memory allocation error. With the decorator in place, the test is skipped on under-provisioned systems instead of reporting a failure.
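The mechanism is a standard decorator-factory pattern: the skip condition is evaluated against the visible GPU count, and the test is marked skipped rather than failed. Below is a minimal, self-contained sketch of how such a decorator could work, assuming a `unittest`-based suite; the `gpu_count` parameter is a hypothetical stand-in for a real hardware probe like `torch.cuda.device_count()`, made injectable so the sketch runs without any GPU libraries.

```python
import unittest

def skip_if_lt_x_gpu(x, gpu_count=0):
    """Decorator factory: skip a test unless at least `x` GPUs are visible.

    `gpu_count` is a stand-in for a real probe such as
    torch.cuda.device_count(); it is a parameter here only so this
    sketch runs on any machine.
    """
    def decorator(fn):
        return unittest.skipUnless(
            gpu_count >= x,
            f"requires at least {x} GPUs, found {gpu_count}",
        )(fn)
    return decorator

class MatrixOpsTest(unittest.TestCase):
    # Simulate running on a 1-GPU machine: the test body never executes.
    @skip_if_lt_x_gpu(4, gpu_count=1)
    def test_mm_needs_four_gpus(self):
        self.fail("this would only run on a machine with 4+ GPUs")

if __name__ == "__main__":
    unittest.main()
```

On a 1-GPU machine the runner reports the test as skipped, not failed, which is exactly the behavior the commit introduces: missing hardware no longer shows up as a red test result.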
Why It Matters
Keeps PyTorch's distributed test suite reliable across hardware configurations, so a failing multi-GPU test signals a real regression rather than a machine with too few GPUs — a meaningful quality-of-life improvement for AI researchers and engineers working on multi-GPU code.