trunk/36ac9e292f35c4c9f03b5bcc2b1f44f2253e235e: [MPS] Fix mm with stride-0 inputs on macOS < 26.4 (#180236)
A subtle bug zeroed out 15 of every 16 rows in matrix multiplication results on Apple Silicon Macs.
The PyTorch development team has fixed a significant bug (pull request #180236) in the Metal Performance Shaders (MPS) backend, which is crucial for running AI models efficiently on Apple Silicon Macs. The bug was triggered by matrix multiplication (`mm`) on tensors created with the `tensor.expand()` method, which returns a view whose expanded dimensions have a stride of zero. On affected macOS versions (15.0 up to, but not including, 26.4), MPSGraph's `matrixMultiplication` function computed only 1 out of every 16 rows correctly and filled the remaining 15 rows with zeros, a catastrophic error for any machine learning computation.
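To make the trigger concrete, here is a minimal sketch of how `expand()` produces a stride-0 input to `mm`. It is not the reproducer from the pull request: the shapes (16x4 and 4x4) and variable names are illustrative assumptions, and whether these particular shapes hit the bug on an affected system is not confirmed by the source. The script falls back to the CPU when MPS is unavailable, so it runs anywhere.

```python
import torch

# Use MPS when present; otherwise fall back to CPU so the script still runs.
device = "mps" if torch.backends.mps.is_available() else "cpu"

# One row of data, shape (1, 4).
row = torch.arange(1.0, 5.0, device=device).reshape(1, 4)

# expand() returns a view: 16 logical rows that all alias the same memory.
# No data is copied, so the stride along dim 0 is 0.
a = row.expand(16, 4)
b = torch.eye(4, device=device)  # identity, so mm(a, b) should equal a

print(a.stride())         # (0, 1)
print(a.is_contiguous())  # False

out = torch.mm(a, b)

# Count rows that are not entirely zero (16 when the result is correct).
nonzero_rows = int((out.abs().sum(dim=1) > 0).sum())
print(nonzero_rows)
```

Multiplying by the identity makes a wrong result easy to spot: every output row should equal the original row, so any all-zero row indicates a miscomputed product.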
The fix, contributed with assistance from Anthropic's Claude AI, is a targeted workaround. It disables the optimized MPS strided API path specifically for `mm` operations in which an input tensor has a stride of zero. Such tensors are instead materialized into a standard contiguous layout via a gather/clone operation before the multiplication proceeds. This guarantees correct results, possibly at a minor performance cost from the extra copy. The check is skipped on macOS 26.4 and later, where Apple has apparently resolved the underlying issue in its Metal framework. The patch prevents silent, hard-to-detect numerical errors in PyTorch-based training and inference on Macs.
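The idea behind the workaround can be sketched at the Python level. The actual patch lives inside the MPS backend, not in user code, so the helpers below (`has_zero_stride`, `safe_mm`) are hypothetical names invented for illustration. Ignoring size-1 dimensions in the stride check is also an assumption, not something the source states.

```python
import torch


def has_zero_stride(t: torch.Tensor) -> bool:
    """Return True if any dimension with more than one element has stride 0."""
    return any(size > 1 and stride == 0
               for size, stride in zip(t.shape, t.stride()))


def safe_mm(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: copy stride-0 inputs into contiguous memory, then multiply."""
    if has_zero_stride(a):
        a = a.contiguous()  # materializes the expanded view as a real copy
    if has_zero_stride(b):
        b = b.contiguous()
    return torch.mm(a, b)


a = torch.ones(1, 3).expand(16, 3)     # stride (0, 1)
b = torch.arange(6.0).reshape(3, 2)    # [[0, 1], [2, 3], [4, 5]]

print(has_zero_stride(a))              # True
print(a.contiguous().stride())         # (3, 1)

out = safe_mm(a, b)
print(out[0])                          # tensor([6., 9.])
```

The copy costs extra memory and time proportional to the expanded tensor's size, which is the trade-off the fix accepts in exchange for correct results.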
- Bug caused matrix multiplication (`mm`) to zero out 15 of every 16 result rows on macOS 15.0 up to, but not including, 26.4.
- Triggered by `tensor.expand()` operations creating stride-0 inputs to the MPS backend on Apple Silicon.
- Fix implements a workaround, forcing contiguity for problematic inputs, and is bypassed on macOS >=26.4.
Why It Matters
This prevents silent, catastrophic numerical errors in AI model training and inference for developers using PyTorch on Macs.