Developer Tools

trunk/537c0fd2ef63692db6d2bc9e36c34b776e250969: [MPS] Add error checking for bmm (#176771)

A single-line code change stops `torch.bmm` from crashing the Python process on Apple Silicon GPUs when given mismatched data types.

Deep Dive

The PyTorch development team has resolved a significant stability issue for users running AI workloads on Apple Silicon Macs. A recent commit (537c0fd2ef) to the main PyTorch codebase adds crucial error checking to the `bmm` (batch matrix multiplication) function on the MPS (Metal Performance Shaders) backend. Before this fix, invoking `torch.bmm` with mismatched data types, such as a Float tensor and a Long (integer) tensor, caused the MPSGraph framework to abort internally, hard-crashing the entire Python process. This diverged from the CPU and CUDA backends, which throw a descriptive error instead.
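A minimal repro sketch of the scenario described above. It assumes a machine with PyTorch and an available MPS device; the helper name `try_mismatched_bmm` is our own, and the code is guarded so it simply skips where torch or MPS is unavailable. Note that on an *unpatched* build with MPS, this call pattern was the one that aborted the process outright:

```python
# Hedged repro sketch: with the patch, a Float x Long bmm on MPS raises a
# RuntimeError instead of aborting the process. Guarded so the snippet is a
# no-op where torch or an MPS device is not available.
try:
    import torch
except ImportError:
    torch = None

def try_mismatched_bmm():
    """Return the error message from a Float x Long bmm on MPS, or None if skipped."""
    if torch is None or not torch.backends.mps.is_available():
        return None  # no torch / no MPS device: nothing to demonstrate
    a = torch.rand(2, 3, 4, device="mps")               # Float tensor
    b = torch.randint(0, 10, (2, 4, 5), device="mps")   # Long tensor
    try:
        torch.bmm(a, b)
    except RuntimeError as e:
        return str(e)  # patched builds land here with a descriptive message

print(try_mismatched_bmm())
```

On CPU, the equivalent call has always raised a `RuntimeError` rather than crashing, which is the behavior the patch brings to MPS.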

The patch standardizes error handling across PyTorch's compute backends. Now, when a developer accidentally passes incompatible tensors to `bmm` on an MPS device, they receive a clear `RuntimeError: Expected arguments of same type but got Float and Long`, mirroring the message from other platforms such as CPU (`expected scalar type Float but found Long`). The change, credited to maintainer 'malfet', was approved by multiple core developers and is a dependency for a larger stack of MPS improvements. This single-line fix turns an opaque, hard-to-debug crash into an actionable error, making PyTorch on Apple Silicon more robust for production and research code, where tensor type mismatches are a common debugging scenario.
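The shape of the guard is easy to illustrate without PyTorch itself. Below is a minimal pure-Python sketch of the kind of dtype check the patch adds; `FakeTensor` and `check_same_dtype` are hypothetical stand-ins for illustration only, while the real fix is a single check inside the MPS `bmm` kernel:

```python
# Pure-Python sketch of a same-dtype guard, modeled on the error message the
# patched MPS backend raises. FakeTensor and check_same_dtype are hypothetical
# names, not PyTorch APIs.

class FakeTensor:
    """Stand-in carrying only a dtype name, for illustration."""
    def __init__(self, dtype):
        self.dtype = dtype

def check_same_dtype(a, b):
    # Raise early with a descriptive message instead of letting a lower-level
    # framework abort the process on mismatched inputs.
    if a.dtype != b.dtype:
        raise RuntimeError(
            f"Expected arguments of same type but got {a.dtype} and {b.dtype}"
        )

x = FakeTensor("Float")
y = FakeTensor("Long")
try:
    check_same_dtype(x, y)
except RuntimeError as e:
    print(e)  # Expected arguments of same type but got Float and Long
```

The design point is that validation happens at the framework boundary, where a Python exception can be raised and caught, rather than deep inside MPSGraph, where a failed assertion takes down the whole process.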

Key Points
  • Fixes a hard crash on Apple Silicon (MPS) when `torch.bmm` receives a float and an integer tensor.
  • Aligns MPS backend error behavior with CPU/CUDA, throwing a `RuntimeError` instead of an internal abort.
  • The commit (537c0fd) is part of a dependency stack for broader MPS stability improvements.

Why It Matters

Prevents opaque hard crashes in Mac-based AI development, saving hours of debugging and improving framework reliability.