Developer Tools

trunk/18464b706adf4cd281bd4eddbd8cbcb3682d9ca6: [MPS] Support complex inputs to `cumprod` (#178436)

The PyTorch commit enables complex tensor cumprod on MPS, with complex64 running up to 3x faster than float32 in benchmarks.

Deep Dive

PyTorch has merged a change that adds native support for complex inputs to `torch.cumprod` on the MPS (Metal Performance Shaders) backend used on Apple Silicon Macs. The commit (18464b706adf), merged as pull request #178436, extends GPU-accelerated cumulative products to complex64 tensors, which are common in fields such as quantum mechanics, signal processing, and electrical engineering.
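As a minimal sketch of what the change allows, the snippet below runs `torch.cumprod` on a small complex64 tensor. It assumes a PyTorch build that already includes this commit; on older builds the call is expected to fail on MPS. The input values are made up for illustration, and the device selection falls back to CPU so the example also runs on machines without MPS.

```python
import torch

# Pick MPS when present; otherwise fall back to CPU so the sketch still runs.
device = "mps" if torch.backends.mps.is_available() else "cpu"

# Illustrative values only, not taken from the commit.
x = torch.tensor([1 + 1j, 2 + 0j, 0 + 1j], dtype=torch.complex64, device=device)

# Running product along dim 0: [x0, x0*x1, x0*x1*x2]
y = torch.cumprod(x, dim=0)
print(y.cpu())
```

Each output element is the product of all inputs up to and including that position, so the last element equals the product of the whole tensor.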

The performance results are notable. The included benchmark script tests various tensor sizes and dimensions, comparing float32 and complex64 runtimes. In many cases the new complex64 path is faster, sometimes by a factor of 2-3x. For example, a tensor of shape (10,) took 4.57 microseconds as complex64 versus 9.35 microseconds as float32, roughly a 2x difference. This reverses the usual expectation that complex operations are slower, since each complex multiplication requires several real multiplications and additions. The result suggests a well-optimized kernel, although at such a small tensor size the timing likely reflects per-call overhead more than arithmetic throughput.
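A comparison of this kind could be sketched roughly as follows. This is not the benchmark script from the commit: the helper names `sync` and `time_cumprod`, the warm-up count, and the iteration count are all assumptions chosen for illustration, and the numbers it prints will vary by machine and PyTorch build.

```python
import time
import torch

device = "mps" if torch.backends.mps.is_available() else "cpu"

def sync():
    # MPS work is queued asynchronously; wait for it before reading the clock.
    if device == "mps":
        torch.mps.synchronize()

def time_cumprod(dtype, shape=(10,), dim=0, iters=1000):
    """Return the mean time per cumprod call, in microseconds."""
    x = torch.rand(shape, device=device).to(dtype)
    for _ in range(10):  # warm-up calls, excluded from the timing
        torch.cumprod(x, dim=dim)
    sync()
    start = time.perf_counter()
    for _ in range(iters):
        torch.cumprod(x, dim=dim)
    sync()
    return (time.perf_counter() - start) / iters * 1e6

for dtype in (torch.float32, torch.complex64):
    print(dtype, f"{time_cumprod(dtype):.2f} us")
```

Synchronizing before and after the timed loop matters on MPS, because otherwise the clock may stop before the queued GPU work has finished.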

This update removes a friction point for scientists and engineers using PyTorch on Macs. Previously, users working with complex-valued data had to separate real and imaginary components or fall back to the CPU for `cumprod`, which created bottlenecks. Now a complex `cumprod` can stay on the MPS device alongside the rest of the computation, avoiding those workarounds.
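To make the older workaround concrete, here is one hypothetical way a complex cumulative product could be computed from separate real and imaginary tensors, using polar form: magnitudes multiply and angles add. The function name `cumprod_via_polar` is invented for this sketch, and it is not claimed to be what any particular user or the commit did. The example runs on CPU and checks the result against the native complex `cumprod`.

```python
import torch

def cumprod_via_polar(re, im, dim=0):
    """Cumulative product of re + i*im using only real-valued ops."""
    mag = torch.sqrt(re * re + im * im)    # |z|
    ang = torch.atan2(im, re)              # arg(z)
    mag_cum = torch.cumprod(mag, dim=dim)  # magnitudes multiply
    ang_cum = torch.cumsum(ang, dim=dim)   # angles add
    return mag_cum * torch.cos(ang_cum), mag_cum * torch.sin(ang_cum)

# Illustrative values only.
re = torch.tensor([1.0, 2.0, 0.0])
im = torch.tensor([1.0, 0.0, 1.0])
out_re, out_im = cumprod_via_polar(re, im)

# Native complex cumprod on CPU, used here as the reference.
reference = torch.cumprod(torch.complex(re, im), dim=0)
print(torch.allclose(out_re, reference.real, atol=1e-5),
      torch.allclose(out_im, reference.imag, atol=1e-5))
```

A workaround like this needs several extra kernels and trigonometric calls per element, which is the kind of overhead that native complex support avoids.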

Key Points
  • PyTorch commit 18464b7 adds complex number support for `cumprod` on the MPS backend.
  • Benchmarks show complex64 operations can be 2-3x faster than float32 (e.g., 4.57µs vs 9.35µs).
  • Enables GPU-accelerated complex `cumprod` for quantum mechanics and signal processing workloads on Apple Silicon.

Why It Matters

Removes a bottleneck for researchers using complex-valued data on Macs: `cumprod` on complex tensors no longer requires a CPU fallback or a manual real/imaginary split, so quantum and signal processing simulations can keep that step on the Apple Silicon GPU.