PyTorch's Inductor now supports benchmarking on non-CUDA devices
A new PR extends InductorBenchmarker to CPUs and other non-CUDA backends, fixing three reported issues.
Deep Dive
PR #184567 from guangyey adds InductorBenchmarker support for non-CUDA devices, fixing issues #184470, #184491, and #185616.
Key Points
- Extends InductorBenchmarker to CPU, AMD GPU (ROCm), and other non-CUDA backends.
- Fixes three reported issues (#184470, #184491, #185616) that broke benchmarking on non-CUDA devices.
- Approved by PyTorch core maintainer Jason Ansel; the PR is part of a ghstack series of dependent PRs.
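
To illustrate why benchmarking needs device-specific handling, here is a minimal sketch of a device-agnostic timing loop. It is not InductorBenchmarker's API or the PR's implementation: the function name `benchmark_ms` and its parameters are hypothetical, chosen only for illustration. The idea is that the timed region must end with a synchronization call suited to the device, because accelerator work is typically launched asynchronously, while CPU work needs no such call.

```python
import statistics
import time
from typing import Callable, Optional


def benchmark_ms(
    fn: Callable[[], object],
    synchronize: Optional[Callable[[], None]] = None,  # hypothetical hook
    warmup: int = 3,
    iters: int = 10,
) -> float:
    """Return the median wall-clock time of fn() in milliseconds."""
    # CPU work completes synchronously, so the default is a no-op.
    sync = synchronize or (lambda: None)

    # Warm-up runs absorb one-time costs such as compilation or caching.
    for _ in range(warmup):
        fn()
    sync()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        sync()  # wait for any asynchronous device work before stopping the clock
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


# CPU-only usage: no synchronization callable is needed.
elapsed = benchmark_ms(lambda: sum(range(10_000)))
print(f"median: {elapsed:.3f} ms")
```

On an accelerator, the caller would pass that backend's synchronization function (for example `torch.cuda.synchronize` on CUDA devices) as `synchronize`; a timer that hard-codes one backend's call is the kind of assumption that stops working on other devices.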
Why It Matters
The change enables more consistent performance benchmarking across PyTorch-supported hardware beyond CUDA, which simplifies optimization for diverse deployments.