trunk/67b8be4142e25149abbc1dabadbb766791a3c647: [inductor] Pass correct device to print_performance (#181957)
benchmark_compiled_module always reported 'cuda' – now it uses the actual device
A recent PyTorch commit (67b8be4) fixes a bug in the Inductor compiler's benchmarking utility. The bug, reported in GitHub issue #181954, was that `print_performance` always received `'cuda'` as its device parameter when called from `benchmark_compiled_module`. Performance metrics were therefore attributed to CUDA even when the computation actually ran on CPU, so benchmarks on non-CUDA devices produced misleading output.
The fix is small. Contributor guangyey submitted a pull request (#181957) that passes the correct device, derived from the module being benchmarked, to `print_performance`, so that performance data reflects the hardware actually used for the computation. The change was approved by PyTorch core maintainer jansel. It makes benchmark reports more reliable for developers working with mixed-device workflows or CPU-only environments, and it is part of ongoing maintenance of Inductor, which optimizes PyTorch models for faster execution.
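The bug pattern described above can be sketched in a few lines. This is only an illustration: `print_performance_sketch`, `benchmark_before`, and `benchmark_after` are hypothetical stand-ins, not the real `torch._inductor` functions, and the idea that the wrong label came from an unforwarded default argument is an assumption rather than something the commit summary states.

```python
# Hypothetical stand-ins for illustration; not the actual PyTorch code.

def print_performance_sketch(fn, times=10, repeat=10, device="cuda"):
    """Run fn times*repeat times and return the device label the report would use."""
    for _ in range(times * repeat):
        fn()
    return device


def benchmark_before(fn, device):
    # Bug pattern (assumed): the caller knows the device but never forwards it,
    # so the callee falls back to its "cuda" default.
    return print_performance_sketch(fn, times=1, repeat=1)


def benchmark_after(fn, device):
    # Fix pattern: forward the device the module actually runs on.
    return print_performance_sketch(fn, times=1, repeat=1, device=device)


if __name__ == "__main__":
    workload = lambda: sum(range(100))  # trivial CPU-only workload
    print("before:", benchmark_before(workload, "cpu"))
    print("after: ", benchmark_after(workload, "cpu"))
```

In this sketch the "before" call reports `cuda` for a CPU workload, while the "after" call reports the device it was given.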
- Bug: `print_performance` always displayed 'cuda' regardless of actual device
- Fix: `benchmark_compiled_module` now passes the correct device to `print_performance`
- Approved by PyTorch maintainer jansel, merged on May 2, 2025
Why It Matters
Benchmark output labeled with the wrong device can mislead anyone comparing CPU and GPU runs. Accurate device-specific performance data is critical for optimizing PyTorch models on either.