viable/strict/1778313401: Add operator microbenchmark comparison workflow for PRs (#179476)
New GitHub Action benchmarks a single operator and compares the results against the latest main-branch run.
PyTorch has added a new GitHub Action called operator_microbenchmark_compare.yml to streamline performance testing for pull requests. This workflow allows developers to benchmark a single operator (e.g., matmul) on a specified GPU from their PR branch. The results are automatically compared against the latest daily CI run on the main branch, and a formatted comparison table is posted directly as a PR comment. This eliminates manual benchmarking and speeds up the detection of performance regressions introduced by code changes.
The action supports multiple GPU types: H100, A100, and B200 for CUDA, and MI300X for ROCm. To enable single-operator selection, the test.sh script now parses the test config name (e.g., operator_microbenchmark_matmul_test) to extract the operator name, and the test_operator_microbenchmark function respects the OP_BENCHMARK_TESTS environment variable. The implementation is fully backward compatible: the existing operator_microbenchmark.yml workflow uses the default config "operator_microbenchmark_test", which falls through to the full operator list. The change follows the pattern established by inductor-perf-compare.yml.
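The config-name parsing described above can be sketched in shell roughly as follows. This is a minimal illustration, not the actual test.sh code: the function name extract_op and the exact string handling are assumptions; only the config-name shape (prefix operator_microbenchmark, optional operator, suffix _test) comes from the text.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of extracting an operator name from a test config
# string such as "operator_microbenchmark_matmul_test". The real test.sh
# logic may differ; extract_op is an illustrative name.
extract_op() {
  local config="$1"
  local base="${config%_test}"                # drop trailing "_test"
  local op="${base#operator_microbenchmark}"  # drop the workflow prefix
  printf '%s\n' "${op#_}"                     # drop leading "_"; empty for the default config
}

op="$(extract_op operator_microbenchmark_matmul_test)"
if [ -n "$op" ]; then
  export OP_BENCHMARK_TESTS="$op"   # restrict the run to one operator
fi
echo "${OP_BENCHMARK_TESTS:-<full operator list>}"
```

Note that the default config "operator_microbenchmark_test" yields an empty operator name, so OP_BENCHMARK_TESTS stays unset and the full list runs, matching the backward-compatibility behavior described above.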
- New GitHub Action operator_microbenchmark_compare.yml benchmarks a single operator on a chosen GPU from a PR branch.
- Supports H100, A100, B200 (CUDA) and MI300X (ROCm) GPUs for flexible testing.
- Enables operator selection via OP_BENCHMARK_TESTS env var; fully backward compatible with existing workflows.
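The fall-through behavior the bullets describe can be sketched as below. This is an illustration under stated assumptions, not the real test_operator_microbenchmark implementation: the operator list in DEFAULT_OPS and the helper name ops_to_run are hypothetical; only the rule "use OP_BENCHMARK_TESTS when set, otherwise run the full list" comes from the text.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: select operators from OP_BENCHMARK_TESTS, falling
# back to the full list when it is unset or empty.
DEFAULT_OPS="matmul mm addmm bmm"   # illustrative list, not the actual one

ops_to_run() {
  # $1: value of OP_BENCHMARK_TESTS (may be empty)
  printf '%s\n' "${1:-$DEFAULT_OPS}"
}

ops_to_run "matmul"   # single-operator PR run
ops_to_run ""         # default config: full operator list
```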
Why It Matters
Automates performance regression detection for individual operators, making PyTorch PR reviews more efficient.