trunk/2ed93d5582b2e8d1d8a401de37e38f2b3e17649f: Add operator microbenchmark comparison workflow for PRs (#179476)
New workflow benchmarks single ops on H100, A100, B200, MI300X GPUs.
PyTorch has merged a new GitHub Actions workflow (operator_microbenchmark_compare.yml) that automates per-operator performance benchmarking for pull requests. The workflow benchmarks a single operator (e.g., matmul) on the PR branch, runs it on a specified GPU (H100, A100, or B200 for CUDA, or MI300X for ROCm), and compares the results against the latest daily CI benchmark on main. A formatted comparison table is automatically posted as a PR comment, giving contributors immediate visibility into performance changes.
To enable single-operator selection, the test.sh script now parses the test config name (e.g., operator_microbenchmark_matmul_test) to extract the operator, and the test_operator_microbenchmark function respects the OP_BENCHMARK_TESTS environment variable that carries it. The changes are fully backward-compatible: the existing operator_microbenchmark.yml workflow uses the config name "operator_microbenchmark_test", which by default tests all operators. The new workflow reuses the existing _linux-build.yml and _linux-test.yml reusable workflows, following the pattern established by inductor-perf-compare.yml. The PR was authored by jainapurva and co-authored by Huy Do.
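The config-name parsing could be sketched as follows. This is a hypothetical illustration, not the actual test.sh code: the function name parse_op_from_config is invented here, and the real script may handle edge cases differently. The key behavior is that "operator_microbenchmark_matmul_test" yields "matmul", while the legacy name "operator_microbenchmark_test" yields nothing, so all operators run by default.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of extracting an operator name from a test config
# string shaped like "operator_microbenchmark_<op>_test".
parse_op_from_config() {
  local config="$1"
  # Match the operator between the fixed prefix and the "_test" suffix.
  # The legacy config "operator_microbenchmark_test" has no operator
  # segment, so the regex fails and nothing is printed (run all ops).
  if [[ "$config" =~ ^operator_microbenchmark_(.+)_test$ ]]; then
    echo "${BASH_REMATCH[1]}"
  fi
}

op="$(parse_op_from_config "operator_microbenchmark_matmul_test")"
if [[ -n "$op" ]]; then
  # Illustrative: a single-operator run is requested via this env var.
  export OP_BENCHMARK_TESTS="$op"
fi
echo "OP_BENCHMARK_TESTS=${OP_BENCHMARK_TESTS:-<all operators>}"
```

Driving selection through the config name keeps the reusable _linux-test.yml workflow unchanged; only the matrix entry's name differs between the all-operators and single-operator runs.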
- Adds GitHub Actions workflow operator_microbenchmark_compare.yml for single-operator perf comparison.
- Supports NVIDIA H100, A100, B200 (CUDA) and AMD MI300X (ROCm) GPUs.
- Automatically posts formatted benchmark diff table as a PR comment.
Why It Matters
Enables PR authors to quickly spot performance regressions on specific GPU operators before merge.