trunk/2ed93d5582b2e8d1d8a401de37e38f2b3e17649f: Add operator microbenchmark comparison workflow for PRs (#179476)
New workflow benchmarks single ops on H100, A100, B200, MI300X GPUs.
PyTorch has merged a new GitHub Actions workflow (operator_microbenchmark_compare.yml) that automates per-operator performance benchmarking for pull requests. The workflow benchmarks a single operator (e.g., matmul) on the PR branch, runs it on a specified GPU (H100, A100, or B200 for CUDA, or MI300X for ROCm), and compares the results against the latest daily CI benchmark on main. A formatted comparison table is automatically posted as a PR comment, giving contributors immediate visibility into performance changes.
To enable single-operator selection, the test.sh script now parses the test config name (e.g., operator_microbenchmark_matmul_test) to extract the operator, and the test_operator_microbenchmark function respects the OP_BENCHMARK_TESTS environment variable that carries it. The changes are fully backward-compatible: the existing operator_microbenchmark.yml workflow uses the config name "operator_microbenchmark_test", which by default tests all operators. The new workflow reuses the existing _linux-build.yml and _linux-test.yml reusable workflows, following the pattern established by inductor-perf-compare.yml. The PR was authored by jainapurva and co-authored by Huy Do.
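The config-name parsing could be sketched as follows. This is a hypothetical illustration, not the actual test.sh code: the function name parse_op_from_config is invented here, and the real script may handle edge cases differently. The key behavior is that "operator_microbenchmark_matmul_test" yields "matmul", while the legacy name "operator_microbenchmark_test" yields nothing, so all operators run by default.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of extracting an operator name from a test config
# string shaped like "operator_microbenchmark_<op>_test".
parse_op_from_config() {
  local config="$1"
  # Match the operator between the fixed prefix and the "_test" suffix.
  # The legacy config "operator_microbenchmark_test" has no operator
  # segment, so the regex fails and nothing is printed (run all ops).
  if [[ "$config" =~ ^operator_microbenchmark_(.+)_test$ ]]; then
    echo "${BASH_REMATCH[1]}"
  fi
}

op="$(parse_op_from_config "operator_microbenchmark_matmul_test")"
if [[ -n "$op" ]]; then
  # Illustrative: a single-operator run is requested via this env var.
  export OP_BENCHMARK_TESTS="$op"
fi
echo "OP_BENCHMARK_TESTS=${OP_BENCHMARK_TESTS:-<all operators>}"
```

Driving selection through the config name keeps the reusable _linux-test.yml workflow unchanged; only the matrix entry's name differs between the all-operators and single-operator runs.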
- Adds GitHub Actions workflow operator_microbenchmark_compare.yml for single-operator perf comparison.
- Supports NVIDIA H100, A100, B200 (CUDA) and AMD MI300X (ROCm) GPUs.
- Automatically posts formatted benchmark diff table as a PR comment.
Why It Matters
Enables PR authors to quickly spot performance regressions on specific GPU operators before merge.