Developer Tools

PyTorch adds weekly MI355 GPU benchmark jobs for ROCm inductor

PyTorch's ROCm inductor dashboard gains a weekly max autotune benchmark job on AMD MI355 GPU runners, scheduled every Sunday with a 24-hour timeout.

Deep Dive

PyTorch has merged pull request #183538, which introduces new ROCm (AMD GPU) benchmark jobs for the inductor dashboard. The centerpiece is a weekly max autotune job that runs on AMD MI355 GPU runners every Sunday with a timeout of 1440 minutes (24 hours). The job adds maxautotune and freeze_autotune_cudagraphs to the existing benchmark suite, so the dashboard also covers those configurations on MI355 hardware.
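
As a quick check on the schedule arithmetic, the sketch below converts the stated timeout to hours and finds the next Sunday for a given date. It is only an illustration: the helper names are hypothetical, and the time of day at which the real job starts is not given here, so the sketch works with dates only.

```python
from datetime import date, timedelta

# Timeout of the weekly max autotune job, as stated in the article.
WEEKLY_TIMEOUT_MINUTES = 1440


def timeout_hours(minutes: int) -> float:
    """Convert a job timeout from minutes to hours."""
    return minutes / 60


def next_sunday(today: date) -> date:
    """Return the first Sunday on or after `today` (hypothetical helper)."""
    # date.weekday() numbers Monday as 0 and Sunday as 6.
    days_ahead = (6 - today.weekday()) % 7
    return today + timedelta(days=days_ahead)


if __name__ == "__main__":
    print(timeout_hours(WEEKLY_TIMEOUT_MINUTES))
    print(next_sunday(date.today()))
```

A 1440-minute limit is exactly one day, so a run that starts on Sunday has to finish within 24 hours of its start to avoid the timeout.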

In addition, the PR adds conditional test jobs for both MI355 and MI300 GPUs. These test jobs have a 720-minute (12-hour) timeout and run only on ciflow label pushes or workflow_dispatch events; they replace the single test jobs that previously ran unconditionally. The existing daily test-periodically schedule for MI300 remains unchanged. The aim is to keep tracking the performance of PyTorch's ROCm backend as AMD hardware evolves.
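
The gating described above can be modeled as a small predicate. This is a minimal sketch in Python, not the workflow's actual condition: the function name is hypothetical, and it assumes a ciflow label push arrives as a push event whose ref starts with `refs/tags/ciflow/`, which the article does not confirm.

```python
# Timeout of the conditional test jobs, as stated in the article.
TEST_JOB_TIMEOUT_MINUTES = 720

# Assumption: ciflow label pushes show up as tag refs with this prefix.
CIFLOW_REF_PREFIX = "refs/tags/ciflow/"


def should_run_test_job(event_name: str, ref: str = "") -> bool:
    """Return True when a conditional test job would fire (hypothetical model)."""
    # A manual workflow_dispatch always runs the job.
    if event_name == "workflow_dispatch":
        return True
    # Otherwise only a push of a ciflow-style ref qualifies.
    return event_name == "push" and ref.startswith(CIFLOW_REF_PREFIX)


if __name__ == "__main__":
    print(should_run_test_job("workflow_dispatch"))
    print(should_run_test_job("push", "refs/heads/main"))
```

Under this model an ordinary branch push does not start the 12-hour test jobs, while a manual dispatch or a ciflow-style push does.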

Key Points
  • Weekly max autotune benchmark job on AMD MI355 GPU runners every Sunday, with a 24-hour timeout
  • Conditional test jobs for MI355 and MI300 GPUs with a 12-hour timeout, triggered by a ciflow label push or workflow_dispatch
  • Adds maxautotune and freeze_autotune_cudagraphs to the existing benchmark suite for ROCm

Why It Matters

Weekly automated max autotune benchmarking gives PyTorch regular performance data on AMD MI355 GPUs, which should make regressions in the ROCm inductor backend easier to spot.