Developer Tools

trunk/1b868a96e4bbe302d2078d1e0e9e7355a55330ec: [bug][ROCm][inductor] accept 1D bias in addmm ATen heuristic (#179087)

A critical bug in PyTorch's Inductor compiler was blocking AMD GPU users from optimal performance.

Deep Dive

The PyTorch development team has resolved a significant bug in its Inductor compiler, a core component for accelerating PyTorch models. The issue, tracked as PR #179087, specifically affected users running PyTorch on AMD GPUs via the ROCm software stack. The bug was inadvertently introduced in an earlier optimization pull request (#177130) and prevented Inductor's ATen operation heuristic from correctly accepting a 1-dimensional bias tensor in `addmm` (matrix multiplication followed by a bias addition) operations. This blocked a key optimization path.
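To make the shape issue concrete, here is a minimal pure-Python sketch of `addmm` semantics with a 1-dimensional bias. The function name and inputs are hypothetical illustrations, not PyTorch code: `torch.addmm(bias, mat1, mat2)` computes `bias + mat1 @ mat2`, and a 1D bias of length N is broadcast across all M rows of the (M, N) product, which is the tensor shape the heuristic needed to accept.

```python
def addmm_1d_bias(bias, mat1, mat2):
    """Sketch of addmm with a 1D bias: bias + mat1 @ mat2."""
    m, k = len(mat1), len(mat1[0])
    n = len(mat2[0])
    assert len(bias) == n, "1D bias must match the output's column count"
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = sum(mat1[i][p] * mat2[p][j] for p in range(k))
            out[i][j] = acc + bias[j]  # bias[j] is broadcast over every row i
    return out

# mat2 is the identity here, so the product equals mat1 and only the
# broadcast bias changes each row.
result = addmm_1d_bias([10.0, 20.0],
                       [[1.0, 2.0], [3.0, 4.0]],
                       [[1.0, 0.0], [0.0, 1.0]])
# result == [[11.0, 22.0], [13.0, 24.0]]
```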

This fix is critical because the `bias_addmm` heuristic is a performance optimization that fuses a bias addition into the preceding matrix multiplication. When the heuristic fails, models fall back to slower execution paths, directly impacting training and inference speed. The patch, contributed by AMD engineer naromero77amd, ensures ROCm's native 1D bias format is accepted, restoring the optimization and improving the associated unit test. The fix was approved and merged into PyTorch's main development trunk, reflecting the ongoing collaboration to improve PyTorch performance on alternative hardware platforms such as AMD GPUs.

Key Points
  • Fixes a regression bug (PR #179087) in PyTorch's Inductor compiler for AMD ROCm users.
  • Restores the `bias_addmm` optimization heuristic for 1D bias tensors in matrix multiplications.
  • Addresses an issue introduced in a prior PR (#177130), ensuring optimal performance on AMD GPUs.

Why It Matters

This fix directly impacts the speed and efficiency of AI model training and inference for developers using PyTorch on AMD hardware.