Developer Tools

PyTorch TorchInductor optimizes persistent reductions with smarter eviction policy

A GPU kernel code-generation change extends TorchInductor's delayed eviction-policy decision to persistent reductions, so each load gets a cache hint that matches how it is used.

Deep Dive

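TorchInductor already delayed the eviction-policy decision for loads in looped reductions, choosing each load's cache hint only once it can tell whether that load is the last use of its data. This change applies the same delayed decision to persistent reductions. Coalesced last-use loads get evict_first, while reused, broadcasted, and non-coalesced loads keep evict_last. The change revives the codegen portion of the stale PR #119622 and fixes issue #119523.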
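To make the rule concrete, here is a minimal plain-Python sketch of a delayed eviction-policy decision. It is not Inductor's implementation: the `Load` record, its `coalesced` and `broadcast` flags, and `assign_eviction_policies` are hypothetical names. What it models is the delay itself. All loads are collected first, and only then is a policy chosen, so the sketch can tell a buffer's last read from an earlier read whose data will be used again. In real Triton kernels, strings like these are passed as the `eviction_policy` argument of `tl.load`.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Load:
    """One load in a hypothetical kernel body (not Inductor's real IR)."""
    buffer: str       # name of the buffer being read
    coalesced: bool   # True if neighbouring threads read neighbouring addresses
    broadcast: bool   # True if the same values are shared across many rows


def assign_eviction_policies(loads: List[Load]) -> Dict[int, str]:
    """Pick a cache hint per load, but only after every load has been seen."""
    # Pass 1: record the position of the last read of each buffer.
    last_use: Dict[str, int] = {}
    for i, load in enumerate(loads):
        last_use[load.buffer] = i

    # Pass 2: now that reuse is known, decide each load's policy.
    policies: Dict[int, str] = {}
    for i, load in enumerate(loads):
        is_last_use = last_use[load.buffer] == i
        if load.coalesced and is_last_use and not load.broadcast:
            # Streamed and never read again: let it leave the cache early.
            policies[i] = "evict_first"
        else:
            # Reused later, broadcast, or non-coalesced: try to keep it cached.
            policies[i] = "evict_last"
    return policies


loads = [
    Load("x", coalesced=True, broadcast=False),     # first read of x; x is read again below
    Load("bias", coalesced=True, broadcast=True),   # values shared by every row
    Load("x", coalesced=True, broadcast=False),     # last read of x
    Load("idx", coalesced=False, broadcast=False),  # gather-style, non-coalesced read
]
print(assign_eviction_policies(loads))
# {0: 'evict_last', 1: 'evict_last', 2: 'evict_first', 3: 'evict_last'}
```

Treating every earlier read of a buffer as "reuse" is a simplification chosen for this sketch.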

Key Points
  • Extends the delayed eviction-policy decision, previously applied only to looped reductions, to persistent reductions in TorchInductor.
  • Coalesced last-use loads are marked evict_first, while reused, broadcasted, or non-coalesced loads retain evict_last.
  • Revives the codegen portion of stale PR #119622 and fixes issue #119523, changing the cache hints Inductor emits in generated GPU kernels.
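For readers unfamiliar with the two reduction styles, the plain-Python sketch below contrasts them on a single row sum. It is a simplified model, not generated kernel code: `rblock` stands in for the reduction block size, each slice stands in for one block-wide load, and the function names are hypothetical. A persistent reduction handles the whole row in a single block, while a looped reduction walks the row in `rblock`-sized chunks.

```python
from typing import List, Tuple


def persistent_row_sum(row: List[float], rblock: int) -> Tuple[float, int]:
    """Persistent style: the whole row fits in one block, so it is loaded once.

    Returns (sum, number_of_block_loads).
    """
    if len(row) > rblock:
        raise ValueError("a persistent reduction needs len(row) <= rblock")
    block = list(row)  # a single block-wide "load" of the entire row
    return sum(block), 1


def looped_row_sum(row: List[float], rblock: int) -> Tuple[float, int]:
    """Looped style: iterate over the row in rblock-sized chunks.

    Returns (sum, number_of_block_loads).
    """
    total, num_loads = 0.0, 0
    for start in range(0, len(row), rblock):
        chunk = row[start:start + rblock]  # one "load" per chunk
        total += sum(chunk)
        num_loads += 1
    return total, num_loads


row = [1.0] * 10
print(persistent_row_sum(row, rblock=16))  # (10.0, 1)
print(looped_row_sum(row, rblock=4))       # (10.0, 3)
```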

Why It Matters

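Choosing the eviction hint per load lets reduction kernels keep reused data in the GPU cache while letting single-use data leave early. This can make reduction-heavy kernels faster, which may speed up training and inference for models that perform many reduction operations.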