Developer Tools

PyTorch speeds up CPU softmax by reusing log results instead of recomputing them

Avoids recomputing log for relative-position bias, speeding up softmax loops on CPU.

Deep Dive

PyTorch has merged pull request #184473 from contributor jansel, a performance fix for redundant log computation in CPU pointwise operations. The change marks the log function as a 'heavy CPU pointwise op' for the purposes of op reuse materialization, so its result is materialized and reused rather than recomputed. This matters when softmax is applied to inputs that include a relative-position bias, a common pattern in transformer models like T5. Previously, the log was re-evaluated in each softmax loop, wasting CPU cycles. With the fix, the log output is computed once and reused across those loops, which removes the redundant evaluations.
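The difference between recomputing and materializing can be illustrated with a toy model. The sketch below is not Inductor's generated code: it assumes a three-pass softmax (max, sum, normalize) written in plain Python, and the names `CountingLog`, `softmax_recompute`, and `softmax_materialized` are hypothetical. It only counts how often log is evaluated in each variant.

```python
import math


class CountingLog:
    """Wraps math.log and counts calls (hypothetical helper for illustration)."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return math.log(x)


def softmax_recompute(scores, distances, log):
    """Toy three-pass softmax where log is inlined into every pass."""
    # Pass 1: maximum, used for numerical stability.
    m = max(s + log(d) for s, d in zip(scores, distances))
    # Pass 2: sum of exponentials, evaluating log again per element.
    total = sum(math.exp(s + log(d) - m) for s, d in zip(scores, distances))
    # Pass 3: normalize, evaluating log a third time per element.
    return [math.exp(s + log(d) - m) / total for s, d in zip(scores, distances)]


def softmax_materialized(scores, distances, log):
    """Same softmax, but the log-based input is computed once and stored."""
    biased = [s + log(d) for s, d in zip(scores, distances)]
    m = max(biased)
    total = sum(math.exp(v - m) for v in biased)
    return [math.exp(v - m) / total for v in biased]


# Assumed toy inputs: four attention scores and four positive distances.
scores = [0.5, 1.0, -0.25, 2.0]
distances = [1.0, 2.0, 4.0, 8.0]

log_a = CountingLog()
out_a = softmax_recompute(scores, distances, log_a)
log_b = CountingLog()
out_b = softmax_materialized(scores, distances, log_b)

print(log_a.calls, log_b.calls)  # prints: 12 4
```

Both variants return the same probabilities. In this toy the saving is a factor of three in log calls, at the cost of one extra stored list. The actual trade-off in compiled code depends on the graph and is not quantified in the PR summary.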

The PR includes a generated-code regression test modeled after T5-style architectures to guard against the recomputation returning. It resolves GitHub issue #95037, which likely reported performance degradation in models using relative-position bias. The change is specific to CPU execution paths in PyTorch, but it reflects ongoing work to tune operator execution at the compiler level (TorchDynamo/Inductor). For developers deploying transformer models on CPUs, especially in production or resource-constrained environments, the optimization can translate to faster inference and lower latency without any code changes.
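For context on where a log can enter this pattern: T5-style relative-position bias maps query-key distances to buckets, and larger distances are bucketed on a logarithmic scale. The sketch below is a simplified, scalar, one-directional version modeled on the bucketing found in common T5 implementations. The function name and default values are assumptions for illustration, and this is not the regression test from the PR.

```python
import math


def relative_position_bucket(distance, num_buckets=32, max_distance=128):
    """Map a non-negative distance to a bucket index (illustrative sketch).

    Small distances get one bucket each; larger ones share buckets whose
    width grows logarithmically up to max_distance.
    """
    max_exact = num_buckets // 2
    if distance < max_exact:
        # Exact bucket for nearby positions.
        return distance
    # Logarithmic bucketing for distant positions: this is the log call
    # that ends up feeding the bias added to attention scores.
    scaled = math.log(distance / max_exact) / math.log(max_distance / max_exact)
    bucket = max_exact + int(scaled * (num_buckets - max_exact))
    # Everything at or beyond max_distance lands in the last bucket.
    return min(bucket, num_buckets - 1)


buckets = [relative_position_bucket(d) for d in (0, 15, 16, 32, 128, 1000)]
print(buckets)  # prints: [0, 15, 16, 21, 31, 31]
```

In a real model the bucket index selects a learned bias that is added to the attention scores before softmax, which is how a log-derived value becomes part of the softmax input.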

Key Points
  • Treats log as a heavy CPU pointwise op to enable reuse materialization in compiled graphs.
  • Prevents recomputation of log for softmax inputs containing relative-position bias (e.g., T5 models).
  • Fixes GitHub issue #95037 and includes a T5-style generated-code regression test.

Why It Matters

Reduces CPU overhead in transformer softmax loops, improving inference speed for models with relative-position bias.