PyTorch PR #183334 fixes Triton kernel eager_input_vals propagation bug

PyTorch Releases May 14, 2026

⚡ A new fix ensures correct metadata propagation for Triton kernels when decomposition changes node counts.

Deep Dive

PyTorch has merged a fix for the Triton kernel compilation pipeline in PR #183334. The bug was in `replace_by_example`, which assumed that the eager trace and the val trace of a replacement produce the same number of nodes. That assumption fails when the decomposition for `triton_kernel_wrapper_functional` clones an `as_strided` view that has a dynamic size-1 leading dimension: the node counts no longer match, and `eager_input_vals` is propagated incorrectly.
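The failure mode can be illustrated with a toy model. The sketch below is not PyTorch code: `ToyNode` and `pair_by_position` are hypothetical names, the metadata values are placeholder strings, and the choice of which trace carries the extra `clone` is an assumption made only for illustration. It shows how pairing nodes by position misassigns metadata as soon as the two traces differ in length.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a graph node: a target name plus a meta dict."""
    target: str
    meta: dict = field(default_factory=dict)


def pair_by_position(val_nodes, eager_nodes):
    """Copy metadata from each eager node onto the val node at the same index.

    This is only correct if both traces contain the same nodes in the same order.
    """
    for val_node, eager_node in zip(val_nodes, eager_nodes):
        val_node.meta["eager_input_vals"] = eager_node.meta["val"]


# Assumed for illustration: the val trace contains an extra clone node.
eager_trace = [
    ToyNode("as_strided", {"val": "eager:as_strided"}),
    ToyNode("triton_kernel_wrapper_mutation", {"val": "eager:kernel"}),
]
val_trace = [
    ToyNode("as_strided"),
    ToyNode("clone"),
    ToyNode("triton_kernel_wrapper_mutation"),
]

pair_by_position(val_trace, eager_trace)

clone_node = val_trace[1]
kernel_node = val_trace[2]

# The clone received the kernel's metadata; the kernel node received none.
print(clone_node.meta)
print(kernel_node.meta)
```

Because `zip` stops at the shorter sequence, nothing raises here, which is why a mismatch of this kind can go unnoticed.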

The fix modifies `replace_by_example` to return the inserted replacement nodes, so a decomposition can attach metadata such as `eager_input_vals` directly where it is needed. The change is applied only to the `triton_kernel_wrapper_functional` path, which now propagates `eager_input_vals` directly to the inserted `triton_kernel_wrapper_mutation` node. The existing generic propagation for `auto_functionalized` paths is left unchanged because the team could not reproduce the bug for that case and lacked sufficient understanding to justify modifying it. The PR also includes a regression test for the Triton failure case and a direct test for the `replace_by_example` return value.
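The shape of this approach can be sketched in plain Python. This is a toy model under stated assumptions, not the PR's implementation: `ToyNode`, `toy_replace_by_example`, and `decompose_functional_kernel` are hypothetical names, the graph is a simple list, and the real function's signature is not reproduced here. The point is only that a replacement helper which returns the nodes it inserted lets the caller pick the target node by name rather than by position.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a graph node: a target name plus a meta dict."""
    target: str
    meta: dict = field(default_factory=dict)


def toy_replace_by_example(graph, index, replacement_targets):
    """Replace graph[index] with new nodes and return the inserted nodes."""
    inserted = [ToyNode(target) for target in replacement_targets]
    graph[index:index + 1] = inserted
    return inserted


def decompose_functional_kernel(graph, index):
    """Decompose the functional wrapper and attach metadata explicitly."""
    eager_vals = graph[index].meta.get("eager_input_vals")
    inserted = toy_replace_by_example(
        graph, index, ["clone", "triton_kernel_wrapper_mutation"]
    )
    # Select the node by target, so extra nodes such as clone cannot shift it.
    for node in inserted:
        if node.target == "triton_kernel_wrapper_mutation":
            node.meta["eager_input_vals"] = eager_vals
    return inserted


graph = [
    ToyNode("placeholder"),
    ToyNode("triton_kernel_wrapper_functional",
            {"eager_input_vals": ("args", "kwargs")}),
    ToyNode("output"),
]

inserted = decompose_functional_kernel(graph, 1)
print([node.target for node in graph])
```

In this sketch the number of inserted nodes no longer matters, since the metadata is attached to a specific node instead of being inferred from two traces lining up.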

Key Points

Fixes a bug where `replace_by_example` assumed equal node counts between eager and val traces, failing for Triton kernel decompositions with dynamic size-1 leading dimensions.
The fix returns inserted replacement nodes so decompositions can attach `eager_input_vals` metadata explicitly, applied only to `triton_kernel_wrapper_functional`.
Includes regression tests for Triton and direct tests for `replace_by_example` return values, authored with Claude and Codex.

Why It Matters

Keeps `eager_input_vals` metadata correct when a Triton kernel decomposition changes the node count, preventing silent errors in dynamic-shape scenarios for production ML workloads.
