Developer Tools

viable/strict/1776156694: Fix slice_scatter meta overlap handling (#180166)

A subtle meta kernel bug in PyTorch's slice_scatter operation was corrupting gradient values during training.

Deep Dive

The PyTorch team has resolved a subtle but significant bug in the framework's slice_scatter operation that was causing incorrect gradient calculations during model training. The issue (#180166) stemmed from how the operation's meta kernel handled tensors with internal overlap: it used clone_preserve_strides() even when the input tensor's memory layout overlapped itself, so the meta output claimed overlapping stride-0 strides that the result of a functional scatter operation can never legitimately have. PyTorch's Inductor compiler trusted that layout metadata, collapsed the iteration space accordingly, and produced corrupted gradient values (x.grad) during backpropagation.
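
Internal overlap typically arises when a view maps several logical elements onto a single storage location, most commonly through a stride-0 dimension created by expand(). The following is a minimal illustration; the shapes here are our own, not from the PR:

    import torch

    # expand() produces a view with a stride-0 leading dimension: all three
    # rows alias the same four floats of storage, so the tensor overlaps itself.
    base = torch.zeros(4).expand(3, 4)
    print(base.stride())  # (0, 1)

    # Debug helper: 0 means no overlap, non-zero means overlap (or "too hard
    # to tell"). For this view it reports overlap.
    print(torch._debug_has_internal_overlap(base))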

The fix, proposed by developer bobrenjc93, routes both slice_scatter and select_scatter through a shared _scatter_meta_output() helper. The helper detects when the base tensor has internal overlap and returns a plain self.clone() instead of preserving the problematic strides; for non-overlapping inputs it keeps the original clone_preserve_strides() behavior. This addresses the bad layout metadata at its source rather than papering over it with compiler-specific workarounds, so functional scatter operations maintain correct semantics across all PyTorch execution paths.
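
The actual helper lives inside PyTorch's meta registrations; the sketch below only mirrors the dispatch logic described above, with a stand-in for clone_preserve_strides() and torch._debug_has_internal_overlap() approximating the internal overlap check:

    import torch

    def clone_preserve_strides(t: torch.Tensor) -> torch.Tensor:
        # Stand-in for the existing helper named in the PR: allocate an
        # output with the same sizes and strides. A meta kernel only needs
        # the metadata, so no data is copied here.
        return torch.empty_strided(t.size(), t.stride(), dtype=t.dtype, device=t.device)

    def _scatter_meta_output(self: torch.Tensor) -> torch.Tensor:
        if torch._debug_has_internal_overlap(self) != 0:
            # Overlapping base: preserving its strides would give the output
            # an impossible layout for a freshly written functional result,
            # so fall back to a plain clone with a dense layout (matching
            # eager's own fallback for overlapping tensors).
            return self.clone()
        # Non-overlapping base: keep the stride-preserving behavior.
        return clone_preserve_strides(self)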

Importantly, the fix aligns PyTorch's meta tensor behavior with its eager execution mode, where scatter operations already fall back to cloning for overlapping tensors. The team has added a regression test (test_slice_scatter_backward_with_overlapping_base) to catch any recurrence in future releases and documented current limitations in CPU Pallas execution. Models that use slice_scatter, particularly in complex tensor-manipulation scenarios, will now compute gradients correctly during training.
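
A regression check in the spirit of the new test can compare eager and compiled gradients through a slice_scatter whose base overlaps itself; this repro is our own, and the PR's actual test body may differ:

    import torch

    def fn(x):
        base = x.expand(3, 4)  # overlapping base via a stride-0 dimension
        src = torch.full((3, 2), 2.0)
        return torch.slice_scatter(base, src, dim=1, start=0, end=2).sum()

    x_eager = torch.randn(4, requires_grad=True)
    x_compiled = x_eager.detach().clone().requires_grad_(True)

    fn(x_eager).backward()                    # eager reference gradient
    torch.compile(fn)(x_compiled).backward()  # Inductor path fixed by the PR

    torch.testing.assert_close(x_eager.grad, x_compiled.grad)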

Key Points
  • Fixed slice_scatter meta kernel bug that preserved impossible stride-0 layouts for overlapping tensors
  • Bug caused PyTorch Inductor to produce incorrect gradient values during backpropagation
  • Solution routes scatter ops through shared helper that properly clones overlapping tensors to match eager execution

Why It Matters

Ensures correct gradient calculations for models using slice_scatter operations, preventing subtle training errors in PyTorch-based AI systems.