Developer Tools

trunk/57726b7740ee566258707dc38e01752e79642f50: Handle FakeTensor add_ with meta rhs during tracing (#177524)

A subtle bug in PyTorch's FakeTensor system was causing `torch.compile` to fail while tracing certain operations.

Deep Dive

The PyTorch team has resolved a significant bug (issue #166626, fix #177524) in its core tracing infrastructure. The issue stemmed from the FakeTensor system, a component used during graph compilation to simulate tensor behavior without real data. Specifically, when tracing an in-place addition (`aten.add_.Tensor`) whose right-hand side was a tensor on the `meta` device (a virtual device used for shape and dtype inference), the system incorrectly flagged the operation as a cross-device mismatch and raised an error. This caused the popular `torch.compile` feature to fail during graph capture for certain model code patterns.
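To make the failure mode concrete, here is a simplified, pure-Python model of the kind of common-device check involved. This is illustrative only: the function name `resolve_common_device_buggy` is hypothetical, and PyTorch's actual logic lives in its FakeTensor implementation and differs in detail.

```python
# Illustrative model of a common-device check during fake-tensor tracing.
# NOTE: `resolve_common_device_buggy` is a hypothetical name; PyTorch's
# real implementation is more involved.

def resolve_common_device_buggy(lhs_device: str, rhs_device: str) -> str:
    """Pre-fix behavior: any device mismatch is treated as an error,
    including a `meta` right-hand side."""
    if lhs_device != rhs_device:
        raise RuntimeError(
            f"Unhandled cross-device operation: {lhs_device} vs {rhs_device}"
        )
    return lhs_device

# Tracing something like `cuda_tensor.add_(meta_tensor)` would hit this path:
try:
    resolve_common_device_buggy("cuda:0", "meta")
except RuntimeError as e:
    print(f"tracing failed: {e}")
```

Under this (overly strict) rule, a `meta` right-hand side is rejected exactly like a real `cpu`-vs-`cuda` mismatch, even though at runtime the in-place op is well-defined.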

The fix, authored by @bobrenjc93 and drafted with assistance from the AI coding tool Codex, modifies FakeTensor's common-device resolution logic. It now allows a `meta`-device tensor as the right-hand side of an in-place addition, recognizing that such operations preserve the device of the destination (left-hand-side) tensor. This keeps tracing aligned with the operator's actual runtime behavior while still catching genuine, problematic device mismatches in other contexts. The pull request includes regression tests for both the low-level FakeTensor API and the higher-level `torch.compile` workflow to prevent future regressions.
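The shape of the fix can be sketched in the same simplified model (again hypothetical names; the actual change is inside FakeTensor's device-propagation code, not a standalone function like this):

```python
# Simplified sketch of post-fix common-device resolution for in-place ops.
# NOTE: hypothetical names; PyTorch's actual logic differs in detail.

def resolve_common_device_fixed(
    lhs_device: str, rhs_device: str, in_place: bool
) -> str:
    """Post-fix behavior: a `meta` right-hand side on an in-place op
    no longer counts as a cross-device error; the result keeps the
    destination (LHS) tensor's device, matching runtime semantics."""
    if lhs_device == rhs_device:
        return lhs_device
    if in_place and rhs_device == "meta":
        # e.g. tracing `x.add_(y)` where y lives on the meta device:
        # the destination tensor's device wins.
        return lhs_device
    # Genuine device mismatches are still rejected.
    raise RuntimeError(
        f"Unhandled cross-device operation: {lhs_device} vs {rhs_device}"
    )

print(resolve_common_device_fixed("cuda:0", "meta", in_place=True))  # cuda:0
```

The key design point is that the relaxation is narrow: only the `meta` special case on the right-hand side is carved out, so a real `cpu`-vs-`cuda` mismatch still fails loudly during tracing.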

Key Points
  • Fixed a bug where FakeTensor incorrectly blocked `aten.add_.Tensor` ops with `meta` RHS tensors during tracing.
  • The issue (#166626) caused `torch.compile` to fail for specific code patterns, breaking model compilation.
  • The fix ensures device propagation aligns with runtime behavior while keeping strict checks for real device errors.

Why It Matters

This fix stabilizes PyTorch's just-in-time compilation (`torch.compile`) for more complex models, directly benefiting AI researchers and engineers who rely on it for performance optimization.