PyTorch CUDAGraph fix evicts cached tensor outputs before poisoning stale storages
PyTorch patch aligns CUDAGraph replay with warmup invalidation semantics
Deep Dive
PR #184368 fixes CUDAGraph stale output checks. The fix makes replay outputs follow the same stale-output invalidation semantics as warmup and recording, including evicting cached Tensor outputs before poisoning stale storages. It fixes issue #122192.
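The ordering described above (evict the cached output first, then poison the stale storage) can be sketched with a small pure-Python toy model. This is an illustration only, not PyTorch's implementation: `ToyStorage`, `ToyGraphNode`, and every method name below are hypothetical, and the doubling "kernel" merely stands in for a graph replay.

```python
class ToyStorage:
    """Hypothetical stand-in for a tensor's backing storage."""

    def __init__(self, data):
        self.data = data
        self.poisoned = False

    def read(self):
        # A poisoned storage refuses access instead of returning stale data.
        if self.poisoned:
            raise RuntimeError("stale output: overwritten by a later run")
        return self.data


class ToyGraphNode:
    """Hypothetical stand-in for a recorded graph with one cached output."""

    def __init__(self):
        self.cached_output = None  # output cached for reuse
        self.live_storage = None   # storage backing the latest output

    def _invalidate_prior_outputs(self):
        # Step 1: evict the cached output so it cannot be handed out again.
        self.cached_output = None
        # Step 2: only then poison the storage that output pointed at.
        if self.live_storage is not None:
            self.live_storage.poisoned = True
            self.live_storage = None

    def run(self, values):
        # Assumption of this sketch: every phase (warmup, recording, replay)
        # goes through the same invalidation path before producing output.
        self._invalidate_prior_outputs()
        storage = ToyStorage([v * 2 for v in values])
        self.live_storage = storage
        self.cached_output = storage
        return storage


node = ToyGraphNode()
first = node.run([1, 2])
second = node.run([3])  # invalidates `first` before producing new output
```

In this sketch, reading `first` after the second run raises instead of silently returning old data, while `second` stays readable. Doing the eviction before the poisoning means the node never holds a cached reference to an already-poisoned storage.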
Key Points
- Replay outputs now follow invalidation semantics identical to warmup and recording phases
- Cached tensors are evicted before stale storages are poisoned, preventing reuse of outdated data
- The fix resolves issue #122192
Why It Matters
Closes a gap in which CUDAGraph replay handled stale outputs differently from warmup and recording, so that a stale output is invalidated the same way in every phase.