PyTorch patches critical use-after-free bug in CUDA stream/event deallocation
Dangling weakrefs in CUDAStream/Event could crash PyTorch during shutdown.
PyTorch released a fix for a use-after-free (UAF) bug in the deallocation of CUDA and XPU streams and events. The bug was introduced when the base classes `THPStream` and `THPEvent` gained weakref support in PRs #164304 and #164522, but the four backend-specific `tp_dealloc` overrides (`torch.cuda.Stream`, `torch.cuda.Event`, `torch.xpu.Stream`, `torch.xpu.Event`) were not updated to call `PyObject_ClearWeakRefs(self)` or `Py_CLEAR(self->context)`. Because CPython does not automatically chain `tp_dealloc` for static types, the base-class cleanup never ran for these four types: freeing one of these objects left any weakrefs to it dangling (pointing at freed memory) and leaked its lazy stream context object.
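The contract that `PyObject_ClearWeakRefs` upholds can be observed from pure Python. The following is a minimal sketch, not PyTorch code: `FakeStream` is a hypothetical stand-in class, chosen so the example runs without a GPU. It shows what a correctly deallocated object guarantees: once the referent is freed, the weak reference resolves to `None` and its callback has fired.

```python
import gc
import weakref


class FakeStream:
    """Hypothetical stand-in for a stream object; not a PyTorch class."""


def weakref_is_cleared_on_dealloc():
    callback_fired = []
    stream = FakeStream()
    # The callback runs when the referent is about to be finalized.
    ref = weakref.ref(stream, lambda r: callback_fired.append(True))

    alive_before = ref() is stream

    del stream    # last strong reference dropped; CPython frees the object here
    gc.collect()  # not required under reference counting, harmless otherwise

    # A healthy dealloc leaves the weakref "dead" instead of dangling.
    return alive_before, ref() is None, callback_fired == [True]


print(weakref_is_cleared_on_dealloc())
```

On the affected builds the four backend types skipped this clearing step, so a weakref to a freed stream or event still pointed at the old memory rather than reporting that its referent was gone.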
The dangling weakrefs remained latent until PR #180497 made PyTorch's dynamo compiler create a `weakref.ref(torch.cuda.current_stream())` on every `torch.compile` invocation that references `current_stream()`. During `Py_FinalizeEx`, the interpreter walks all weakrefs in the module's dict; when it encounters one of these dangling references, it tries to access the weakref list pointer on a `NULL` object and crashes with a SIGSEGV. The fix (PR #183403) re-synchronizes the backend deallocs with their base classes by calling `PyObject_ClearWeakRefs` and clearing the context, which prevents both the crash and the memory leak.
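To see why a weakref can still be around at interpreter shutdown, consider a rough sketch of the storage pattern. The names here (`FakeStream`, `_stream_refs`, `remember_stream`, `lookup_stream`) are hypothetical and are not dynamo's actual internals; the only assumption taken from the description above is that a weak reference to a stream ends up stored at module level.

```python
import weakref


class FakeStream:
    """Hypothetical stand-in for torch.cuda.Stream; not a PyTorch class."""


# Hypothetical module-level registry, assumed for illustration only.
_stream_refs = {}


def remember_stream(key, stream):
    """Store only a weak reference so the registry never keeps the stream alive."""
    _stream_refs[key] = weakref.ref(stream)


def lookup_stream(key):
    """Return the live stream, or None if it has already been deallocated."""
    ref = _stream_refs.get(key)
    return ref() if ref is not None else None


def demo():
    stream = FakeStream()
    remember_stream("current", stream)
    found_while_alive = lookup_stream("current") is stream

    del stream  # the stream is freed, but the weakref object stays registered

    found_after_free = lookup_stream("current")
    still_registered = "current" in _stream_refs
    return found_while_alive, found_after_free, still_registered


print(demo())
```

The weakref object outlives its referent and is still reachable from the module when the interpreter tears down. With a correct dealloc it is merely dead, as here; with the buggy deallocs it would still point at the freed stream.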
- Four `tp_dealloc` overrides (CUDA/XPU stream/event) diverged from their base classes when weakref support was added in #164304 and #164522.
- The bug left weakrefs uncleared (dangling) and leaked stream context objects; it stayed latent until `torch.compile` began creating `weakref.ref(current_stream())` in #180497.
- The crash surfaced as a SIGSEGV during `Py_FinalizeEx`; #183403 fixes it by re-synchronizing `tp_dealloc` to call `PyObject_ClearWeakRefs` and `Py_CLEAR(self->context)`.
Why It Matters
This patch prevents segmentation faults during PyTorch shutdown, most visibly for users who combine `torch.compile` with CUDA streams, and stops the leak of stream context objects.