Developer Tools

PyTorch patches critical use-after-free bug in CUDA stream/event deallocation

Dangling weakrefs in CUDAStream/Event could crash PyTorch during shutdown.

Deep Dive

PyTorch released a fix for a use-after-free (UAF) bug in the deallocation of CUDA and XPU streams and events. The bug was introduced when the base classes `THPStream` and `THPEvent` gained weakref support in PRs #164304 and #164522, but the four backend-specific `tp_dealloc` overrides (for `torch.cuda.Stream`, `torch.cuda.Event`, `torch.xpu.Stream`, and `torch.xpu.Event`) were not updated to call `PyObject_ClearWeakRefs(self)` or `Py_CLEAR(self->context)`. CPython does not automatically chain `tp_dealloc` for static types: a subtype's `tp_dealloc` runs in place of the base type's, so the new cleanup steps in the base classes never ran for these four types. As a result, any weak reference to a freed stream or event was left dangling (pointing at freed memory), and the lazily created stream context object was leaked.
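The divergence described above can be pictured with a pure-Python analogy. This is only a sketch, not PyTorch code: `BaseHandle`, `StaleBackendHandle`, `FixedBackendHandle`, and the explicit `dealloc` method are hypothetical stand-ins for the C-level types and their `tp_dealloc` slots, and the logged step names are illustrative labels.

```python
class BaseHandle:
    """Hypothetical stand-in for a base type such as THPStream."""

    def __init__(self):
        self.log = []

    def dealloc(self):
        # Base cleanup: the steps added together with weakref support.
        self.log.append("clear_weakrefs")
        self.log.append("clear_context")


class StaleBackendHandle(BaseHandle):
    """Override written before the base class gained the steps above."""

    def dealloc(self):
        # Nothing here invokes BaseHandle.dealloc, so its steps are skipped.
        self.log.append("release_backend_resources")


class FixedBackendHandle(BaseHandle):
    """Override brought back in sync with the base class."""

    def dealloc(self):
        # Perform the base-class steps as well as the backend-specific one.
        self.log.append("clear_weakrefs")
        self.log.append("clear_context")
        self.log.append("release_backend_resources")


def run_dealloc(handle_cls):
    """Create a handle, run its dealloc, and return the recorded steps."""
    handle = handle_cls()
    handle.dealloc()
    return handle.log
```

Running `run_dealloc(StaleBackendHandle)` records only the backend-specific step, which mirrors how the four overrides silently skipped the weakref and context cleanup once the base classes changed.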

The dangling weakrefs stayed latent until PR #180497 made PyTorch's Dynamo compiler create a `weakref.ref(torch.cuda.current_stream())` on every `torch.compile` invocation that references `current_stream()`. During `Py_FinalizeEx`, the interpreter walks all weakrefs in the module's dict; when it reaches one of these dangling references, it tries to access the weakref list pointer on a `NULL` object and crashes with a SIGSEGV. The fix (PR #183403) re-synchronizes the backend deallocs with their base classes by calling `PyObject_ClearWeakRefs` and clearing the context, which prevents both the crash and the memory leak.
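The invariant that `PyObject_ClearWeakRefs` maintains can be observed from pure Python: once an object is deallocated correctly, every weak reference to it reports `None` and its callback fires. The sketch below uses only the standard `weakref` module; `FakeStream` and `weakref_after_dealloc` are hypothetical names for illustration and do not involve PyTorch or CUDA.

```python
import gc
import weakref


class FakeStream:
    """Hypothetical stand-in for a stream object; not a PyTorch class."""

    def __init__(self, stream_id):
        self.stream_id = stream_id


def weakref_after_dealloc():
    """Return what a weak reference reports before and after its target dies."""
    stream = FakeStream(0)
    events = []
    # The callback runs when the referent is finalized correctly.
    ref = weakref.ref(stream, lambda _ref: events.append("cleared"))

    alive_before = ref() is stream  # the target is still reachable here
    del stream                      # drop the only strong reference
    gc.collect()                    # not required on CPython; kept for clarity

    dead_after = ref() is None      # a cleared weakref returns None
    return alive_before, dead_after, events
```

A type whose deallocator skips this clearing step breaks the guarantee: the weak reference keeps pointing at memory that has already been released, which is the condition the fix removes.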

Key Points
  • Four `tp_dealloc` overrides (CUDA/XPU stream/event) diverged from their base classes when weakref support was added in #164304 and #164522.
  • The bug left weakrefs uncleared (dangling) and leaked stream context objects; it stayed latent until `torch.compile` began creating `weakref.ref(torch.cuda.current_stream())`.
  • The crash surfaced as a SIGSEGV during `Py_FinalizeEx`; PR #183403 fixes it by re-synchronizing the `tp_dealloc` overrides to call `PyObject_ClearWeakRefs` and `Py_CLEAR(self->context)`.

Why It Matters

This patch prevents segmentation faults during PyTorch shutdown, especially for users of `torch.compile` with CUDA streams, and stops the stream context object from leaking each time a stream is freed.