Developer Tools

PyTorch fixes Inductor cache bug for dtype changes

A subtle caching bug in PyTorch's Inductor could produce wrong results when the default dtype changes

Deep Dive

In a recent commit to the PyTorch repository, contributor jansel fixed a subtle caching bug in the Inductor FX graph compiler. The issue, tracked as #184405, surfaced when a user called torch.set_default_dtype to change the default floating-point dtype (e.g., from float32 to float64). Inductor caches compiled graphs under a cache key, and that key previously omitted the ambient default dtype. As a result, factory operations called without an explicit dtype argument (such as torch.zeros or torch.ones) could reuse a cached graph compiled under a different default dtype. The compiled output then carried the wrong tensor dtype, which could cause silent numeric errors in models or training scripts.

The fix adds the current default dtype to the cache key, so a graph compiled under float32 is never reused when the default is float64, and vice versa. The commit also adds regression coverage for no-input factory ops (operations that produce tensors from shape and dtype alone) compiled under different default dtypes. The change is part of PyTorch's ongoing effort to harden the Inductor compilation stack and improve reliability for production workloads that change the default dtype, which can include some mixed-precision training or inference pipelines.
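The effect of the fix can be illustrated with a small, self-contained sketch. This is a toy model in plain Python, not Inductor's actual cache or key format: the names GraphCache, compile_zeros, run_twice, and the module-level _default_dtype are all hypothetical stand-ins, and dtypes are represented as strings. It only shows why a key that ignores ambient state returns a stale result, and why adding that state to the key avoids it.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for the process-wide default dtype
# (this is NOT PyTorch's real global state).
_default_dtype = "float32"


def set_default_dtype(dtype: str) -> None:
    """Toy equivalent of changing the ambient default dtype."""
    global _default_dtype
    _default_dtype = dtype


def compile_zeros(shape: Tuple[int, ...]) -> Callable[[], dict]:
    """Pretend to compile a no-input factory op.

    The default dtype in effect at compile time is baked into the result,
    which is what makes reusing it under a different default incorrect.
    """
    baked_dtype = _default_dtype
    return lambda: {"shape": shape, "dtype": baked_dtype}


class GraphCache:
    """Toy compile cache; the key layout is an assumption for illustration."""

    def __init__(self, include_default_dtype: bool) -> None:
        self.include_default_dtype = include_default_dtype
        self._entries: Dict[tuple, Callable[[], dict]] = {}

    def _key(self, graph_id: str, shape: Tuple[int, ...]) -> tuple:
        key = (graph_id, shape)
        if self.include_default_dtype:
            # The idea behind the fix: ambient default dtype is part of the key.
            key += (_default_dtype,)
        return key

    def get_or_compile(self, graph_id: str, shape: Tuple[int, ...]) -> Callable[[], dict]:
        key = self._key(graph_id, shape)
        if key not in self._entries:
            self._entries[key] = compile_zeros(shape)
        return self._entries[key]


def run_twice(cache: GraphCache) -> Tuple[str, str]:
    """Run the same factory graph under float32, then under float64."""
    set_default_dtype("float32")
    first = cache.get_or_compile("zeros", (2, 2))()
    set_default_dtype("float64")
    second = cache.get_or_compile("zeros", (2, 2))()
    return first["dtype"], second["dtype"]


stale = run_twice(GraphCache(include_default_dtype=False))
fixed = run_twice(GraphCache(include_default_dtype=True))
print(stale)  # ('float32', 'float32'): second call reused the float32 entry
print(fixed)  # ('float32', 'float64'): second call compiled a fresh entry
```

With the dtype left out of the key, the second lookup hits the entry compiled under float32 even though the default is now float64. With it included, the two defaults map to separate entries.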

Key Points
  • Cache key now includes default dtype to avoid reusing stale Inductor FX graphs after torch.set_default_dtype calls
  • Fixes GitHub issue #184405, in which compiled factory ops could return tensors with the wrong dtype
  • Includes regression tests for no-input factories (e.g., torch.zeros) under varying default dtypes

Why It Matters

Removes a source of silent numerical errors in PyTorch compiled code when the default dtype changes, improving reliability for ML practitioners.