Developer Tools

viable/strict/1773038331: Support non-tensor attrs in __tensor_flatten__ (#176457)

PyTorch now lets custom tensor subclasses embed non-tensor objects directly, enabling new distributed training patterns.

Deep Dive

The PyTorch core team has merged a significant pull request (#176457) that fundamentally expands what custom tensor subclasses can represent. Previously, the `__tensor_flatten__` protocol (used by PyTorch's internals to serialize and manipulate custom tensor types) could only return references to actual tensor data. This update allows the first return value to include 'opaque' objects: non-tensor Python objects such as a `DeviceMesh`, a key abstraction in distributed training. These opaque objects are integrated into the flat argument list, receive indices, and become proper graph inputs and outputs, just like tensors.
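To make the mechanics concrete, here is a minimal sketch of a subclass carrying an opaque attribute. The `MeshTensor` name and the placeholder mesh object are hypothetical; the flatten/unflatten signatures follow PyTorch's existing subclass protocol, and `__torch_dispatch__` is omitted for brevity, so this illustrates only the flatten round trip:

```python
import torch

class MeshTensor(torch.Tensor):
    """Hypothetical subclass: one inner tensor plus an opaque,
    non-tensor attribute standing in for something like a DeviceMesh."""

    @staticmethod
    def __new__(cls, local: torch.Tensor, mesh):
        return torch.Tensor._make_wrapper_subclass(
            cls, local.shape, dtype=local.dtype, device=local.device,
            requires_grad=local.requires_grad,
        )

    def __init__(self, local: torch.Tensor, mesh):
        self._local = local  # real tensor payload
        self._mesh = mesh    # opaque, non-tensor metadata object

    def __tensor_flatten__(self):
        # Previously every name here had to refer to a tensor attribute;
        # with this change, "_mesh" may name an opaque object as well.
        return ["_local", "_mesh"], None

    @staticmethod
    def __tensor_unflatten__(inner, ctx, outer_size, outer_stride):
        # `inner` maps attribute names back to the flattened values,
        # now including the opaque "_mesh" slot.
        return MeshTensor(inner["_local"], inner["_mesh"])

# Round trip through the protocol:
t = MeshTensor(torch.randn(4), mesh={"axis": "dp"})
attrs, ctx = t.__tensor_flatten__()
flat = {name: getattr(t, name) for name in attrs}
t2 = MeshTensor.__tensor_unflatten__(flat, ctx, t.shape, t.stride())
```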

This technical change required updates across multiple PyTorch subsystems to handle the new object type. Code in `subclass_parametrization`, `non_strict_utils`, FSDP's `_init_utils`, `common_utils`, `frontend_utils`, and `parametrize.py` was modified to iterate through `__tensor_flatten__` results without assuming every element was a tensor. For instance, `parametrize.py` now stores these opaque objects as plain attributes instead of parameters. The `SubclassCreationMeta.attrs` metadata includes an empty `OpaqueMeta` marker to distinguish these slots from differentiable tensor slots, which is critical for autograd operations that must skip non-differentiable data.
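The call-site pattern these updates imply is roughly the following. This is a sketch under assumptions, not the actual PyTorch source; the `on_tensor` and `on_opaque` callbacks are hypothetical stand-ins for whatever each subsystem does with tensor versus opaque slots (e.g. registering a parameter versus storing a plain attribute):

```python
import torch

def walk_flattened(subclass_tensor, on_tensor, on_opaque):
    # Callers can no longer assume that every flattened
    # attribute of a tensor subclass is itself a tensor.
    attr_names, ctx = subclass_tensor.__tensor_flatten__()
    for name in attr_names:
        inner = getattr(subclass_tensor, name)
        if isinstance(inner, torch.Tensor):
            on_tensor(name, inner)   # e.g. register as a parameter, recurse
        else:
            on_opaque(name, inner)   # e.g. keep as a plain attribute
```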

The addition is backed by a new test suite (`test/test_opaque_obj_v2.py`) comprising 12 tests. These validate backward passes with opaque attributes, support under `compiled_autograd`, correct behavior with shared opaque remapping, identity guarding for JIT compilation, and proper integration with export and FSDP. This enables researchers and framework developers to create tensor subclasses that natively encapsulate distributed computing layouts or other complex metadata, paving the way for more elegant and efficient model-parallelization strategies directly within the tensor abstraction.

Key Points
  • Enables `__tensor_flatten__` to return non-tensor 'opaque' objects like DeviceMesh, integrating them into the computation graph.
  • Updated core subsystems (FSDP, export, parametrization) to handle opaques, marked by `OpaqueMeta` in `SubclassCreationMeta.attrs`.
  • Validated by 12 new tests covering backward passes, compiled autograd, shared remapping, and integration with export/FSDP.

Why It Matters

Enables cleaner, more powerful custom tensor types for distributed training and compilation, reducing boilerplate and potential errors.