trunk/e7c05590485eec8b2adef9f37719e76c1bf10107: Support non-tensor attrs in __tensor_flatten__ (#176457)
The framework's `__tensor_flatten__` method now supports non-tensor attributes, unlocking new distributed training patterns.
A significant update to PyTorch's internal subclassing mechanism has been merged, authored with assistance from Claude. The change modifies the `__tensor_flatten__` protocol, which defines how custom tensor subclasses are decomposed and reconstructed. Previously, the attribute list it returned (its first return value) could name only tensor attributes. Now it can also include opaque, non-tensor objects such as a DeviceMesh, a key abstraction for distributed training that describes how tensors are split across devices.
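To make the protocol change concrete, here is a minimal, torch-free sketch of the `__tensor_flatten__` / `__tensor_unflatten__` contract. `ShardedTensor` and `FakeDeviceMesh` are hypothetical stand-ins, not PyTorch classes; the real protocol operates on actual tensor subclasses.

```python
class FakeDeviceMesh:
    """Hypothetical stand-in for a DeviceMesh: an opaque, non-tensor object."""
    def __init__(self, shape):
        self.shape = shape


class ShardedTensor:
    """Hypothetical subclass sketch; real code would subclass torch.Tensor."""
    def __init__(self, local_shard, mesh):
        self._local = local_shard   # a tensor in real code
        self._mesh = mesh           # opaque non-tensor attribute

    def __tensor_flatten__(self):
        # Before this change, every name in the first return value had to
        # refer to a tensor attribute; now "_mesh" (an opaque) is allowed too.
        # The second return value is ordinary serializable metadata.
        return ["_local", "_mesh"], {"version": 1}

    @staticmethod
    def __tensor_unflatten__(inner, ctx, outer_size=None, outer_stride=None):
        # Reconstruct the subclass from the flattened attrs and metadata.
        return ShardedTensor(inner["_local"], inner["_mesh"])
```

A round trip through flatten and unflatten preserves the opaque object's identity, which matters for the identity guarding described in the tests below.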
These opaque objects are integrated into PyTorch's graph execution engine. They receive indices, become formal graph inputs and outputs, and are reconstructed from argument lists during subclass recreation, following the same pattern as tensor metadata. A new `OpaqueMeta` marker in the `SubclassCreationMeta.attrs` structure allows the autograd engine (specifically `process_runtime_tangent`) to correctly skip non-differentiable opaques during gradient computation.
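The skip-during-autograd idea can be sketched as follows. This `OpaqueMeta` and the helper are hypothetical illustrations of the marker pattern described above; the real `SubclassCreationMeta.attrs` and `process_runtime_tangent` logic in AOTAutograd differ in detail.

```python
class OpaqueMeta:
    """Hypothetical marker: this attr slot holds a non-differentiable opaque."""
    def __init__(self, idx):
        self.idx = idx  # index of the opaque in the graph's argument list


def collect_tangent_slots(attrs):
    """Return only the attr names that participate in gradient computation.

    Slots marked with OpaqueMeta (e.g. a DeviceMesh) carry no gradient and
    are skipped, mirroring the behavior described for process_runtime_tangent.
    """
    return [name for name, meta in attrs.items()
            if not isinstance(meta, OpaqueMeta)]
```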
The update required coordinated changes across multiple PyTorch subsystems to handle the new object type. Code in `subclass_parametrization`, `non_strict_utils`, FSDP's `_init_utils`, `common_utils` (for `get_untyped_storages`), `frontend_utils`, and `parametrize.py` was updated to iterate through `__tensor_flatten__` results without assuming every element is a tensor. The `parametrize.py` module now stores these opaques as plain attributes instead of parameters.
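The defensive iteration pattern those subsystems adopted looks roughly like this. `get_storages_safely` is a hypothetical helper, not the real `get_untyped_storages` from `common_utils`; the duck-typed tensor check stands in for whatever check the real code uses.

```python
def get_storages_safely(subclass):
    """Collect storages from a flattened subclass, skipping opaques.

    Sketch of the pattern: iterate __tensor_flatten__ results without
    assuming every listed attribute is a tensor.
    """
    attrs, _ctx = subclass.__tensor_flatten__()
    storages = []
    for name in attrs:
        val = getattr(subclass, name)
        # Non-tensor opaques (e.g. a DeviceMesh) don't expose storage;
        # skip them instead of calling tensor-only APIs on them.
        if hasattr(val, "untyped_storage"):
            storages.append(val.untyped_storage())
    return storages
```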
The change ships with thirteen new tests in `test/test_opaque_obj_v2.py` that exercise the feature in complex scenarios: backward passes with opaques, compatibility with compiled autograd, correct remapping of shared opaque references, identity guarding for JIT compilation, deeply nested subclasses, integration with parametrization and non-strict export, proper cache behavior, and that appropriate errors are raised for invalid value types.
- Enables tensor subclasses to embed complex, non-tensor objects (e.g., DeviceMesh) via the `__tensor_flatten__` protocol.
- Opaque objects are integrated into the computation graph, becoming inputs/outputs and flowing alongside tensors for autograd.
- Required updates to 7 core subsystems (FSDP, parametrization, export) and added 13 comprehensive tests for backward passes and compilation.
Why It Matters
This unlocks more elegant and powerful custom tensor types for distributed training, model compression, and research, reducing boilerplate code.