Developer Tools

trunk/0290cd45c550376bcdc10821e41c13488df8d6a3: Preserve AOTI proxy_executor error messages (#180884)

No more generic 'run failed' errors: real exception messages now survive the ABI boundary.

Deep Dive

A new commit to PyTorch's trunk branch addresses a persistent debugging headache for AOTInductor users: when a custom op threw an exception during a proxy_executor call, its message was swallowed and replaced with a generic 'AOTInductorModel run failed with input spec' message. The fix, merged by contributor yingufan in pull request #180884, introduces thread-local error storage in the AOTI shim layer so that the original exception message survives the C ABI boundary. Specifically, a thread-local variable `aoti_last_error_msg` is defined in `shim_common.cpp` and populated by the `AOTI_TORCH_CONVERT_EXCEPTION_TO_ERROR_CODE` macro whenever it catches an exception. Getter and setter functions are declared in `utils.h`, which is internal to libtorch and not part of the stable C ABI in `shim.h`.
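The producer side of this pattern can be sketched in a few lines of C++. This is a minimal illustration of thread-local error storage behind a C-style entry point, not the code from the pull request: every name here (`last_error_msg`, `set_last_error`, `get_last_error`, `CONVERT_EXCEPTION_TO_ERROR_CODE`, `call_custom_op`, the error-code constants) is a hypothetical stand-in, and the real macro and accessors may differ in signature and behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical names throughout; these are not PyTorch's actual symbols.
using ErrorCode = int32_t;
constexpr ErrorCode kSuccess = 0;
constexpr ErrorCode kFailure = 1;

// One slot per thread, so concurrent callers cannot clobber each other's message.
static thread_local std::string last_error_msg;

void set_last_error(const char* msg) {
  last_error_msg = (msg != nullptr) ? msg : "";
}
const char* get_last_error() { return last_error_msg.c_str(); }

// Runs the wrapped statements. On an exception, records what() and returns a
// failure code instead of letting the exception cross the C ABI boundary.
#define CONVERT_EXCEPTION_TO_ERROR_CODE(...) \
  try {                                      \
    __VA_ARGS__                              \
  } catch (const std::exception& e) {        \
    set_last_error(e.what());                \
    return kFailure;                         \
  } catch (...) {                            \
    set_last_error("unknown exception");     \
    return kFailure;                         \
  }                                          \
  return kSuccess;

// C-style entry point standing in for a custom op reached via proxy_executor.
extern "C" ErrorCode call_custom_op(int should_fail) {
  CONVERT_EXCEPTION_TO_ERROR_CODE({
    if (should_fail) {
      throw std::runtime_error("custom op: shape mismatch");
    }
  })
}

// Usage: the failure path leaves the original message retrievable.
void demo() {
  assert(call_custom_op(0) == kSuccess);
  assert(call_custom_op(1) == kFailure);
  assert(std::string(get_last_error()) == "custom op: shape mismatch");
}
```

Making the slot `thread_local` means the message read after a failed call is the one produced on the same thread, which is presumably why the commit chose that storage class over a plain global.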

Model container runners (`model_container_runner.cpp` and `AOTInductorModelImpl.cpp`) now read the stored error via `aoti_torch_get_last_error()` and propagate the original message instead of a generic placeholder. Additionally, `cpp_wrapper_cpu.py` wraps proxy_executor calls in `AOTI_TORCH_ERROR_CODE_CHECK` so that a failing call's error code is checked rather than silently ignored. The test plan includes running `buck test` commands for `assert_tensor_test` and `test_proxy_executor_error_message_preserved`. The change, approved by PyTorch maintainer desertfire, means developers using AOTInductor with custom operators see the originating exception text rather than a placeholder, which matters most in production inference pipelines where an opaque error message can stall troubleshooting.
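The consumer side, where a returned error code is checked and the stored message is turned back into an exception, might look roughly like the sketch below. It is illustrative only: `ERROR_CODE_CHECK`, `proxy_call`, `run`, and `get_last_error` are hypothetical names, `proxy_call` is a stub that pretends a shim call has already caught an exception, and how the actual macro and runners format their messages is not described in the summary above.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins; these are not PyTorch's actual symbols.
using ErrorCode = int32_t;
constexpr ErrorCode kSuccess = 0;
constexpr ErrorCode kFailure = 1;

static thread_local std::string last_error_msg;
const char* get_last_error() { return last_error_msg.c_str(); }

// Stub for a shim call that has already caught an exception and stored its text.
extern "C" ErrorCode proxy_call(int should_fail) {
  if (should_fail) {
    last_error_msg = "custom op: shape mismatch";
    return kFailure;
  }
  return kSuccess;
}

// Checks the return code of `call`. On failure, throws with the stored message,
// falling back to the call text if nothing was recorded.
#define ERROR_CODE_CHECK(call)                   \
  do {                                           \
    if ((call) != kSuccess) {                    \
      const char* msg = get_last_error();        \
      throw std::runtime_error(                  \
          (msg != nullptr && msg[0] != '\0')     \
              ? std::string(msg)                 \
              : std::string(#call " failed"));   \
    }                                            \
  } while (0)

// Stand-in for generated wrapper code: every proxy call goes through the check.
void run(int should_fail) {
  ERROR_CODE_CHECK(proxy_call(should_fail));
}

// Usage: the caller sees the original message, not a generic placeholder.
void demo() {
  run(0);  // success path: no exception
  std::string seen;
  try {
    run(1);
  } catch (const std::runtime_error& e) {
    seen = e.what();
  }
  assert(seen == "custom op: shape mismatch");
}
```

Without the check around `proxy_call`, the failure code would simply be discarded and execution would continue, which is the silent-error case the wrapper change is described as closing.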

Key Points
  • Thread-local `aoti_last_error_msg` in shim_common.cpp stores the original exception message from custom ops.
  • Model runners read the preserved error via `aoti_torch_get_last_error()` instead of generic failure messages.
  • cpp_wrapper_cpu.py now wraps proxy_executor calls with `AOTI_TORCH_ERROR_CODE_CHECK` to prevent silent errors.

Why It Matters

Fixes silent error swallowing in AOTInductor, making debugging custom ops in production pipelines faster and clearer.