Developer Tools

viable/strict/1777182369: Skip test_conv_error when TORCH_SHOW_CPP_STACKTRACES=1 (#181142)

An environment variable set on test retries was breaking a test in ARM64 CI by leaking C++ stack traces into user-visible errors.

Deep Dive

PyTorch has resolved a subtle test flakiness issue in its CI pipeline affecting `test_conv_error` on ARM64. The problem stemmed from the environment variable `TORCH_SHOW_CPP_STACKTRACES=1`, which the test runner sets on retry attempts. When it is set, the JIT interpreter's `handleError` function appends the C++ backtrace (via `e.what()`) to the user-visible RuntimeError message raised by a failed scripted conv2d call. The test asserts that the word 'frame' does not appear in that error message, as a guard against internal backtraces leaking to users, so the assertion fails whenever the variable is set.
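
The interaction described above can be modeled with a toy sketch. This is not PyTorch code: `user_visible_message`, the error text, and the backtrace lines are hypothetical stand-ins, and the check for the value `"1"` is a simplification of however the runtime actually reads the variable.

```python
import os


def user_visible_message(base_message, cpp_backtrace, env=None):
    """Toy stand-in for the error formatting described above (hypothetical)."""
    env = os.environ if env is None else env
    # Assumption: the backtrace is appended only when the variable equals "1".
    if env.get("TORCH_SHOW_CPP_STACKTRACES") == "1":
        return base_message + "\n" + cpp_backtrace
    return base_message


# Hypothetical backtrace text; real frames would name actual C++ symbols.
backtrace = "frame #0: hypothetical_conv_check()\nframe #1: hypothetical_interpreter_run()"

plain = user_visible_message("conv2d: invalid input", backtrace, env={})
verbose = user_visible_message(
    "conv2d: invalid input", backtrace, env={"TORCH_SHOW_CPP_STACKTRACES": "1"}
)

# The kind of check the test performs: no backtrace text in the user message.
assert "frame" not in plain
# With the variable set, the same check would fail.
assert "frame" in verbose
```

The sketch passes the environment in as a dictionary so both cases can be compared without modifying the real process environment.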

Previously, this issue was masked by an `@xfailIf(IS_ARM64)` decorator, which expected the test to fail on ARM. On newer m8g runners, however, the test stopped failing, so the xfail-marked test passed unexpectedly (XPASS) and broke the aarch64 job. The fix replaces the architecture-specific xfail with an environment-based skip that activates only when `TORCH_SHOW_CPP_STACKTRACES=1` is set. This eliminates the spurious failures and simplifies the test module by removing the now-unused `IS_ARM64` import. The pull request was authored with assistance from Claude and approved by bobrenjc93, resolving issue #177255.
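
An environment-based skip of this kind might look like the following minimal sketch. It uses the standard library's `unittest.skipIf` as a stand-in, since the exact decorator used in the pull request is not stated here; `stacktraces_enabled`, `build_case`, `run_case`, and the placeholder message are hypothetical names introduced for illustration.

```python
import os
import unittest


def stacktraces_enabled(env=None):
    """Return True when the variable is set to "1" (a simplifying assumption)."""
    env = os.environ if env is None else env
    return env.get("TORCH_SHOW_CPP_STACKTRACES") == "1"


def build_case(env=None):
    """Build the test class against a given environment mapping."""

    class ConvErrorSketch(unittest.TestCase):
        @unittest.skipIf(
            stacktraces_enabled(env),
            "C++ stack traces are appended to the error message",
        )
        def test_conv_error(self):
            # Placeholder for the message of a failed scripted conv2d call.
            message = "placeholder error text"
            self.assertNotIn("frame", message)

    return ConvErrorSketch


def run_case(env=None):
    """Run the single test and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(build_case(env))
    result = unittest.TestResult()
    suite.run(result)
    return result


normal_run = run_case(env={})
retry_run = run_case(env={"TORCH_SHOW_CPP_STACKTRACES": "1"})
```

The skip condition depends only on the environment, not on the CPU architecture, so the test behaves the same way on every runner type: it runs normally on a first attempt and is skipped on a retry where the variable is set.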

Key Points
  • `TORCH_SHOW_CPP_STACKTRACES=1` caused 'frame' to appear in RuntimeError messages from scripted conv2d, breaking a test assertion.
  • The previous workaround, `@xfailIf(IS_ARM64)`, produced an unexpected pass (XPASS) on m8g runners, breaking the aarch64 CI job.
  • Fix replaces architecture-specific xfail with an environment-based skip, removing the `IS_ARM64` import entirely.

Why It Matters

Keeps PyTorch CI stable on ARM64 runners, preventing spurious test failures that delay development.