Developer Tools

trunk/aa59af1d6a36351c0ba58289122533123d4fb759: [CUDA] Zero `total_weight` before accumulating in `nll_loss2d` (#182082)

A fix for CUDA test failures in PyTorch's `nll_loss2d` loss function

Deep Dive

PyTorch landed PR #182082, which zeroes `total_weight` before the CUDA `nll_loss2d` path accumulates into it. Previously the accumulation could start from an uninitialized buffer, which caused test failures such as `test_comprehensive_nn_functional_nll_loss_cuda`. The fix was authored by eqy and approved by Skylion007 and cyyever.
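
The effect of accumulating into a buffer that was never zeroed can be illustrated with a minimal pure-Python sketch. This is not the PyTorch kernel: the function `nll_loss2d_mean`, the `zero_first` flag, and the one-element list standing in for the `total_weight` buffer are all hypothetical. It assumes a weighted mean reduction that divides the summed loss by the accumulated `total_weight`.

```python
import math

def nll_loss2d_mean(log_probs, targets, class_weights, total_weight_buf, zero_first):
    """Weighted-mean NLL over a 2D target map (illustrative only).

    log_probs[c][h][w]  log-probability of class c at pixel (h, w)
    targets[h][w]       target class index at pixel (h, w)
    total_weight_buf    one-element list standing in for the output buffer
    zero_first          hypothetical flag: zero the buffer before accumulating
    """
    if zero_first:
        total_weight_buf[0] = 0.0  # the step the fix adds
    loss_sum = 0.0
    for h, row in enumerate(targets):
        for w, t in enumerate(row):
            cw = class_weights[t]
            loss_sum += -cw * log_probs[t][h][w]
            total_weight_buf[0] += cw  # accumulates onto whatever was already there
    return loss_sum / total_weight_buf[0]

# Two classes, one row of two pixels.
log_probs = [
    [[math.log(0.5), math.log(0.25)]],  # class 0
    [[math.log(0.5), math.log(0.75)]],  # class 1
]
targets = [[0, 1]]
class_weights = [1.0, 2.0]

# 5.0 stands in for leftover memory contents in an uninitialized buffer.
stale_buf = [5.0]
buggy = nll_loss2d_mean(log_probs, targets, class_weights, stale_buf, zero_first=False)

clean_buf = [5.0]
fixed = nll_loss2d_mean(log_probs, targets, class_weights, clean_buf, zero_first=True)
```

In this sketch the stale starting value inflates the denominator, so `buggy` differs from `fixed` even though the inputs are identical. With real uninitialized memory the starting value is arbitrary, which is consistent with failures that show up only in some test runs.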

Key Points
  • Zeroes the `total_weight` buffer before accumulation in the CUDA `nll_loss2d` path (PR #182082)
  • Resolves test failures in `test_comprehensive_nn_functional_nll_loss_cuda`
  • Approved by Skylion007 and cyyever; authored by eqy

Why It Matters

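Ensures that `nll_loss2d` on GPUs accumulates `total_weight` from a known zero value rather than from whatever the buffer happened to contain. This keeps loss computation correct for classification models and prevents silent errors in the resulting gradients.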