"I don't know!": Teaching neural networks to abstain with the HALO-Loss. [R]
New open-source loss function cuts false positives by more than half without sacrificing base accuracy.
Researcher 4rtemi5 has open-sourced HALO-Loss, a novel solution to a core geometry problem in neural networks. Standard categorical cross-entropy (CCE) loss forces models to push features infinitely far from the origin to approach zero loss, producing an unbounded latent space with no mathematically sound region for rejecting uncertain inputs. As a result, models confidently hallucinate answers even when fed garbage data. HALO-Loss fixes this by using shift-invariant distance math to bound maximum confidence at a finite distance. That change lets the model attach a zero-parameter 'Abstain Class' directly to the origin of its latent space, effectively giving the network a built-in 'I don't know' button.
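The general mechanism can be illustrated with a short sketch. The snippet below is not the released HALO-Loss; the prototype layout, the `scale` temperature, and the class name `DistanceAbstainLoss` are illustrative assumptions. It shows the core idea: logits derived from negative distances are bounded above by zero, so confidence saturates at a finite value, and the abstain class needs no parameters because its 'prototype' is the origin itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistanceAbstainLoss(nn.Module):
    """Illustrative sketch only: NOT the released HALO-Loss.
    Logits are negative distances, so the maximum achievable logit is 0
    (bounded confidence), and the abstain class sits at the origin,
    adding zero parameters."""

    def __init__(self, feat_dim: int, num_classes: int, scale: float = 2.0):
        super().__init__()
        # Learned prototypes for the real classes.
        self.prototypes = nn.Parameter(torch.randn(num_classes, feat_dim))
        # Assumed temperature on the distance logits (not from the source).
        self.scale = scale

    def forward(self, features: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # Negative Euclidean distance to each class prototype.
        class_logits = -self.scale * torch.cdist(features, self.prototypes)
        # Abstain logit: negative distance to the origin, with no parameters.
        abstain_logit = -self.scale * features.norm(dim=1, keepdim=True)
        # Column num_classes is the built-in "I don't know" class.
        logits = torch.cat([class_logits, abstain_logit], dim=1)
        return F.cross_entropy(logits, targets)
```

At inference, an input whose feature lands closer to the origin than to any class prototype is predicted as the abstain class, which is exactly the rejection behavior the paragraph above describes.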
Testing shows HALO-Loss delivers major AI safety benefits without the typical performance trade-off. On CIFAR-10 and CIFAR-100 benchmarks, it maintained base classification accuracy (even gaining +0.23% on CIFAR-10) while dramatically improving model calibration and outlier detection. Expected Calibration Error (ECE) dropped from roughly 8% to 1.5%. For far Out-of-Distribution (OOD) detection, such as spotting Street View House Numbers (SVHN) images mixed into CIFAR data, the False Positive Rate at 95% true positive rate (FPR@95) fell by more than half, from 22.08% to 10.27%. This level of native outlier rejection, achieved without heavy model ensembles, post-hoc scoring tricks, or exposure to outlier data during training, is rare.
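For readers who want to reproduce these two metrics on their own models, here is a minimal sketch of how ECE and FPR@95 are conventionally computed; the function names and the 15-bin choice are illustrative assumptions, not taken from the HALO-Loss repository.

```python
import numpy as np

def expected_calibration_error(confidences: np.ndarray, correct: np.ndarray,
                               n_bins: int = 15) -> float:
    """Standard ECE: bin predictions by confidence, then average the gap
    between accuracy and mean confidence per bin, weighted by bin size."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean()
                                       - confidences[in_bin].mean())
    return float(ece)

def fpr_at_95_tpr(id_scores: np.ndarray, ood_scores: np.ndarray) -> float:
    """FPR@95: fraction of OOD inputs still accepted at the confidence
    threshold that accepts 95% of in-distribution inputs."""
    threshold = np.percentile(id_scores, 5)  # keeps 95% of ID samples
    return float(np.mean(ood_scores >= threshold))
```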
The technique is particularly valuable for safety-critical classification tasks and for training multi-modal models like CLIP, where a mathematically sound rejection threshold for unaligned text-image pairs is crucial. The code is available on GitHub as a drop-in replacement for standard loss functions, allowing developers to potentially reduce overconfidence and hallucinations in their own models with minimal integration effort.
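As an integration sketch, the loop below swaps the illustrative distance-based loss in where `nn.CrossEntropyLoss` would normally sit. Two assumed differences from a standard setup are worth noting: the criterion consumes backbone features rather than logits, and it carries learnable prototypes, so its parameters must be handed to the optimizer. The actual class name and signature in the GitHub repository may differ.

```python
import torch
import torch.nn as nn

# Integration sketch reusing the illustrative DistanceAbstainLoss above;
# the real repository's API may differ.
torch.manual_seed(0)
backbone = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 64),
    nn.ReLU(),
    nn.Linear(64, 16),  # 16-dim feature space (arbitrary choice)
)
criterion = DistanceAbstainLoss(feat_dim=16, num_classes=10)  # replaces nn.CrossEntropyLoss()
# The criterion owns the learnable prototypes, so optimize its parameters too.
optimizer = torch.optim.Adam(
    list(backbone.parameters()) + list(criterion.parameters()), lr=1e-3
)

images = torch.randn(8, 3, 32, 32)          # dummy CIFAR-shaped batch
labels = torch.randint(0, 10, (8,))         # real-class targets only

loss = criterion(backbone(images), labels)  # consumes features, not logits
loss.backward()
optimizer.step()
```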
- Cuts far Out-of-Distribution false positives by more than half (e.g., 22.08% to 10.27% FPR@95 on SVHN vs. CIFAR)
- Maintains base accuracy while slashing calibration error from ~8% to 1.5%
- Provides a zero-parameter 'Abstain Class' by fixing the latent space geometry of standard Cross-Entropy loss
Why It Matters
Enables safer, more reliable AI systems that can admit uncertainty instead of hallucinating, crucial for real-world deployment.