Minimax Generalized Cross-Entropy
New MGCE loss function achieves 15% better accuracy on noisy datasets while training 2x faster.
A team of researchers including Kartheek Bondugula, Santiago Mazuelas, Aritz Pérez, and Anqi Liu has introduced Minimax Generalized Cross-Entropy (MGCE), a novel loss function that addresses key limitations of the loss functions currently used to train classifiers. The paper, accepted at the 29th International Conference on Artificial Intelligence and Statistics (AISTATS 2026), presents MGCE as a solution to the trade-off between optimization difficulty and robustness that plagues existing losses such as cross-entropy (CE) and mean absolute error (MAE).
Traditional generalized cross-entropy (GCE) approaches suffer from non-convex optimization over classification margins, making them prone to underfitting and poor performance on complex datasets. MGCE solves this through a minimax formulation that results in convex optimization, provides theoretical guarantees as an upper bound on classification error, and enables efficient implementation using stochastic gradients computed via implicit differentiation. This represents a significant advance in loss function design.
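For context, the standard GCE loss that MGCE builds on interpolates between CE and MAE through a parameter q: the loss is (1 - p_y^q)/q, where p_y is the predicted probability of the true class, recovering CE as q approaches 0 and MAE at q = 1. The minimal PyTorch sketch below illustrates this prior GCE baseline, not the MGCE formulation itself, whose details are in the paper; the function name, the choice q = 0.7, and the toy tensors are illustrative.

```python
import torch
import torch.nn.functional as F

def gce_loss(logits, targets, q=0.7):
    """Standard generalized cross-entropy loss: (1 - p_y^q) / q.

    Interpolates between cross-entropy (q -> 0) and MAE (q = 1),
    trading optimization ease against robustness to label noise.
    """
    probs = F.softmax(logits, dim=1)
    # Probability assigned to the true class for each example.
    p_y = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    return ((1.0 - p_y.pow(q)) / q).mean()

# Toy usage with random logits and labels.
logits = torch.randn(8, 10, requires_grad=True)
targets = torch.randint(0, 10, (8,))
loss = gce_loss(logits, targets)
loss.backward()
```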
Benchmark testing demonstrates that MGCE achieves superior performance across multiple metrics compared to existing approaches. The method shows strong accuracy improvements, faster convergence, and significantly better model calibration, which is particularly valuable in real-world scenarios where training data often contains label noise. The convex optimization framework ensures more stable training dynamics while retaining the robustness benefits that made GCE approaches attractive in the first place.
The research has important implications for practical AI development, as loss functions fundamentally determine how models learn from data. By providing both theoretical guarantees and practical efficiency, MGCE could become a standard tool for training more reliable classification systems across domains from computer vision to natural language processing. The team's implementation approach via implicit differentiation makes the method accessible for integration into existing deep learning frameworks.
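To give a sense of how implicit differentiation lets an inner optimization step plug into a standard autograd framework, here is a generic, self-contained PyTorch sketch. It is not the authors' MGCE solver; the inner objective f(z, theta) is a made-up convex toy problem chosen so the implicit gradient has a closed form. The pattern it shows (solve the inner problem in the forward pass, then apply the implicit function theorem in the backward pass) is what allows a loss defined through an inner optimization to be trained with ordinary stochastic gradient descent.

```python
import torch

class ImplicitArgmin(torch.autograd.Function):
    """Toy illustration of implicit differentiation.

    Forward: solve z*(theta) = argmin_z f(z, theta) for the convex toy objective
        f(z, theta) = 0.5 * z**2 + 0.25 * z**4 - theta * z,
    whose stationarity condition is g(z, theta) = z + z**3 - theta = 0.

    Backward: by the implicit function theorem,
        dz*/dtheta = -(dg/dz)^(-1) * dg/dtheta = 1 / (1 + 3 * z*^2),
    so no gradients need to be traced through the inner solver.
    """

    @staticmethod
    def forward(ctx, theta):
        z = torch.zeros_like(theta)
        # A few Newton steps on g(z, theta) = 0; the solver itself is not differentiated.
        for _ in range(20):
            g = z + z**3 - theta
            g_prime = 1.0 + 3.0 * z**2
            z = z - g / g_prime
        ctx.save_for_backward(z)
        return z

    @staticmethod
    def backward(ctx, grad_output):
        (z,) = ctx.saved_tensors
        # Implicit gradient: dz*/dtheta = 1 / (1 + 3 * z*^2).
        return grad_output / (1.0 + 3.0 * z**2)

# Toy usage: the solution and its gradient flow through autograd as usual.
theta = torch.tensor([2.0], requires_grad=True)
z_star = ImplicitArgmin.apply(theta)
z_star.sum().backward()
print(z_star.item(), theta.grad.item())  # z* = 1.0, dz*/dtheta = 0.25
```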
- MGCE provides convex optimization over classification margins (solving non-convex limitations of existing GCE)
- The method offers theoretical guarantees as an upper bound on classification error
- Achieves 15% better accuracy on noisy datasets with 2x faster convergence in benchmark tests
Why It Matters
Enables more robust AI models that perform better with real-world, noisy data while training faster and more reliably.