
Sevetlidis Defines Bayes-Sufficient Representations for Minimal Optimal Prediction

New theory pins down exactly how much information a learned representation must retain for optimal prediction under a given loss.

Deep Dive

Vasileios Sevetlidis's new paper, 'Bayes-Sufficient Representations in Supervised Learning,' provides a rigorous theoretical framework for understanding what information a learned representation must retain for optimal prediction. The core idea: a representation is Bayes-sufficient for a given loss and joint distribution if some prediction head applied to it can implement a Bayes-optimal action. Crucially, the required information depends on the loss function. The paper introduces the Bayes quotient, a partition of the input space in which inputs that share the same Bayes-optimal action are grouped together. A representation is sufficient if it refines this quotient (it never merges inputs whose optimal actions differ), and Bayes-minimal if it retains exactly the quotient's information, no more and no less.
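To make the quotient and the refinement test concrete, here is a minimal sketch on a finite toy problem. It is not the paper's code: the distribution `P_Y_GIVEN_X`, the three candidate representations, and the helper names are hypothetical, the loss is fixed to zero-one, and each input is assumed to have a unique Bayes-optimal action (no ties).

```python
from collections import defaultdict

# Hypothetical toy problem: 6 inputs, 3 classes, rows are p(y | x).
P_Y_GIVEN_X = {
    0: (0.7, 0.2, 0.1),
    1: (0.6, 0.3, 0.1),
    2: (0.2, 0.5, 0.3),
    3: (0.1, 0.8, 0.1),
    4: (0.1, 0.2, 0.7),
    5: (0.3, 0.3, 0.4),
}

def bayes_action_01(probs):
    """Bayes-optimal action under zero-one loss: the most probable class (assumes no ties)."""
    return max(range(len(probs)), key=lambda k: probs[k])

def partition(labels):
    """Turn a mapping x -> label into a set of blocks (frozensets of inputs)."""
    blocks = defaultdict(set)
    for x, lab in labels.items():
        blocks[lab].add(x)
    return {frozenset(b) for b in blocks.values()}

def refines(fine, coarse):
    """True if every block of `fine` lies inside some block of `coarse`."""
    return all(any(b <= c for c in coarse) for b in fine)

# Bayes quotient: group inputs by their Bayes-optimal action.
quotient = partition({x: bayes_action_01(p) for x, p in P_Y_GIVEN_X.items()})

# Hypothetical representations phi: X -> Z, given as lookup tables.
reps = {
    "identity": {x: x for x in P_Y_GIVEN_X},                        # keeps everything
    "bayes_class": {x: bayes_action_01(p) for x, p in P_Y_GIVEN_X.items()},
    "too_coarse": {x: x // 3 for x in P_Y_GIVEN_X},                  # merges inputs with different actions
}

for name, phi in reps.items():
    cells = partition(phi)
    sufficient = refines(cells, quotient)
    # Mutual refinement means the two partitions are equal, i.e. Bayes-minimal.
    minimal = sufficient and refines(quotient, cells)
    print(name, "sufficient:", sufficient, "minimal:", minimal)
```

Here the quotient is {0, 1}, {2, 3}, {4, 5}. The identity map is sufficient but carries extra information, the map to the Bayes class is both sufficient and minimal, and `too_coarse` fails sufficiency because it merges inputs 0 to 2, whose optimal classes differ.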

The framework connects naturally to property elicitation: zero-one loss requires the Bayes class, squared loss the conditional mean, and log loss the full predictive distribution. Controlled finite experiments, learned neural bottleneck experiments, and a real-world iNaturalist taxonomic refinement task demonstrate how to measure whether a representation is sufficient, minimal, or contains unnecessary information. This work offers a principled way to design representation learning methods that discard irrelevant information while preserving everything needed for optimal decision-making under a specific loss.
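The loss dependence can be illustrated by computing the Bayes quotient of one toy distribution under the three losses named above. This is a hedged sketch rather than anything from the paper: the distributions are made up, labels are assumed to be the numbers 0, 1, 2 so that a conditional mean exists, and exact fractions are used so that equal means compare equal.

```python
from collections import defaultdict
from fractions import Fraction

def dist(*weights):
    """Exact conditional distribution over y in {0, 1, 2} from integer weights."""
    total = sum(weights)
    return tuple(Fraction(w, total) for w in weights)

# Hypothetical toy problem with numeric labels y in {0, 1, 2}.
P_Y_GIVEN_X = {
    0: dist(10, 4, 6),
    1: dist(6, 12, 2),
    2: dist(14, 4, 2),
    3: dist(11, 2, 7),
    4: dist(10, 4, 6),   # same conditional distribution as input 0
}

# Bayes-optimal action (the elicited property) for each loss.
BAYES_ACTION = {
    "zero_one": lambda p: max(range(len(p)), key=lambda k: p[k]),  # mode (no ties here)
    "squared": lambda p: sum(y * py for y, py in enumerate(p)),   # conditional mean
    "log": lambda p: p,                                            # full predictive distribution
}

def bayes_quotient(action):
    """Group inputs whose Bayes-optimal action coincides; return sorted blocks."""
    blocks = defaultdict(set)
    for x, p in P_Y_GIVEN_X.items():
        blocks[action(p)].add(x)
    return sorted(sorted(b) for b in blocks.values())

for loss, action in BAYES_ACTION.items():
    print(loss, bayes_quotient(action))
```

On this example, zero-one loss yields the blocks [0, 2, 3, 4] and [1], squared loss yields [0, 1, 3, 4] and [2], and log loss separates every distinct distribution, leaving [0, 4], [1], [2], [3]. The zero-one and squared quotients are not comparable, while the log-loss quotient refines both, which matches the ordering of information described above.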

Key Points
  • A representation is Bayes-sufficient if some prediction head on top of it achieves the Bayes-optimal risk for the given loss function.
  • The Bayes quotient partitions inputs by their Bayes-optimal action; a Bayes-minimal representation is informationally equivalent to this quotient.
  • Different losses require different information: zero-one loss requires only the Bayes class, squared loss the conditional mean, and log loss the full predictive distribution.
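The risk-based definition in the first point suggests a simple numerical check: compare the risk of the best head that sees only the representation against the Bayes risk. The sketch below does this for zero-one loss on a hypothetical finite joint distribution (uniform p(x), made-up p(y | x)); it is an illustration under those assumptions, not the paper's measurement protocol. With every input having positive mass and a unique optimal class, the gap is zero exactly when the representation refines the Bayes quotient.

```python
from collections import defaultdict
from fractions import Fraction

def tenths(*w):
    """Exact probabilities given in tenths."""
    return tuple(Fraction(v, 10) for v in w)

# Hypothetical joint distribution: uniform p(x) over 6 inputs, p(y | x) over 3 classes.
P_X = {x: Fraction(1, 6) for x in range(6)}
P_Y_GIVEN_X = {
    0: tenths(7, 2, 1),
    1: tenths(6, 3, 1),
    2: tenths(2, 5, 3),
    3: tenths(1, 8, 1),
    4: tenths(1, 2, 7),
    5: tenths(3, 3, 4),
}

def bayes_risk_01():
    """Zero-one Bayes risk: the expected probability of missing the most likely class."""
    return sum(P_X[x] * (1 - max(p)) for x, p in P_Y_GIVEN_X.items())

def head_risk_01(phi):
    """Zero-one risk of the best head that sees only z = phi(x)."""
    mass = defaultdict(lambda: [Fraction(0)] * 3)  # joint mass p(z, y)
    for x, p in P_Y_GIVEN_X.items():
        for y, py in enumerate(p):
            mass[phi(x)][y] += P_X[x] * py
    # In each cell the best head predicts the class with the most joint mass.
    return sum(sum(row) - max(row) for row in mass.values())

# Hypothetical representations; x // 2 happens to equal the Bayes quotient here.
REPS = {
    "identity": lambda x: x,
    "quotient": lambda x: x // 2,
    "too_coarse": lambda x: x // 3,
}

for name, phi in REPS.items():
    print(name, "risk gap:", head_risk_01(phi) - bayes_risk_01())
```

The identity map and the quotient map both show a gap of 0, while `too_coarse` shows a gap of 3/20. A risk gap detects insufficiency but not excess information: the identity map is risk-optimal yet not minimal, so minimality has to be checked separately, for example by comparing partitions.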

Why It Matters

This theory gives representation learning a precise target: retain exactly the information the chosen loss requires and discard the rest, which can guide the design of more compact and efficient representations.