Research & Papers

New Framework Tightens Conformal Prediction with Beta Laws and Wasserstein Distances

Finite-sample beta distribution provides exact calibration coverage diagnostics beyond marginal averages.

Deep Dive

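Traditional split conformal prediction guarantees coverage only on average over the random calibration sample. Practitioners, however, often need to know the coverage delivered by the particular threshold they actually computed. This paper by Ramos, Graziadei, and Cabezas (arXiv:2605.19024) derives the exact finite-sample distribution of this calibration-conditional coverage under continuous i.i.d. data. It follows a Beta(k, n+1-k) distribution, where n is the number of calibration points and k is the rank of the calibration score used as the threshold (in the standard construction, k = ⌈(n+1)(1-α)⌉ for target miscoverage α). The mean of this law, k/(n+1), recovers the familiar marginal guarantee, and its spread shows how far a single calibration draw can stray from that guarantee. The authors treat this beta law as a reference object and use Wasserstein distances to measure how different data-generating processes deform it.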
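A short simulation makes the beta law concrete. The sketch below is an illustration, not the authors' code. It assumes i.i.d. Uniform(0, 1) nonconformity scores; any continuous score distribution gives the same coverage law via the probability integral transform. It takes the k-th smallest of n calibration scores as the threshold and compares the empirical mean and variance of the resulting coverage with those of Beta(k, n+1-k). The function names are hypothetical.

```python
import math
import random
import statistics


def simulate_conditional_coverage(n: int, k: int, trials: int, seed: int = 0) -> list[float]:
    """Draw the calibration-conditional coverage of split conformal `trials` times.

    Assumption for illustration: scores are i.i.d. Uniform(0, 1). For any continuous
    score CDF F, coverage F(q_hat) has the same law by the probability integral transform.
    """
    rng = random.Random(seed)
    coverages = []
    for _ in range(trials):
        scores = sorted(rng.random() for _ in range(n))
        q_hat = scores[k - 1]      # threshold: k-th smallest of n calibration scores
        coverages.append(q_hat)    # P(S_test <= q_hat) = q_hat for Uniform(0, 1) scores
    return coverages


def beta_mean_var(k: int, n: int) -> tuple[float, float]:
    """Mean and variance of the reference law Beta(k, n + 1 - k)."""
    a, b = k, n + 1 - k
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


if __name__ == "__main__":
    n, alpha = 99, 0.1
    k = math.ceil((n + 1) * (1 - alpha))   # standard split-conformal rank, here 90
    cov = simulate_conditional_coverage(n, k, trials=10_000, seed=1)
    mean, var = beta_mean_var(k, n)
    print(f"empirical mean {statistics.fmean(cov):.4f} vs Beta mean {mean:.4f}")
    print(f"empirical var  {statistics.pvariance(cov):.6f} vs Beta var  {var:.6f}")
```

With n = 99 and α = 0.1, the rank is k = 90 and the reference law is Beta(90, 10). Its mean is 0.9 and its standard deviation is about 0.03, so individual calibration draws can land a few points above or below the nominal 90%.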

The framework provides direct bounds on marginal coverage gaps and bad-calibration probabilities, isolating two sources of non-i.i.d. behavior: test-side shift acts through a transport map on the coverage scale, while calibration dependence changes the order-statistic law itself. The approach is instantiated for scale-shift, clustered, and stationary mixing settings, with explicit characterizations or Berry-Esseen approximations. Simulations on dependent processes show the first-order approximation tracks empirical Wasserstein distances even at moderate sample sizes, offering a practical tool for uncertainty quantification in machine learning.
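A toy scale shift shows how a test-side shift acts as a transport map on the coverage scale. The setup is an assumption made only for illustration and is not necessarily the paper's parameterization. Calibration scores are Exponential(1), and test scores are the same kind of scores multiplied by c > 0. A coverage value u = F(q̂) is then carried to T_c(u) = 1 - (1 - u)^(1/c). Because T_c is increasing, the Wasserstein-1 distance between the laws of T_c(U) and U reduces to E|T_c(U) - U| for U ~ Beta(k, n+1-k), and the marginal coverage gap cannot exceed it. The sketch computes both with a midpoint rule, and its function names are hypothetical.

```python
import math


def beta_pdf(u: float, a: int, b: int) -> float:
    """Density of Beta(a, b) at u in (0, 1), computed in log space for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(u) + (b - 1) * math.log1p(-u))


def scale_shift_map(u: float, c: float) -> float:
    """Hypothetical coverage-scale transport map for a test-side scale shift.

    Assumes calibration scores ~ Exponential(1), F(s) = 1 - exp(-s), and test scores
    equal to c times such a score, G(s) = 1 - exp(-s / c). Coverage u = F(q) then
    becomes G(F^{-1}(u)) = 1 - (1 - u) ** (1 / c).
    """
    return 1.0 - (1.0 - u) ** (1.0 / c)


def shifted_coverage_stats(n: int, k: int, c: float, grid: int = 20_000) -> tuple[float, float]:
    """Return (marginal coverage gap, W1) between T_c(U) and U ~ Beta(k, n + 1 - k).

    T_c is increasing, so (U, T_c(U)) is the optimal (quantile) coupling on the line
    and W1 = E|T_c(U) - U|. The gap E[T_c(U)] - E[U] cannot exceed W1 in absolute
    value because the identity is 1-Lipschitz (Kantorovich-Rubinstein duality).
    """
    a, b = k, n + 1 - k
    h = 1.0 / grid
    gap = w1 = 0.0
    for i in range(grid):
        u = (i + 0.5) * h                    # midpoint rule on (0, 1)
        weight = beta_pdf(u, a, b) * h       # probability mass of this cell
        diff = scale_shift_map(u, c) - u
        gap += diff * weight
        w1 += abs(diff) * weight
    return gap, w1


if __name__ == "__main__":
    n, k = 99, 90                            # Beta(90, 10) reference, alpha = 0.1
    for c in (1.0, 1.2, 1.5):
        gap, w1 = shifted_coverage_stats(n, k, c)
        print(f"c={c}: marginal gap {gap:+.4f}, W1 {w1:.4f}")
```

For c > 1, the test scores are inflated and T_c(u) ≤ u everywhere. The gap is therefore negative (undercoverage) and its magnitude equals W1. The bound is tight for a one-sided monotone shift and becomes loose when the map crosses the identity.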

Key Points
  • Calibration-conditional coverage in i.i.d. settings follows an exact Beta(k, n+1-k) distribution (n calibration points, threshold at the k-th smallest score), providing a finite-sample reference.
  • Wasserstein distances on [0,1] measure departures from this beta law, yielding bounds on marginal coverage gaps and bad-calibration probabilities.
  • Framework separates test-side shift (transport map) from calibration dependence, with explicit characterizations for scale-shift, clustered, and stationary mixing processes.
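The calibration-dependence channel can be probed in the spirit of the paper's simulations by measuring an empirical Wasserstein distance to the beta reference. The sketch below uses a hypothetical Gaussian random-effects model of clustered calibration scores, which is an assumption and not the authors' setup. Scores in the same cluster share a random effect and have correlation rho, every score is marginally N(0, 1), and the test score is an independent N(0, 1) draw. The sketch estimates W1 on [0, 1] from the CDF formula and the bad-calibration probability P(coverage < level). It evaluates the Beta(a, b) CDF for integer parameters through the identity I_x(a, b) = P(Binomial(a+b-1, x) ≥ a). All function names are hypothetical.

```python
import bisect
import math
import random
from statistics import NormalDist


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """CDF of Beta(a, b) at x for integer a, b, via P(Binomial(a + b - 1, x) >= a)."""
    m = a + b - 1
    return sum(math.comb(m, j) * x**j * (1.0 - x) ** (m - j) for j in range(a, m + 1))


def clustered_coverages(n_clusters: int, cluster_size: int, k: int, rho: float,
                        trials: int, seed: int = 0) -> list[float]:
    """Calibration-conditional coverage under a hypothetical clustered model.

    score = sqrt(rho) * cluster_effect + sqrt(1 - rho) * noise, both N(0, 1), so each
    score is marginally N(0, 1). The test score is an independent N(0, 1) draw, hence
    coverage given the threshold q_hat is Phi(q_hat).
    """
    rng = random.Random(seed)
    phi = NormalDist().cdf
    coverages = []
    for _ in range(trials):
        scores = []
        for _ in range(n_clusters):
            effect = rng.gauss(0.0, 1.0)     # shared by every score in this cluster
            for _ in range(cluster_size):
                noise = rng.gauss(0.0, 1.0)
                scores.append(math.sqrt(rho) * effect + math.sqrt(1.0 - rho) * noise)
        scores.sort()
        coverages.append(phi(scores[k - 1]))  # threshold = k-th smallest score
    return coverages


def w1_to_beta(samples: list[float], k: int, n: int, grid: int = 2000) -> float:
    """W1 on [0, 1] between the empirical law of `samples` and Beta(k, n + 1 - k).

    On the real line W1 equals the integral of |F_emp(x) - F_ref(x)| dx; a midpoint
    rule on `grid` cells approximates it.
    """
    xs = sorted(samples)
    total = 0.0
    for i in range(grid):
        x = (i + 0.5) / grid
        f_emp = bisect.bisect_right(xs, x) / len(xs)
        total += abs(f_emp - beta_cdf_int(x, k, n + 1 - k)) / grid
    return total


def bad_calibration_prob(samples: list[float], level: float) -> float:
    """Fraction of calibration draws whose conditional coverage falls below `level`."""
    return sum(c < level for c in samples) / len(samples)


if __name__ == "__main__":
    n_clusters, cluster_size = 20, 5
    n = n_clusters * cluster_size            # 100 calibration points
    k = math.ceil((n + 1) * (1 - 0.1))       # rank for alpha = 0.1, here 91
    for rho in (0.0, 0.5):
        cov = clustered_coverages(n_clusters, cluster_size, k, rho, trials=2000, seed=1)
        print(f"rho={rho}: W1 to Beta = {w1_to_beta(cov, k, n):.4f}, "
              f"P(cov < 0.85) = {bad_calibration_prob(cov, 0.85):.3f} "
              f"(Beta reference {beta_cdf_int(0.85, k, n + 1 - k):.3f})")
```

Setting rho = 0 recovers the i.i.d. case, where the estimated distance should be near zero up to Monte Carlo and grid error. Increasing rho widens the coverage distribution, which typically raises both the distance and the bad-calibration probability relative to the beta reference.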

Why It Matters

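The framework replaces a single average-coverage guarantee with the full finite-sample distribution of coverage. Practitioners can use it to quantify how likely a given calibration run is to undercover, and to see how dependence or distribution shift changes that risk. This makes conformal uncertainty estimates in machine learning easier to audit before real-world deployment.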