Research & Papers

Adaptive Robust Confidence Intervals in Efron's Gaussian Two-Groups Model

New algorithm certifies intervals even when the fraction of corrupted data is unknown.

Deep Dive

A new theoretical paper on arXiv tackles a critical problem in robust statistics: constructing confidence intervals when some observations may be arbitrarily corrupted. The authors study Efron's Gaussian two-groups model, where an unknown fraction ε of data points have shifted means, but all share the same Gaussian measurement noise with variance σ². This 'noise-oblivious' adversary setting reflects real-world scenarios where contamination occurs upstream of measurement.

The paper's main contribution is a precise characterization of the minimax-optimal confidence interval length that adapts to unknown ε. When σ² is known, the optimal length scales as σ(n^{-1/4} + ε^{1/2}/max{1, log(en ε²)}^{1/2}), which is polynomially worse than when ε is known. If σ² is also unknown, the length further degrades to at least Ω(σ n^{-1/8}) — no adaptive robust interval can be shorter. The authors also introduce a Fourier-based certification procedure leveraging Carathéodory's positive-semidefiniteness constraints, which runs in polynomial time and achieves the minimax lower bound for known variance.

Key Points
  • The minimax-optimal confidence interval length when contamination proportion ε is unknown is polynomially larger than when ε is known.
  • With unknown variance σ², the interval length lower bound becomes Ω(σ n^{-1/8}), a severe degradation.
  • A polynomial-time Fourier certification algorithm using Carathéodory constraints attains the minimax bound for known σ².

Why It Matters

Enables reliable uncertainty quantification in data analysis even when both contamination rate and variance are unknown.