Mean Testing under Truncation beyond Gaussian
Truncation can hide up to an ε-fraction of the probability mass, creating a sharp detectability limit.
In a new paper on arXiv, researchers Yuhao Wang, Roberto Imbuzeiro Oliveira, and Themis Gouleakis tackle an important but overlooked problem in statistics: how to test the mean of a distribution when some data is missing due to an unknown truncation mechanism. They consider a setting where samples are drawn from the conditional distribution given an unknown truncation set S that may hide up to an ε-fraction of the probability mass. This creates a systematic bias that degrades the ability to distinguish whether the true mean is zero (null) or some nonzero value α (alternative).
The team derives information-theoretic limits: they show that when the signal α falls below a bias threshold of order O(ν ε^(1-1/p)), where ν bounds the p-th directional moments, the null and alternative are fundamentally indistinguishable no matter how many samples are collected. This establishes a sharp "detectability floor." Above that floor, they propose a simple second-order test that achieves near-optimal sample complexity n = O(√d ||Σ_P|| / (α - 4ν ε^(1-1/p))^2), where Σ_P is the covariance matrix of the distribution. The analysis interpolates between the finite-moment, sub-Gaussian, and median-regular regimes.
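To make the detectability floor concrete, here is a minimal, hypothetical sketch of a mean test of this flavor. The paper's actual second-order test is not specified in this summary, so the decision rule below (comparing the norm of the empirical mean to a threshold between the bias floor 4ν ε^(1-1/p) and the signal α) is purely illustrative; the function name `mean_test` and all parameter choices are assumptions.

```python
import numpy as np

def mean_test(X, alpha, nu, eps, p):
    """Illustrative sketch (not the authors' test): reject H0 (mean = 0)
    when the empirical mean norm exceeds a threshold placed between the
    truncation-bias floor 4*nu*eps**(1 - 1/p) and the signal level alpha."""
    bias_floor = 4 * nu * eps ** (1 - 1 / p)
    threshold = (bias_floor + alpha) / 2  # midpoint between floor and signal
    stat = np.linalg.norm(X.mean(axis=0))
    return stat > threshold  # True -> reject the null

# Toy usage: d = 5, n = 2000, alternative shifts the mean by alpha along e1.
rng = np.random.default_rng(0)
d, n, alpha = 5, 2000, 1.0
X_null = rng.normal(0.0, 1.0, size=(n, d))
X_alt = X_null + alpha * np.eye(d)[0]
print(mean_test(X_null, alpha, nu=1.0, eps=0.01, p=2))  # False: mean near 0
print(mean_test(X_alt, alpha, nu=1.0, eps=0.01, p=2))   # True: signal above floor
```

Note that the test only makes sense when α sits strictly above the bias floor; below it, the paper shows no procedure can succeed regardless of sample size.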
Perhaps most striking is their finding of a structural escape: if the distribution satisfies a directional median regularity condition, the truncation bias improves from polynomial to linear order O(ε). In this regime, testing recovers the classical Θ(√d) rate while estimation still requires Θ(d) samples. The work provides a unified framework bridging several theoretical traditions and offers practical guidance for hypothesis testing in settings with missing or truncated data—common in surveys, econometrics, and machine learning pipelines.
- Proves a sharp detectability floor: if signal α < O(ν ε^(1-1/p)), hypotheses are indistinguishable even with infinite data.
- Achieves near-optimal sample complexity n = O(√d ||Σ_P|| / (α - 4ν ε^(1-1/p))^2) above the bias threshold.
- Under median regularity, bias improves to linear O(ε), enabling classical √d testing rates while estimation still requires Θ(d) samples.
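The gap between the two regimes in the bullets above is easy to see numerically. This small illustration (not a computation from the paper) compares the polynomial floor ε^(1-1/p) for p = 2 against the linear floor ε achieved under median regularity:

```python
# Illustration: truncation-bias floor scaling in eps under finite p-th
# moments (p = 2 here, so the floor is sqrt(eps)) versus the linear O(eps)
# floor available under directional median regularity.
p = 2
for eps in [0.1, 0.01, 0.001]:
    poly_floor = eps ** (1 - 1 / p)   # heavy-tailed regime: O(eps^(1-1/p))
    linear_floor = eps                # median-regular regime: O(eps)
    print(f"eps={eps}: polynomial floor {poly_floor:.3f}, linear floor {linear_floor:.3f}")
```

At ε = 0.01 with p = 2, the polynomial floor is 0.1 (ten times larger than the linear floor), so median regularity lets much weaker signals remain detectable.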
Why It Matters
This provides a rigorous theoretical foundation for mean testing with truncated data, directly impacting missing-data pipelines in ML and statistics.