New paper establishes finite-sample bounds for Median-of-Incomplete-U-Statistics
A robust estimator for symmetric kernels now comes with guaranteed concentration rates.
Nong Minh Hieu's new paper, 'On Median of Incomplete U-Statistics,' presents a theoretical analysis of a robust estimator called MIU (Median-of-Incomplete-U-Statistics). MIU combines the median-of-means technique with incomplete U-statistics, which estimate the expectation of a symmetric kernel by averaging it over only a subset of the possible tuples of observations. Such statistics appear in kernel methods, two-sample testing, and independence testing. The key contribution is a set of finite-sample concentration rates for MIU: exponential deviation bounds that hold at any sample size rather than only in the limit. This fills a gap in the literature, where previous results were either asymptotic or relied on distribution-specific assumptions.
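To make the construction concrete, here is a minimal Python sketch of one plausible reading of the combination described above: split the sample into blocks, compute an incomplete U-statistic on each block, and report the median of the block values. The function names, the restriction to order-2 kernels, the random pair sampling with replacement, and the parameters `n_blocks` and `n_pairs` are illustrative assumptions, not details taken from the paper, whose exact construction may differ.

```python
import random
import statistics
from typing import Callable, List, Sequence


def variance_kernel(x: float, y: float) -> float:
    """Symmetric order-2 kernel whose expectation is the variance."""
    return 0.5 * (x - y) ** 2


def incomplete_u_statistic(
    block: Sequence[float],
    kernel: Callable[[float, float], float],
    n_pairs: int,
    rng: random.Random,
) -> float:
    """Average the kernel over n_pairs randomly drawn pairs of distinct indices.

    Assumption: pairs are drawn with replacement across draws; other
    sampling designs for incomplete U-statistics exist.
    """
    total = 0.0
    for _ in range(n_pairs):
        i, j = rng.sample(range(len(block)), 2)  # two distinct positions
        total += kernel(block[i], block[j])
    return total / n_pairs


def median_of_incomplete_u(
    data: Sequence[float],
    kernel: Callable[[float, float], float],
    n_blocks: int,
    n_pairs: int,
    seed: int = 0,
) -> float:
    """Median of per-block incomplete U-statistics (hypothetical sketch)."""
    rng = random.Random(seed)
    shuffled: List[float] = list(data)
    rng.shuffle(shuffled)

    block_size = len(shuffled) // n_blocks  # leftover points are dropped
    if block_size < 2:
        raise ValueError("each block needs at least two observations")

    estimates = []
    for b in range(n_blocks):
        block = shuffled[b * block_size:(b + 1) * block_size]
        estimates.append(incomplete_u_statistic(block, kernel, n_pairs, rng))
    return statistics.median(estimates)


# Usage: 100 clean points in {0, 1} plus one extreme outlier.
data = [0.0, 1.0] * 50 + [1e6]
estimate = median_of_incomplete_u(data, variance_kernel, n_blocks=5, n_pairs=50)
print(estimate)
```

In the usage lines, the single outlier can corrupt at most one of the five blocks, so the median of the block estimates stays within the range of the clean kernel values (between 0 and 0.5 here), whereas a plain average over all pairs would be dominated by the outlier.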
For practitioners, this means MIU comes with explicit finite-sample guarantees in real-world settings where data may be heavy-tailed, contaminated, or otherwise non-Gaussian. The method is computationally efficient because it evaluates only a subset of the U-statistic's terms, and its robustness is backed by the paper's deviation bounds. The work is relevant to robust machine learning, where reliable gradient estimation and kernel-based learning are crucial. By offering a principled way to trade off robustness and efficiency, Hieu's paper may encourage wider use of MIU in practical pipelines.
- First finite-sample concentration bound for Median-of-Incomplete-U-Statistics (MIU), covering all sample sizes.
- MIU provides robustness against heavy-tailed or contaminated data while maintaining computational efficiency.
- Applicable to any symmetric kernel estimator, including those used in two-sample tests, independence testing, and kernel methods.
Why It Matters
Gives practitioners a theoretically grounded, robust estimator for symmetric-kernel statistics, which is critical for reliable ML on heavy-tailed or contaminated data.