Research & Papers

New paper establishes finite-sample bounds for Median-of-Incomplete-U-Statistics

A robust estimator for symmetric kernels now comes with guaranteed concentration rates.

Deep Dive

Nong Minh Hieu's new paper, 'On Median of Incomplete U-Statistics,' presents a rigorous theoretical analysis of a robust estimator called MIU (Median-of-Incomplete-U-Statistics). MIU combines two ideas: the median-of-means technique, which splits the sample into blocks and takes the median of the per-block estimates, and incomplete U-statistics, which average a symmetric kernel over only a subset of the possible sample tuples. U-statistics of this kind appear throughout kernel methods, two-sample testing, and independence testing. The key contribution is a set of finite-sample concentration rates for MIU: exponential deviation bounds that hold at any sample size rather than only in the limit. This fills a gap in the literature, where previous results were either asymptotic or relied on distribution-specific assumptions.
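The combination described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's construction: the function names (`miu`, `incomplete_u_stat`, `variance_kernel`), the parameters `n_blocks` and `n_pairs`, the restriction to order-2 kernels, and the choice to sample pairs uniformly with replacement are all assumptions made for the example.

```python
import numpy as np

def incomplete_u_stat(block, kernel, n_pairs, rng):
    """Average an order-2 symmetric kernel over randomly drawn pairs.

    Assumption: pairs (i, j) with i != j are sampled uniformly with
    replacement; the paper may use a different sampling design.
    """
    m = len(block)
    if m < 2:
        raise ValueError("each block needs at least two observations")
    i = rng.integers(0, m, size=n_pairs)
    # Add an offset in [1, m-1] modulo m so that j is never equal to i.
    j = (i + rng.integers(1, m, size=n_pairs)) % m
    return float(np.mean(kernel(block[i], block[j])))

def miu(x, kernel, n_blocks, n_pairs, seed=0):
    """Median of per-block incomplete U-statistics (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    # Shuffle, then split into disjoint blocks as in median-of-means.
    blocks = np.array_split(x[rng.permutation(len(x))], n_blocks)
    estimates = [incomplete_u_stat(b, kernel, n_pairs, rng) for b in blocks]
    return float(np.median(estimates))

def variance_kernel(a, b):
    """Symmetric kernel whose expectation is the variance of the data."""
    return 0.5 * (a - b) ** 2
```

Calling `miu(x, variance_kernel, n_blocks=20, n_pairs=500)` on a one-dimensional array `x` gives a variance estimate. Because the final step is a median, a handful of extreme observations can corrupt only the blocks they land in, which is the intuition behind the robustness claim.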

For practitioners, this means MIU comes with explicit statistical guarantees in real-world scenarios where data may be heavy-tailed, contaminated, or non-Gaussian. The method is computationally efficient because it evaluates only a subset of the U-statistic's terms, and its robustness is backed by theory rather than heuristics. This work is directly relevant to fields like robust machine learning, where reliable gradient estimation and kernel-based learning are crucial. By offering a principled way to trade off robustness against efficiency, Hieu's paper opens the door to wider adoption of MIU in practical pipelines.
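To make the efficiency point concrete, the sketch below counts kernel evaluations for a complete order-2 U-statistic against a block-wise incomplete scheme. The helper names and the example sizes (10,000 observations, 20 blocks, 500 sampled pairs per block) are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from math import comb

def complete_u_terms(n, order=2):
    """Number of kernel evaluations in a complete U-statistic of the given order."""
    return comb(n, order)

def miu_kernel_evals(n_blocks, n_terms_per_block):
    """Kernel evaluations when each block averages a fixed number of sampled terms."""
    return n_blocks * n_terms_per_block

n = 10_000
print(complete_u_terms(n))        # all unordered pairs among n observations
print(miu_kernel_evals(20, 500))  # 20 blocks, 500 sampled pairs each
```

The complete statistic grows quadratically in the sample size for an order-2 kernel, while the incomplete scheme's cost is set directly by the number of blocks and sampled terms, which is the knob that trades computation against statistical accuracy.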

Key Points
  • First finite-sample concentration bound for Median-of-Incomplete-U-Statistics (MIU), holding at any sample size rather than only asymptotically.
  • MIU provides robustness against heavy-tailed or contaminated data while maintaining computational efficiency.
  • Applicable to estimators built on symmetric kernels, including those used in two-sample tests, independence testing, and kernel methods.

Why It Matters

Gives practitioners a theoretically grounded, robust estimator for kernel-based functionals, which is critical for reliable ML on non-ideal data.