CP-MMD optimizes kernel selection for two-sample tests without overfitting
A complexity penalty finally lets you optimize kernels while preserving statistical validity.
Yijin Ni and Xiaoming Huo have introduced CP-MMD (Complexity-Penalized Maximum Mean Discrepancy), a framework that reframes kernel selection for two-sample tests as a model selection problem. Traditional MMD-based tests face a fundamental trade-off. Optimizing the kernel on the data makes the selected kernel depend on the very samples being tested, which violates the i.i.d. assumption and leads to overfitting and variance collapse when the kernel class is rich. Aggregation methods avoid this problem but are limited to finite grids of candidate kernels. CP-MMD breaks this dichotomy by applying a two-sample uniform concentration inequality to the post-optimization MMD problem, deriving a penalty that mathematically absorbs the cost of optimization. This penalty bounds the empirical MMD by the complexity of the kernel search space, allowing direct, grid-free maximization over continuous parametric families, including scalar bandwidths, polynomial-feature bandwidths, and deep neural network parameters. The authors prove that CP-MMD maximizes true test power while guaranteeing unconditional Type-I error control.
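To make the idea of a penalized, grid-free criterion concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' implementation: the penalty formula in `placeholder_penalty`, the `penalty_scale` parameter, the choice of a Gaussian kernel, and the golden-section optimizer are all hypothetical stand-ins chosen for illustration. The sketch only shows the general shape of the approach: maximize the empirical MMD over a continuous bandwidth interval, then subtract a term that grows with the size of the search space and shrinks with the sample sizes.

```python
import math
import numpy as np


def gaussian_gram(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def mmd_biased(x, y, bandwidth):
    """Biased (V-statistic) estimate of the MMD, not squared."""
    mmd2 = (gaussian_gram(x, x, bandwidth).mean()
            + gaussian_gram(y, y, bandwidth).mean()
            - 2.0 * gaussian_gram(x, y, bandwidth).mean())
    return math.sqrt(max(mmd2, 0.0))


def golden_section_max(f, lo, hi, tol=1e-3):
    """Golden-section search for a maximizer of f on [lo, hi].

    Only guaranteed to find the global maximum when f is unimodal;
    used here as a simple stand-in for a continuous optimizer.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:  # maximizer lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
        else:        # maximizer lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
    return (c, fc) if fc >= fd else (d, fd)


def placeholder_penalty(n, m, log_width, penalty_scale=1.0):
    """HYPOTHETICAL penalty, NOT the one derived in the paper.

    It only mimics the qualitative behaviour described in the text:
    larger search spaces cost more, larger samples cost less.
    """
    return penalty_scale * math.sqrt((1.0 + log_width) * (1.0 / n + 1.0 / m))


def penalized_mmd(x, y, log_bw_lo, log_bw_hi, penalty_scale=1.0):
    """Maximize the empirical MMD over log-bandwidths in [log_bw_lo, log_bw_hi],
    then subtract the placeholder penalty for that search interval.

    Returns (penalized statistic, selected bandwidth, unpenalized maximum).
    """
    best_log_bw, raw = golden_section_max(
        lambda log_bw: mmd_biased(x, y, math.exp(log_bw)), log_bw_lo, log_bw_hi
    )
    penalty = placeholder_penalty(len(x), len(y), log_bw_hi - log_bw_lo, penalty_scale)
    return raw - penalty, math.exp(best_log_bw), raw


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 2))
    y = rng.normal(loc=1.0, size=(200, 2))  # mean-shifted alternative
    stat, bandwidth, raw = penalized_mmd(x, y, -2.0, 2.0)
    print(f"penalized statistic={stat:.3f}, bandwidth={bandwidth:.3f}, raw MMD={raw:.3f}")
```

In this sketch the penalty is constant across the bandwidths inside one search interval, so it does not change which bandwidth is selected; it lowers the reported statistic to account for the fact that the maximum was taken over the whole interval. How the paper's penalty is actually defined, and how it is turned into a test threshold, should be taken from the paper itself.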
This result has practical implications for researchers and practitioners in statistics and machine learning. CP-MMD allows deep kernels to be used in two-sample testing without post-hoc correction or restrictive discrete grids. According to the authors, the method matches or exceeds state-of-the-art test power across linear, polynomial-feature, and deep regimes while preserving the statistical guarantees that hypothesis testing requires. By unifying kernel selection with model selection principles, the approach offers a theoretically grounded and computationally efficient solution for nonparametric distribution comparison, a core task in fields such as generative model evaluation, domain adaptation, and causal inference.
- CP-MMD adds a complexity penalty, derived from uniform concentration inequalities, to prevent overfitting during kernel optimization.
- It enables grid-free optimization over continuous kernel classes, including deep network parameters, rather than only discrete bandwidth grids.
- It is proven to maximize test power while maintaining unconditional Type-I error control, unlike prior ratio-based criteria.
Why It Matters
CP-MMD offers a principled way to pick kernels for two-sample tests, guarding against overfitting and making deep kernels usable without sacrificing statistical validity.