Research & Papers

Wavelet-based DPP kernels improve minibatch accuracy and speed

New DPP subsampling method cuts variance and works with rough objectives.

Deep Dive

In a new paper on arXiv, researchers Hoang-Son Tran, Pranav Gupta, Rémi Bardenet, and Subhroshekhar Ghosh address two major challenges in using determinantal point processes (DPPs) for efficient subsampling in machine learning. DPPs are known to produce diverse, representative minibatches and coresets, but two obstacles remain: few continuous DPP families are known to have strong variance-reduction properties, and practical use relies on ad hoc discrete constructions that often lose those guarantees. The team addresses both problems by introducing novel DPP kernels built on wavelets. These continuous DPPs achieve provably tighter accuracy bounds than existing DPP methods, and the authors also develop a general conversion technique that turns any such continuous DPP into a discrete kernel while preserving its variance-decay properties. Crucially, the discrete kernel admits a low-rank decomposition, which keeps sampling cheap even for very large datasets.
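
To see why a low-rank decomposition keeps sampling cheap, here is a minimal Python sketch of the standard sequential sampler for a projection DPP with kernel K = U Uᵀ, where U has orthonormal columns. It is not the paper's wavelet construction: the function name `sample_projection_dpp` is hypothetical, and the random Gaussian features are only a stand-in for whatever low-rank factor a real kernel would supply. The sampler never forms the full N × N kernel, so its cost grows linearly in the number of items N for a fixed rank k.

```python
import numpy as np


def sample_projection_dpp(U, rng):
    """Draw exactly k distinct items from the projection DPP with kernel U @ U.T.

    U is an (N, k) matrix with orthonormal columns (assumed, not checked).
    """
    N, k = U.shape
    V = U.copy()
    sample = []
    for _ in range(k):
        # Squared row norms are proportional to the conditional
        # probability of picking each item next.
        p = np.sum(V ** 2, axis=1)
        p = p / p.sum()
        i = int(rng.choice(N, p=p))
        sample.append(i)
        # Remove one direction so that row i of V becomes zero, which
        # conditions the remaining process on item i being selected.
        j = int(np.argmax(np.abs(V[i])))
        col = V[:, j].copy()
        V = np.delete(V, j, axis=1)
        V -= np.outer(col, V[i] / col[i])
        # Re-orthonormalize the remaining columns.
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)
    return sorted(sample)


# Stand-in low-rank factor: orthonormalized random features (an assumption).
rng = np.random.default_rng(0)
features = rng.standard_normal((1000, 8))
U, _ = np.linalg.qr(features)
batch = sample_projection_dpp(U, rng)
print(batch)
```

Each draw returns a minibatch of exactly k = 8 distinct indices out of 1,000, and items whose feature rows are similar are unlikely to appear together.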

The work goes further by enlarging the class of objective functions that can benefit from DPP-based subsampling. Many real-world ML tasks involve loss functions with low regularity (e.g., non-smooth or piecewise-defined objectives), which previous DPP approaches could not handle. The new method provides rate guarantees that adapt explicitly to the regularity of the objective, making it applicable to a much broader range of problems. This makes DPPs a more practical tool for constructing minibatches and coresets, with the potential to reduce compute and memory costs when training large models without sacrificing accuracy. The paper is available as arXiv:2605.13127.
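
For readers new to DPP subsampling, the following Python sketch illustrates the basic estimator behind it on a toy problem. It assumes a generic rank-k projection DPP, not the wavelet kernels from the paper, and all variable names are hypothetical. Each sampled item is reweighted by the inverse of its inclusion probability K[i, i], and the script enumerates every possible minibatch to confirm that the weighted sum is an unbiased estimate of the full-data sum.

```python
import itertools

import numpy as np

rng = np.random.default_rng(1)
N, k = 6, 2  # toy dataset size and minibatch size

# Projection kernel K = U U^T built from a stand-in orthonormal factor.
U, _ = np.linalg.qr(rng.standard_normal((N, k)))
K = U @ U.T

# Stand-in per-item loss values; any fixed numbers work here.
f = rng.standard_normal(N)

total_prob = 0.0
expected_estimate = 0.0
for S in itertools.combinations(range(N), k):
    idx = list(S)
    # For a rank-k projection DPP, P(sample = S) = det(K restricted to S).
    prob = np.linalg.det(K[np.ix_(idx, idx)])
    # Weight each sampled item by 1 / P(item is included) = 1 / K[i, i].
    estimate = sum(f[i] / K[i, i] for i in idx)
    total_prob += prob
    expected_estimate += prob * estimate

print(total_prob)                  # probabilities over all minibatches
print(expected_estimate, f.sum())  # exact expectation vs. full-data sum
```

Unbiasedness holds for any DPP kernel with nonzero diagonal. What the choice of kernel controls is the variance of the estimate, which is the quantity the paper's rate guarantees concern.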

Key Points
  • New continuous DPP kernels based on wavelets achieve provably better accuracy guarantees than existing DPP methods.
  • A general technique converts continuous DPPs to discrete kernels while preserving variance reduction and enabling low-rank sampling.
  • The approach extends DPP benefits to objective functions with arbitrarily low regularity, with rate guarantees that adapt to that regularity.

Why It Matters

Enables faster, cheaper minibatches and coresets for large-scale ML, especially with non-smooth objectives.