Research & Papers

New adaptive bandit algorithms minimize regret in contextual matching markets

Subtle context shifts can cause large regret spikes; a new method achieves poly-logarithmic bounds.

Deep Dive

Matching markets such as job boards or dating apps face a fundamental learning problem: each round brings new candidates (arms) with observable context (e.g., skills, profile), and the platform must match them to players (e.g., employers, users) to maximize long-term utility. Existing bandit algorithms struggle because even small context shifts can completely reconfigure the stable matching benchmark (the matching against which regret is measured), causing large regret spikes for some players. This paper from an academic team tackles the problem in two settings: stochastic contexts (drawn from a latent distribution) and adversarial contexts (potentially arbitrary).
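To make the instability concrete, here is a minimal Python sketch. It is not the paper's algorithm. It assumes linear utilities (a player's utility for an arm is the dot product of a hypothetical weight vector with the arm's context), fixed arm-side preferences, and player-proposing deferred acceptance (Gale-Shapley) as the stable matching benchmark. All names and numbers are illustrative.

```python
def utilities(weights, contexts):
    """u[p][a] = <weights[p], contexts[a]> (linear utility is an assumption)."""
    return [[sum(w * x for w, x in zip(wp, ca)) for ca in contexts]
            for wp in weights]


def rank_arms(scores):
    """Arm indices sorted from most to least preferred."""
    return sorted(range(len(scores)), key=lambda a: -scores[a])


def gale_shapley(player_prefs, arm_prefs):
    """Player-proposing deferred acceptance; returns {player: arm}."""
    n = len(player_prefs)
    # arm_rank[a][p] = position of player p in arm a's preference list
    arm_rank = [{p: r for r, p in enumerate(prefs)} for prefs in arm_prefs]
    next_choice = [0] * n      # next arm each player will propose to
    holder = {}                # arm -> player currently held
    free = list(range(n))
    while free:
        p = free.pop()
        a = player_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if a not in holder:
            holder[a] = p
        elif arm_rank[a][p] < arm_rank[a][holder[a]]:
            free.append(holder[a])   # arm trades up; old player is free again
            holder[a] = p
        else:
            free.append(p)           # proposal rejected
    return {p: a for a, p in holder.items()}


def stable_matching(weights, contexts, arm_prefs):
    u = utilities(weights, contexts)
    return gale_shapley([rank_arms(row) for row in u], arm_prefs)


# Hypothetical instance: two players, two arms, both arms prefer player 0.
WEIGHTS = [[1.0, 0.2], [0.8, 0.1]]
ARM_PREFS = [[0, 1], [0, 1]]

# The first context feature of each arm moves by only 0.02 between rounds.
before = stable_matching(WEIGHTS, [[0.51, 0.5], [0.49, 0.5]], ARM_PREFS)
after = stable_matching(WEIGHTS, [[0.49, 0.5], [0.51, 0.5]], ARM_PREFS)
print(before)
print(after)
```

In this toy instance, player 0 is matched to arm 0 before the shift and to arm 1 after it, and player 1 flips the other way, so a 0.02 change in context swaps every pair in the benchmark.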

For the stochastic case, the authors introduce a minimum preference gap metric to quantify learning difficulty and provide a fully adaptive algorithm that achieves an instance-dependent poly-logarithmic regret upper bound, a significant improvement over prior exponential or linear bounds. They also establish matching instance-independent upper and lower bounds under a mild distributional assumption. For adversarial contexts, they propose a tractable regret notion that remains valid under arbitrary context sequences, along with an adaptive algorithm that achieves sublinear regret under that notion. Accepted to ICML 2026, this work provides both theoretical foundations and practical algorithms for dynamic matching platforms.
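This summary does not give the paper's formal definition of the minimum preference gap, so the sketch below uses an assumed stand-in: the smallest absolute difference between any player's utilities for two distinct arms. The function name, the utility matrix, and the 1/gap² sample estimate (a standard bandit rule of thumb for separating two means, not a result from the paper) are all illustrative.

```python
from itertools import combinations


def min_preference_gap(utility):
    """Smallest |u[p][a] - u[p][b]| over all players p and arm pairs a != b.

    This is an assumed, simplified definition for illustration only.
    """
    return min(
        abs(row[a] - row[b])
        for row in utility
        for a, b in combinations(range(len(row)), 2)
    )


# Hypothetical utilities: rows are players, columns are arms.
UTILITY = [
    [0.61, 0.59],
    [0.458, 0.442],
]

gap = min_preference_gap(UTILITY)   # approximately 0.016, set by player 1
rough_samples = 1.0 / gap ** 2      # heuristic order of samples to resolve the gap
print(f"gap={gap:.3f}, rough samples ~ {rough_samples:.0f}")
```

Under this reading, a smaller gap means a player's ranking of arms is harder to learn from noisy feedback, which is why a gap-style quantity is a natural measure of instance difficulty.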

Key Points
  • Introduces a minimum preference gap to quantify learning difficulty in stochastic matching markets.
  • Achieves an instance-dependent poly-logarithmic regret upper bound for stochastic contexts.
  • Achieves sublinear regret for adversarial contexts under a novel, tractable regret notion.

Why It Matters

Enables more robust matching algorithms for platform economies, reducing user regret despite unpredictable context shifts.