ORBIT pricing policy achieves minimax optimal regret in dynamic pricing

arXiv stat.ML May 18, 2026

⚡Exploits unimodality to learn oracle price maps with near-optimal regret.

Deep Dive

A new paper from Fan et al. tackles contextual dynamic pricing, where a seller must set prices based on customer features (contexts) without knowing the underlying customer valuation function or the noise distribution. The paper works in a semiparametric model in which the customer's latent value equals an unknown utility function plus noise. The key insight: if the tail of the noise distribution is sufficiently smooth (Hölder \(\beta \geq 2\)) and a revenue-geometry condition ensures a unique interior maximum, then the optimal price as a function of the utility index (the "oracle price map") is itself smooth. Because the revenue function is unimodal, with a single peak, this map can be learned efficiently.
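To make the unimodality and the oracle price map concrete, here is a minimal sketch under assumptions that are not taken from the paper: the noise is standard logistic, a customer buys whenever latent value is at least the posted price, prices are searched on a fixed grid over [0, 20], and all function names are illustrative. Under these assumptions, expected revenue at a given utility index is the price times the probability of a sale.

```python
import numpy as np

# Assumption: standard logistic noise, chosen only for illustration.
def sale_probability(price, utility):
    """P(utility + noise >= price) when the noise is standard logistic."""
    return 1.0 / (1.0 + np.exp(price - utility))

def expected_revenue(price, utility):
    """Expected revenue of posting `price` to a customer with index `utility`."""
    return price * sale_probability(price, utility)

# Hypothetical price grid; the range and resolution are arbitrary choices.
PRICE_GRID = np.linspace(0.0, 20.0, 20001)

def oracle_price(utility, grid=PRICE_GRID):
    """Revenue-maximizing price on the grid for a known utility index."""
    return float(grid[np.argmax(expected_revenue(grid, utility))])

# Trace the oracle price map over a few utility indices.
utilities = np.linspace(0.0, 5.0, 6)
oracle_prices = [oracle_price(u) for u in utilities]

for u, p in zip(utilities, oracle_prices):
    print(f"utility index {u:.1f} -> oracle price {p:.3f}")
```

In this toy model the revenue curve rises to a single peak and then falls, and the maximizing price increases smoothly with the utility index. A seller who does not know the noise distribution cannot compute this map directly, which is why it has to be learned from sales feedback.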

The authors introduce ORBIT (Oracle price map learning via Bandit convex optimizaTion), a modular policy with three steps. It first constructs a scalar pilot index (e.g., an estimate of the customer's expected value), then partitions the index space into active bins, and finally learns, within each bin, a local polynomial approximation of the oracle price map using trust-region bandit convex optimization. For the baseline linear utility model, an adaptive elliptical exploration scheme generates the pilot index online without distributional assumptions on contexts. ORBIT achieves regret \(\tilde{O}(T^{\frac{2\beta-1}{4\beta-3}} + \sqrt{dT})\), and the authors prove a matching lower bound for fixed dimension \(d\), showing that the nonparametric oracle-map learning term is minimax optimal. The framework also extends to high-dimensional sparse linear utility and nonparametric Hölder utility models.
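The exponent in the first regret term is easier to read with numbers plugged in. The short sketch below only evaluates \(\frac{2\beta-1}{4\beta-3}\) for a few smoothness levels; the function name is illustrative, and the check on \(\beta\) reflects the \(\beta \geq 2\) condition stated in this summary.

```python
from fractions import Fraction

def orbit_rate_exponent(beta):
    """Exponent (2*beta - 1) / (4*beta - 3) of the nonparametric regret term."""
    beta = Fraction(beta)
    if beta < 2:
        # The summary states the rate under Hölder smoothness beta >= 2.
        raise ValueError("beta must be at least 2")
    return (2 * beta - 1) / (4 * beta - 3)

for beta in (2, 3, 5, 10):
    exponent = orbit_rate_exponent(beta)
    print(f"beta={beta}: exponent {exponent} ~ {float(exponent):.4f}")
```

At \(\beta = 2\) the exponent is 3/5, and it decreases toward 1/2 as \(\beta\) grows, so smoother noise tails bring the oracle-map learning term closer to the \(\sqrt{T}\) scale. Ignoring constants and logarithmic factors, at \(\beta = 2\) the term \(T^{3/5}\) dominates \(\sqrt{dT}\) whenever \(d \leq T^{1/5}\).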

Key Points

The ORBIT policy exploits unimodality in the revenue function to achieve regret \(\tilde{O}(T^{\frac{2\beta-1}{4\beta-3}} + \sqrt{dT})\).
The paper proves a matching minimax lower bound for fixed dimension \(d\), establishing sharpness of the nonparametric oracle-map learning term.
The framework extends to sparse high-dimensional linear utility and nonparametric Hölder utility models without distributional assumptions on contexts.

Why It Matters

Enables principled dynamic pricing with provable regret guarantees, minimax optimal in the oracle-map learning term, without requiring a known customer valuation function or noise distribution.
