Research & Papers

Shekhar & Howard's new ad auction framework certifies policies, cuts 19 candidates to 2

47.66% replay lift plus safety certification across 44 segments — offline ad evaluation just got smarter

Deep Dive

Existing replay and off-policy evaluation methods for logged ad auctions estimate or rank policy values, but they risk hiding weak threshold support, multiple-comparison effects, subgroup harm, and bidder-response uncertainty. Shekhar and Howard’s support-aware offline decision framework directly answers whether the available evidence is strong enough to justify validation. Rather than a single point-estimate winner, the framework outputs a conservative decision object consisting of certified policies, statistically dominated alternatives, and unresolved candidates that require further validation. The main theoretical result gives a unified finite-catalog guarantee: under simultaneous uncertainty control and conservative support gates, the framework preserves the best gate-passing policy while eliminating only policies with certified regret. Supporting results characterize support-localized replay generalization, establish information-theoretic threshold-resolution limits, and quantify when heterogeneous bidder response can overturn localized replay rankings.
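The decision object described above can be pictured with a minimal sketch. It assumes each policy already comes with simultaneous lower and upper bounds on its lift over the baseline and a pass/fail support gate. The names `PolicyEvidence` and `decide`, the specific elimination rule, and all numbers are illustrative assumptions, not the authors' algorithm or results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyEvidence:
    name: str
    lower: float      # simultaneous lower bound on lift vs. baseline (assumed given)
    upper: float      # simultaneous upper bound on lift vs. baseline (assumed given)
    support_ok: bool  # True if the policy passed the conservative support gate


def decide(catalog):
    """Split a finite catalog into certified, dominated, and unresolved policies."""
    decision = {"certified": [], "dominated": [], "unresolved": []}
    gated = [p for p in catalog if p.support_ok]
    if not gated:
        # No gate-passing policy: nothing can be certified or eliminated.
        decision["unresolved"] = [p.name for p in catalog]
        return decision

    # Best value that any gate-passing policy is guaranteed to reach.
    best_lower = max(p.lower for p in gated)

    for p in catalog:
        if p.upper < best_lower:
            # Even its optimistic value trails a gate-passing rival: certified regret.
            decision["dominated"].append(p.name)
        elif p.support_ok and p.lower > 0.0:
            # Gate-passing and provably better than the baseline.
            decision["certified"].append(p.name)
        else:
            # Evidence is too weak either way; needs further validation.
            decision["unresolved"].append(p.name)
    return decision


# Made-up numbers, for illustration only.
catalog = [
    PolicyEvidence("reserve_a", 0.30, 0.50, True),
    PolicyEvidence("reserve_b", 0.25, 0.45, True),
    PolicyEvidence("reserve_c", 0.05, 0.20, True),
    PolicyEvidence("reserve_d", 0.10, 0.60, False),
]
result = decide(catalog)
print(result)
```

If the bounds hold for all policies at once, the best gate-passing policy cannot land in the dominated set under this rule: its upper bound is at least its true value, which is at least the true value of every gate-passing rival, which in turn is at least `best_lower`.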

Experiments on iPinYou real-time-bidding logs demonstrate the framework’s practical value. The leading reserve rule achieves a 47.66% replay lift in season two, a 40.71% simultaneous lower-bound lift, and a 43.87% frozen out-of-time replay lift in season three. The framework reduces a 19-policy catalog to just a two-policy validation shortlist while certifying non-harm across all 44 advertiser, exchange, and region segments. These results support the central claim that offline reserve-policy evaluation should produce certified validation decisions rather than point-estimate rankings alone. The approach offers marketplace operators a principled way to reduce risk and increase trust when selecting reserve prices from logged auction data.
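The segment-level non-harm check can be sketched in the same spirit. The sketch assumes a simultaneous lower bound on lift is already available for every segment, and it treats "non-harm" as every bound staying above a small negative tolerance. The function name `certify_non_harm`, the tolerance parameter, and the segment labels and values are hypothetical, not taken from the paper.

```python
def certify_non_harm(segment_lower_bounds, tolerance=0.0):
    """Return (certified, failing_segments) for a mapping of segment -> lower bound on lift."""
    # A segment fails if its lower bound allows a loss larger than the tolerance.
    failing = sorted(
        segment
        for segment, lower in segment_lower_bounds.items()
        if lower < -tolerance
    )
    return len(failing) == 0, failing


# Made-up bounds, for illustration only.
bounds = {"advertiser_x": 0.12, "exchange_y": 0.03, "region_z": -0.01}

strict_ok, strict_failing = certify_non_harm(bounds)
relaxed_ok, relaxed_failing = certify_non_harm(bounds, tolerance=0.02)
print(strict_ok, strict_failing)
print(relaxed_ok, relaxed_failing)
```

Returning the failing segments alongside the verdict keeps the result auditable: an operator can see which subgroup blocked certification and not just a single pass/fail flag.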

Key Points
  • Reduces a 19-policy catalog to a 2-policy shortlist with certified non-harm across 44 advertiser/exchange/region segments
  • Achieves 47.66% replay lift (season two), 40.71% lower-bound lift, and 43.87% out-of-time replay lift (season three) on iPinYou logs
  • Replaces single point-estimate ranking with a conservative decision object: certified policies, dominated alternatives, and unresolved candidates

Why It Matters

Certified reserve-pricing decisions make ad marketplaces safer by reducing the risk of hidden bidder harm and validation failures.