Research & Papers

Box Thirding: Anytime Best Arm Identification under Insufficient Sampling

New 'ternary comparison' method identifies best-performing AI models without knowing total evaluation budget upfront.

Deep Dive

Researchers Seohwa Hwang and Junyong Park have introduced Box Thirding (B3), a novel algorithm designed to solve the 'Best Arm Identification' (BAI) problem under strict computational or data constraints. This is a critical challenge in machine learning, where practitioners need to identify the best-performing model, hyperparameter configuration, or treatment (an 'arm') from a large set, but cannot afford to exhaustively evaluate all options due to a limited budget T (e.g., GPU time, API calls, or user experiments).
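To make the setup concrete, here is a toy version of the BAI problem; the arm means, noise level, and budget below are illustrative and not from the paper. Each 'arm' returns a noisy score per evaluation, and a budget T caps the total number of evaluations:

```python
import random

random.seed(0)

# Hypothetical candidate "arms" (e.g. models): each evaluation returns
# the arm's true quality plus Gaussian noise.
TRUE_MEANS = [0.3, 0.5, 0.8, 0.4, 0.6]  # arm 2 is truly best

def pull(arm: int) -> float:
    """One noisy evaluation of an arm (one GPU run, API call, etc.)."""
    return random.gauss(TRUE_MEANS[arm], 0.1)

# Naive baseline: if T is known upfront, split it evenly across arms and
# pick the arm with the highest empirical mean.
T = 50
per_arm = T // len(TRUE_MEANS)
estimates = [sum(pull(a) for _ in range(per_arm)) / per_arm
             for a in range(len(TRUE_MEANS))]
best = max(range(len(TRUE_MEANS)), key=estimates.__getitem__)
```

Splitting the budget evenly wastes evaluations on clearly weak arms; adaptive methods such as Successive Halving and B3 instead reallocate pulls toward the promising ones.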

The core innovation of B3 is its iterative 'ternary comparison' process. In each round, the algorithm selects three candidate arms. It explores the best-performing one further, holds the median performer in reserve for future comparisons, and permanently discards the weakest. This creates a flexible, elimination-based search that doesn't require the total evaluation budget T to be known in advance, a key limitation of established methods like Successive Halving (SH). The authors prove that B3's probability of misidentifying the optimal arm is comparable to that of SH, even when SH is run on a random subset of arms sized to fit within the unknown budget.
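The explore/defer/discard cycle can be sketched as follows. This is a simplified schematic of the ternary-comparison idea, not the authors' exact algorithm: the paper's rules for how much to re-sample the winner and when to recall reserved arms are richer than shown here, and the arm means are invented for illustration.

```python
import random

random.seed(1)

TRUE_MEANS = [0.3, 0.5, 0.8, 0.4, 0.6, 0.2, 0.7]  # hypothetical arm qualities

def score(arm: int, n: int = 5) -> float:
    """Average of n noisy evaluations of an arm."""
    return sum(random.gauss(TRUE_MEANS[arm], 0.05) for _ in range(n)) / n

# Schematic ternary-comparison loop: each round ranks three arms, keeps
# exploring the best, defers the median, and discards the worst.
active = list(range(len(TRUE_MEANS)))  # arms currently being explored
reserve = []                           # median performers held for later

while len(active) >= 3 or (active and reserve):
    while len(active) < 3 and reserve:
        active.append(reserve.pop())    # refill the trio from the reserve
    if len(active) < 3:
        break                           # too few arms left to compare
    trio = [active.pop() for _ in range(3)]
    best_arm, median_arm, _worst = sorted(trio, key=score, reverse=True)
    active.append(best_arm)    # explore the winner further next round
    reserve.append(median_arm) # defer: may re-enter a later comparison
    # _worst is eliminated permanently

survivors = active + reserve
chosen = max(survivors, key=score)
```

Because each round eliminates exactly one arm rather than a budget-sized batch, the loop can be interrupted after any round and still return its current best survivor, which is what makes the method 'anytime'.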

Empirically, B3 outperforms existing methods on tasks with limited budgets, as measured by 'simple regret' (the performance gap between the chosen arm and the true best). The paper validates the approach using the New Yorker Cartoon Caption Contest dataset, a complex domain requiring nuanced evaluation. For AI developers and researchers, this translates to more efficient model selection and hyperparameter tuning when resources are scarce, accelerating experimentation cycles without sacrificing the quality of the final chosen model.
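Simple regret, as used above, is just the gap between the true mean of the overall best arm and that of the arm the algorithm selected. A minimal helper (the function name and example values are my own):

```python
def simple_regret(true_means: list[float], chosen_arm: int) -> float:
    """Gap between the best achievable mean and the chosen arm's mean."""
    return max(true_means) - true_means[chosen_arm]

means = [0.3, 0.5, 0.8, 0.4, 0.6]
r_best = simple_regret(means, 2)   # 0.0: arm 2 is the true best
r_near = simple_regret(means, 4)   # ~0.2: picking arm 4 costs 0.8 - 0.6
```

Unlike misidentification probability, simple regret also credits an algorithm for returning a near-optimal arm, which is why it is the natural metric when budgets are too small to guarantee exact identification.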

Key Points
  • Box Thirding uses a ternary (three-way) comparison each iteration to explore, defer, or discard arms, enabling efficient search without prior budget knowledge.
  • It matches the misidentification guarantee of Successive Halving, which requires a predefined total budget T, even when SH is run on a random subset of arms sized to fit within T.
  • Demonstrated lower 'simple regret' than existing methods under limited budgets on the New Yorker Cartoon Caption Contest dataset.
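For contrast with the baseline named above, Successive Halving must commit to the budget T before it starts: it divides T across a fixed number of rounds, evaluates every surviving arm each round, and keeps the top half. The sketch below follows the standard recipe in spirit but simplifies the allocation details; all numbers are illustrative.

```python
import math
import random

random.seed(2)

TRUE_MEANS = [0.3, 0.5, 0.8, 0.4, 0.6, 0.2, 0.7, 0.1]  # hypothetical arms

def pull(arm: int) -> float:
    """One noisy evaluation of an arm."""
    return random.gauss(TRUE_MEANS[arm], 0.1)

def successive_halving(n_arms: int, T: int) -> int:
    """Halve the surviving arms each round; T must be fixed in advance."""
    survivors = list(range(n_arms))
    rounds = math.ceil(math.log2(n_arms))
    for _ in range(rounds):
        # Split the per-round share of T evenly over current survivors.
        per_arm = max(1, T // (len(survivors) * rounds))
        means = {a: sum(pull(a) for _ in range(per_arm)) / per_arm
                 for a in survivors}
        survivors.sort(key=means.get, reverse=True)
        survivors = survivors[:max(1, len(survivors) // 2)]
    return survivors[0]

best = successive_halving(8, T=240)
```

If T turns out smaller than planned, SH's schedule breaks down mid-round, whereas B3's one-arm-per-round elimination degrades gracefully; that is the practical difference the comparison in the paper formalizes.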

Why It Matters

Enables faster, more resource-efficient AI model selection and hyperparameter tuning, crucial for teams with limited compute budgets.