Research & Papers

PARWiS: Winner determination under shoestring budgets using active pairwise comparisons

New AI system identifies top choices with minimal human input, cutting comparison costs by 60%.

Deep Dive

Researcher Shailendra Bhandari has introduced PARWiS, an algorithm for winner determination under tightly constrained comparison budgets. The system combines active pairwise comparisons with spectral ranking techniques to identify the best item in a set from minimal human input, addressing a core need in preference-based learning, where exhaustive comparisons are impractical on time or cost grounds. The work extends the core PARWiS algorithm with two variants: Contextual PARWiS, which incorporates additional feature information, and RL PARWiS, which uses reinforcement learning for pair selection. All were evaluated against established baselines, including Double Thompson Sampling, on synthetic and real-world datasets such as Jester and MovieLens.
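The article names spectral ranking but not the exact estimator PARWiS uses. As a hedged illustration only, here is a minimal Rank Centrality-style sketch: win counts define a Markov chain whose stationary distribution scores the items. The function name, matrix layout, and iteration limits are assumptions for this sketch, not details from the paper.

```python
import numpy as np

def rank_centrality(wins: np.ndarray) -> np.ndarray:
    """Spectral scores from a pairwise win-count matrix.

    wins[i, j] = number of times item i beat item j.
    Returns the stationary distribution of a comparison-driven
    Markov chain; a higher score means a stronger item.
    (Illustrative sketch; not the paper's exact procedure.)
    """
    n = wins.shape[0]
    total = wins + wins.T  # comparisons played per pair
    # d_max: most opponents any single item has faced (normalizer)
    d_max = max(np.count_nonzero(total[i]) for i in range(n)) or 1
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and total[i, j] > 0:
                # walk from i to j with probability proportional
                # to j's empirical win rate against i
                P[i, j] = (wins[j, i] / total[i, j]) / d_max
        P[i, i] = 1.0 - P[i].sum()  # self-loop keeps rows stochastic
    # stationary distribution via power iteration
    pi = np.full(n, 1.0 / n)
    for _ in range(1000):
        nxt = pi @ P
        if np.linalg.norm(nxt - pi, 1) < 1e-10:
            return nxt
        pi = nxt
    return pi
```

The strongest item attracts the most probability mass, so `np.argmax` of the returned vector is the reported winner; an active scheme would then spend the remaining budget on the pairs that most affect this winner.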

The technical evaluation used extremely tight budgets of just 40, 60, and 80 comparisons to determine the winner among 20 items, measuring performance by recovery fraction, true rank of the reported winner, and cumulative regret. PARWiS and RL PARWiS consistently outperformed the baseline methods, excelling in particular on the Jester dataset, where the separation between the top two items (Δ₁,₂) is larger. On the more challenging MovieLens dataset, with its smaller separation values, the performance gaps narrowed but PARWiS retained an advantage. Interestingly, Contextual PARWiS performed comparably to, rather than better than, the base algorithm, suggesting that contextual features require more sophisticated integration. The algorithm's efficiency makes it directly applicable to real-world scenarios such as product testing, content recommendation, and survey design, where minimizing human judgment calls is economically crucial.
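The article names the metrics but not their exact definitions. The sketch below encodes one natural reading: recovery fraction as the share of trials whose reported winner is the true best item, true rank as the reported item's position under the ground-truth scores, and Δ₁,₂ as the gap between the top two true scores. All function names and definitions here are assumptions, not the paper's.

```python
import numpy as np

def winner_metrics(true_scores, reported_winner):
    """Per-trial metrics: was the true best item recovered, and what
    is the true rank (1 = best) of the reported item?
    (Assumed definitions; the paper may differ.)"""
    scores = np.asarray(true_scores, dtype=float)
    order = np.argsort(-scores)  # item indices, best first
    true_rank = int(np.where(order == reported_winner)[0][0]) + 1
    return {"recovered": int(true_rank == 1), "true_rank": true_rank}

def recovery_fraction(true_scores, reported_winners):
    """Fraction of independent trials that reported the true best item."""
    return float(np.mean([winner_metrics(true_scores, w)["recovered"]
                          for w in reported_winners]))

def separation(true_scores):
    """Δ₁,₂: gap between the best and second-best true scores."""
    top_two = np.sort(np.asarray(true_scores, dtype=float))[-2:]
    return float(top_two[1] - top_two[0])
```

Under this reading, the Jester versus MovieLens contrast is intuitive: a larger `separation` value means the best item wins its comparisons more decisively, so fewer of the 40-80 queries are needed before the recovery fraction climbs.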

Key Points
  • PARWiS algorithm identifies the best item using only 40-80 pairwise comparisons for sets of 20 items
  • Outperforms Double Thompson Sampling baseline by 15-25% on recovery fraction metrics in Jester dataset tests
  • RL-based variant shows strongest performance, while contextual features require further tuning for optimal results

Why It Matters

Dramatically reduces cost and time for preference testing in e-commerce, content recommendation, and product development.