Research & Papers

Policy-Aware Design of Large-Scale Factorial Experiments

New two-stage design uses tensor completion to find optimal product combinations with limited traffic.

Deep Dive

A team of academic researchers has published a paper titled "Policy-Aware Design of Large-Scale Factorial Experiments," addressing a bottleneck in digital product development. When companies like Meta, Google, or Amazon test multiple product features simultaneously (e.g., interface elements, messages, incentives), the number of possible combinations grows multiplicatively with every added feature. Traditional decentralized A/B testing struggles with this complexity: overlapping experiments on shared user populations produce interaction effects that no single experiment measures, which can corrupt results and waste experimentation traffic.
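To make the growth in combinations concrete, here is a minimal arithmetic sketch. The feature count, the number of variants per feature, and the users-per-combination figure are hypothetical values chosen for illustration; none of them come from the paper.

```python
import math


def n_combinations(levels_per_feature):
    """Number of distinct policies when each feature takes one of its levels."""
    return math.prod(levels_per_feature)


# Hypothetical setup: 10 features, each with 4 variants.
levels = [4] * 10
total = n_combinations(levels)

# Hypothetical traffic needed to test every combination directly.
users_per_combination = 1_000
traffic = total * users_per_combination

print(f"{total:,} combinations, {traffic:,} users for a full factorial test")
```

Even in this small hypothetical case, testing every cell directly would take on the order of a billion users, which is why a design that tests only a subset of combinations is attractive.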

The proposed solution is a centralized, two-stage design that frames the problem as a single, large factorial experiment. In Stage 1, the platform tests only a strategic subset of all possible intervention combinations. It then uses tensor completion, a machine learning technique for filling in missing entries of multi-dimensional arrays, to infer the performance of untested combinations and eliminate weak options. Stage 2 applies sequential halving, an algorithm that repeatedly tests the surviving candidates and discards the worse-performing half, to the remaining high-potential combinations in order to identify the single best policy. The method's complexity scales with the underlying structure of the problem (the low-rank tensor's degrees of freedom) rather than with the full factorial size, which is what makes it feasible.
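As an illustration of Stage 2, the following is a minimal sketch of the textbook form of sequential halving, not the paper's implementation. It assumes each candidate combination is an "arm" returning a noisy 0/1 reward; the `pull` callback, the `TRUE_MEANS` values, and the budget are hypothetical, and the Stage 1 tensor-completion step is not shown.

```python
import math
import random


def sequential_halving(pull, n_arms, budget):
    """Pick a likely-best arm by repeatedly dropping the worse half.

    `pull(arm)` returns one noisy reward for `arm`. The budget is split
    evenly across rounds, and within a round across surviving arms.
    """
    survivors = list(range(n_arms))
    n_rounds = max(1, math.ceil(math.log2(n_arms)))
    for _round in range(n_rounds):
        if len(survivors) == 1:
            break
        # At least one pull per arm, so tiny budgets may be slightly exceeded.
        pulls_per_arm = max(1, budget // (len(survivors) * n_rounds))
        means = {
            arm: sum(pull(arm) for _i in range(pulls_per_arm)) / pulls_per_arm
            for arm in survivors
        }
        # Keep the better half (rounded up) by empirical mean reward.
        survivors.sort(key=means.get, reverse=True)
        survivors = survivors[: math.ceil(len(survivors) / 2)]
    return survivors[0]


# Hypothetical conversion rates for 8 candidate combinations.
TRUE_MEANS = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.90]
rng = random.Random(0)  # fixed seed so the simulation is reproducible


def simulated_pull(arm):
    """Bernoulli reward drawn from the hypothetical rate of `arm`."""
    return 1.0 if rng.random() < TRUE_MEANS[arm] else 0.0


best = sequential_halving(simulated_pull, n_arms=len(TRUE_MEANS), budget=24_000)
print("selected arm:", best)
```

Because each round halves the candidate set, later rounds spend more pulls per surviving arm, concentrating the budget on the combinations that are hardest to tell apart.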

The researchers validated their approach with an offline evaluation on a product-bundling problem constructed from 100 million real user interactions on Alibaba's Taobao platform. Their policy-aware design substantially outperformed both one-shot tensor completion and standard best-arm identification benchmarks. The gains were most pronounced in realistic, constrained scenarios: low experimentation budgets and high-noise settings. The results suggest a path for platforms to make data-driven combinatorial product design, in which many feature combinations are tested at once, operationally feasible at massive scale.

Key Points
  • Centralizes overlapping A/B tests into a single factorial problem using a low-rank tensor model to manage combinatorial explosion.
  • Uses a two-stage process: tensor completion to infer untested combinations, then sequential halving to select the final optimal policy.
  • Outperformed benchmarks in a Taobao case study with 100M interactions, showing strong gains in low-budget, high-noise scenarios.

Why It Matters

The design could let large platforms find strong product configurations in combinatorially large design spaces, accelerating innovation while conserving user traffic.