Scaling search CVR models yields +2.6% conversion gain with 8x inference compute at near-constant latency
2.5x more data and 8x inference compute with negligible latency penalty
A team of researchers (James Pak and 15 others) has published a detailed study on scaling search conversion rate (CVR) prediction models for high-traffic e-commerce platforms. Using over a year of customer interaction logs from a production system, they systematically analyzed how three factors (backbone computation, embedding parameters, and training data volume) affect model quality and scalability. Their key finding: the effects of scaling each factor are largely independent and additive, so each factor can be studied on its own rather than through a joint search over all combinations.
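To make the additivity claim concrete, here is a minimal sketch of how it shrinks the search space. It assumes that combined gains are simply the sum of single-factor gains, which is the idealized form of the "largely independent and additive" finding. The table `single_factor_gain`, the scale levels, and all numbers are hypothetical placeholders, not results from the paper.

```python
# Hypothetical relative gains over baseline, each measured by scaling ONE
# factor while the other two stay at baseline. Illustrative values only.
single_factor_gain = {
    "backbone": {1: 0.0, 2: 0.004, 8: 0.010},
    "embedding": {1: 0.0, 2: 0.003, 4: 0.005},
    "data": {1.0: 0.0, 2.5: 0.008},
}


def predict_gain(config):
    """Predict the combined gain as the sum of per-factor gains.

    This is the additivity assumption: no interaction terms between factors.
    """
    return sum(single_factor_gain[factor][scale] for factor, scale in config.items())


def experiments_needed():
    """Return (one-at-a-time runs, full-grid runs) for the levels above."""
    levels = [len(scales) for scales in single_factor_gain.values()]
    # One baseline run plus one run per non-baseline level of each factor.
    one_at_a_time = 1 + sum(n - 1 for n in levels)
    # A joint search trains every combination of levels.
    full_grid = 1
    for n in levels:
        full_grid *= n
    return one_at_a_time, full_grid


print(experiments_needed())  # (6, 18)
print(predict_gain({"backbone": 8, "embedding": 4, "data": 2.5}))
```

With three levels each for backbone and embedding and two for data, one-at-a-time exploration needs 6 training runs where a full grid needs 18. The gap widens quickly as more levels are added, which is why additivity matters for exploration cost.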
To deploy the final model, the team employed a warmstart strategy to accelerate training iterations, along with inference optimizations such as decoupled graph execution and dynamic batching to keep GPU latency minimal. The result: a model trained on 2.5x more data with 8x more inference compute, deployed with negligible latency impact. Online A/B tests showed a +2.6% improvement in search conversion rate over the pre-scaling production baseline. The paper provides a practical roadmap for any team looking to scale production CVR models without sacrificing serving speed or budget.
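For readers unfamiliar with dynamic batching, the sketch below shows the general idea as a deterministic simulation: requests are grouped so the GPU processes several at once, but no batch waits longer than a fixed window. This is a generic policy, not the team's serving code. The function `form_batches` and its parameters `max_batch_size` and `max_wait_ms` are hypothetical names chosen for illustration.

```python
def form_batches(arrivals_ms, max_batch_size, max_wait_ms):
    """Group request arrival times (in ms) into batches.

    A batch is closed when it is full, or when the next request arrives
    more than max_wait_ms after the first request in the batch.
    """
    batches = []
    current = []
    for t in sorted(arrivals_ms):
        batch_full = len(current) >= max_batch_size
        waited_too_long = bool(current) and t - current[0] > max_wait_ms
        if batch_full or waited_too_long:
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches


# Seven requests, at most 3 per batch, at most 5 ms of waiting per batch.
print(form_batches([0, 1, 2, 3, 10, 11, 30], max_batch_size=3, max_wait_ms=5))
# [[0, 1, 2], [3], [10, 11], [30]]
```

The two limits trade throughput against tail latency: a larger `max_batch_size` uses the accelerator more efficiently under load, while `max_wait_ms` caps how long a request can sit in the queue when traffic is light. A production server would implement the same policy with a queue and a timer rather than a sorted list.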
- Scaling backbone, embedding, and data independently yields additive quality gains, simplifying optimization.
- Warmstart strategy cuts training iteration time while easing model updates.
- Decoupled graph execution and dynamic batching enable 8x more inference compute with minimal latency impact.
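
The summary does not describe how the warmstart strategy works. One common reading of the term is that a new model reuses trained parameters from a previous checkpoint wherever they still fit and initializes the rest from scratch. The sketch below illustrates that reading only. The function `warm_start`, the flattened-list parameter format, and the initialization scale are all assumptions made for illustration.

```python
import random


def warm_start(new_sizes, checkpoint, seed=0):
    """Build parameters for a new model, reusing checkpoint values where possible.

    new_sizes:  dict of parameter name -> flattened size in the new model
    checkpoint: dict of parameter name -> list of floats from the old model
    Returns (params, reused_names).
    """
    rng = random.Random(seed)
    params, reused = {}, []
    for name, size in new_sizes.items():
        old = checkpoint.get(name)
        if old is not None and len(old) == size:
            # Same name and size: carry over the trained values.
            params[name] = list(old)
            reused.append(name)
        else:
            # New or resized parameter: fall back to a fresh random init.
            params[name] = [rng.gauss(0.0, 0.02) for _ in range(size)]
    return params, reused


old_checkpoint = {"embed": [0.1, 0.2], "dense": [1.0]}
new_model_sizes = {"embed": 2, "dense": 3, "extra": 1}
params, reused = warm_start(new_model_sizes, old_checkpoint)
print(reused)  # ['embed']
```

Here only `embed` is carried over, because `dense` changed size and `extra` did not exist in the old checkpoint. Starting from trained values for the unchanged parts is what lets a scaled-up model converge in fewer training steps than a cold start would need.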
Why It Matters
E-commerce teams can improve conversion rates without sacrificing serving speed, using scaling techniques validated in production: here, a +2.6% lift in search conversion from 2.5x more data and 8x more inference compute.