Research & Papers

Not All Candidates are Created Equal: A Heterogeneity-Aware Approach to Pre-ranking in Recommender Systems

New method deployed on Toutiao for 9 months improves user app usage duration by up to 0.4% without extra compute cost.

Deep Dive

A team from ByteDance's Toutiao platform has published a significant paper on arXiv detailing HAP (Heterogeneity-Aware Adaptive Pre-ranking), a novel framework designed to solve core inefficiencies in the pre-ranking stage of large-scale recommender systems. The research identifies a critical flaw in current methods: they indiscriminately mix heterogeneous training samples (from retrieval, ranking, and user feedback), leading to 'gradient conflicts' where hard samples dominate training while easy ones are underutilized. This results in suboptimal model performance. HAP directly tackles this by disentangling easy and hard samples and directing each subset along dedicated optimization paths with tailored loss functions.
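The paper does not spell out its exact disentanglement criterion or loss designs in this summary, but the core idea — split a batch into easy and hard samples and give each subset its own optimization path — can be sketched roughly. Here, per-sample loss serves as a hypothetical difficulty proxy, and the hard subset is down-weighted so its large gradients do not dominate; the quantile threshold and 0.5 weight are illustrative assumptions, not HAP's actual design.

```python
import numpy as np

def heterogeneity_aware_loss(logits, labels, hard_quantile=0.7, hard_weight=0.5):
    """Illustrative easy/hard split with per-subset loss terms.

    Difficulty proxy, threshold, and weighting are all assumptions for
    the sketch; the paper's actual criteria may differ.
    """
    # Per-sample binary cross-entropy as a difficulty proxy.
    p = 1.0 / (1.0 + np.exp(-logits))
    per_sample = -(labels * np.log(p + 1e-12) +
                   (1 - labels) * np.log(1 - p + 1e-12))

    # Samples above the loss quantile are treated as 'hard'.
    threshold = np.quantile(per_sample, hard_quantile)
    hard = per_sample >= threshold

    # Dedicated paths: full weight for easy samples, reduced weight for
    # hard ones so they cannot swamp the gradient signal.
    easy_loss = per_sample[~hard].mean() if (~hard).any() else 0.0
    hard_loss = hard_weight * per_sample[hard].mean() if hard.any() else 0.0
    return easy_loss + hard_loss
```

In a real system the two subsets could also be routed to entirely different loss functions or model branches rather than just re-weighted.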

Technically, HAP's innovation is a two-pronged adaptive approach. First, it mitigates gradient conflicts through conflict-sensitive sampling and specialized loss design. Second, and more practically, it allocates computational budgets intelligently across candidates. It applies lightweight models to all candidates for broad, efficient coverage, then engages more complex, accurate models only on the identified hard cases. This maintains or improves ranking accuracy while reducing overall computational cost—a major concern for platforms serving billions of recommendations. The framework has been in production on Toutiao for nine months, yielding measurable gains: up to a 0.4% improvement in user app usage duration and a 0.05% increase in active days, all without increasing infrastructure costs. The team is also releasing a large-scale industrial dataset to spur further research into this critical system component.
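The cascaded budget allocation can be illustrated with a short sketch. All names and the "uncertain score band" heuristic for identifying hard cases are assumptions for illustration; the paper's actual routing rule is not described in this summary.

```python
import numpy as np

def cascaded_prerank(candidates, light_score, heavy_score,
                     uncertainty_band=(0.4, 0.6), top_k=100):
    """Sketch: cheap model everywhere, expensive model only on hard cases.

    `light_score` and `heavy_score` are hypothetical scoring callables;
    candidates whose light score lands in an ambiguous band are treated
    as 'hard' and re-scored by the heavier model.
    """
    # Broad, efficient coverage: lightweight model scores every candidate.
    scores = np.array([light_score(c) for c in candidates], dtype=float)

    # Extra compute only where the cheap model is uncertain.
    lo, hi = uncertainty_band
    hard = (scores >= lo) & (scores <= hi)
    for i in np.flatnonzero(hard):
        scores[i] = heavy_score(candidates[i])

    # Rank by final score, descending, and keep the top candidates.
    order = np.argsort(-scores)
    return [candidates[i] for i in order[:top_k]]
```

Under this scheme the heavy model's cost scales with the number of ambiguous candidates rather than the full candidate set, which is how accuracy can improve without raising overall compute.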

Key Points
  • Solves 'gradient conflicts' by separating easy/hard training samples with tailored optimization paths.
  • Adaptively allocates compute: lightweight models for all candidates, stronger models only for hard cases.
  • Deployed on Toutiao for 9 months, lifting user app usage duration by up to 0.4% with zero added computational cost.

Why It Matters

Provides a scalable blueprint for making billion-user recommender systems more accurate and efficient simultaneously.