Generative Pseudo-Labeling for Pre-Ranking with LLMs
New AI method tackles the 'train-serving discrepancy' in recommendation systems by generating unbiased labels for unseen items.
A research team from Alibaba Group and Zhejiang University has introduced Generative Pseudo-Labeling (GPL), a novel framework designed to solve a fundamental flaw in industrial recommendation systems. The core problem, known as the 'train-serving discrepancy,' arises because pre-ranking models are trained exclusively on data from items users have already seen (exposed interactions), but during live serving, they must score thousands of candidate items, including many the user has never been shown. This mismatch creates severe sample selection bias and hurts performance, especially for niche or long-tail content. Existing debiasing techniques, like heuristic negative sampling or distilling from biased rankers, often mislabel potentially good items as negative or simply propagate the existing exposure bias.
GPL's key innovation is to leverage the semantic understanding of large language models (LLMs) to generate high-quality, content-aware pseudo-labels for these unexposed items offline. The process generates user-specific 'interest anchors' and matches them against candidate items in a frozen semantic space, aligning the training data distribution with the real-world serving scenario. This provides unbiased supervision for the pre-ranking model without adding any computational overhead during online inference. Deployed in a large-scale production system, GPL delivered a significant 3.07% lift in click-through rate (CTR) while simultaneously improving recommendation diversity and the discovery of long-tail items, a key metric for platform health and user satisfaction. This work demonstrates a practical, scalable way to harness LLMs for a core industrial AI problem beyond direct text generation.
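The offline matching step can be sketched roughly as follows, assuming interest anchors and candidate items already live in a shared, frozen embedding space. This is an illustrative simplification, not the paper's implementation: the LLM-generated anchors are stubbed with random vectors, and the function names and similarity threshold are invented for the example.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between row vectors of a and b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def pseudo_label(anchor_embs: np.ndarray, item_embs: np.ndarray,
                 threshold: float = 0.6):
    """Label each unexposed item by its best match to any interest anchor.

    An item whose maximum anchor similarity clears the (illustrative)
    threshold is treated as a positive pseudo-label for training.
    """
    sims = cosine_sim(item_embs, anchor_embs)   # (n_items, n_anchors)
    scores = sims.max(axis=1)                   # best-matching anchor per item
    return (scores >= threshold).astype(int), scores

# Stand-ins: in GPL these would come from an LLM (anchors) and a
# frozen item encoder (items); here they are just random vectors.
rng = np.random.default_rng(0)
anchors = rng.normal(size=(3, 8))   # 3 user-specific interest anchors
items = rng.normal(size=(5, 8))     # 5 unexposed candidate items
labels, scores = pseudo_label(anchors, items)
print(labels, scores)
```

Because everything here runs offline on precomputed embeddings, the matching adds nothing to online inference, which is consistent with the zero-latency claim above.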
- Solves the 'train-serving discrepancy' in recommendation pre-ranking by using LLMs to generate unbiased pseudo-labels for unexposed items.
- Achieved a 3.07% click-through rate (CTR) improvement in a large-scale production deployment while boosting diversity and long-tail discovery.
- Operates entirely offline by generating user interest anchors and performing semantic matching, adding zero latency to the live serving system.
Why It Matters
Directly improves revenue-critical metrics for major platforms and provides a scalable blueprint for using LLMs to solve core industrial ML problems.