Research & Papers

Dual-Rerank: Fusing Causality and Utility for Industrial Generative Reranking

New framework solves the AI reranking dilemma, improving watch time while slashing inference latency.

Deep Dive

A team from the short-video giant Kuaishou has introduced Dual-Rerank, a novel framework designed to solve critical bottlenecks in deploying generative AI for search reranking at an industrial scale. Kuaishou's platform, which serves over 400 million daily active users and processes hundreds of millions of queries against tens of billions of videos, requires a final reranking stage that optimizes the entire page of results for user utility. Traditional score-and-sort methods fail here, and while generative reranking—which directly models the probability of a good result permutation—is superior, it faces a dual dilemma. The first is a structural trade-off: Autoregressive (AR) models capture sequential dependencies well but are too slow, while Non-Autoregressive (NAR) models are fast but poor at modeling those dependencies.
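The AR/NAR trade-off can be made concrete with a toy reranker. The model, scorer, and names below are illustrative (not from the paper); what matters is the call pattern: an AR decoder needs one sequential forward pass per result slot, while an NAR decoder scores every slot in a single parallel pass.

```python
# Toy illustration of AR vs. NAR reranking over N candidate videos.
# The "model" is a fake random scorer; only the forward-pass counts matter.
import random

N = 8
random.seed(0)

def score(slot, chosen_so_far, candidates):
    """Stand-in for one model forward pass: score each remaining candidate
    for this slot, conditioned on what was already placed (AR only)."""
    return {c: random.random() + 0.1 * len(chosen_so_far) for c in candidates}

def ar_rerank(items):
    """Autoregressive: one forward pass per slot -> N sequential calls.
    Each slot sees the prior choices (captures dependencies) but latency
    grows linearly with page length."""
    order, remaining, calls = [], set(items), 0
    for slot in range(len(items)):
        scores = score(slot, order, remaining)
        calls += 1
        best = max(scores, key=scores.get)
        order.append(best)
        remaining.remove(best)
    return order, calls

def nar_rerank(items):
    """Non-autoregressive: all slots scored in a single parallel pass.
    Fast, but each slot is predicted independently of the others."""
    calls = 1
    scores = {c: random.random() for c in items}  # one joint pass
    return sorted(items, key=scores.get, reverse=True), calls

items = list(range(N))
_, ar_calls = ar_rerank(items)    # N sequential model calls
_, nar_calls = nar_rerank(items)  # 1 model call
```

For a full result page, the AR path pays N sequential model invocations where the NAR path pays one, which is exactly the latency gap Dual-Rerank sets out to close without giving up the AR model's dependency modeling.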

Dual-Rerank bridges this structural gap through Sequential Knowledge Distillation, transferring the understanding of sequential dependencies from a powerful but slow teacher AR model to a faster student NAR model. The second challenge is an optimization gap: Supervised learning struggles to optimize for holistic page utility, and Reinforcement Learning (RL) is often unstable in high-throughput environments. The framework addresses this with List-wise Decoupled Reranking Optimization (LDRO), a method designed to stabilize online RL training. The result, according to extensive A/B testing on live Kuaishou traffic, is state-of-the-art performance: the system significantly boosts key metrics such as user satisfaction and watch time while delivering a major reduction in inference latency compared to pure AR baselines, proving its viability for massive, real-world deployments.
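The paper's exact distillation objective isn't reproduced here, but the standard recipe it builds on can be sketched: the slow AR teacher produces a probability distribution over candidates for each result slot (conditioned on the slots before it), and the fast NAR student is trained to match those distributions with a per-slot KL-divergence loss. All numbers below are made-up toy values.

```python
# Hedged sketch of sequence-level knowledge distillation (standard recipe,
# not the paper's exact loss): a parallel NAR student is pulled toward the
# per-slot output distributions of a sequential AR teacher.
import math

def kl(p, q, eps=1e-9):
    """KL(p || q) between two discrete probability distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Teacher (AR) distributions over 3 candidate items, one row per result
# slot. These were produced conditioned on earlier slots, so they encode
# the sequential dependencies the NAR student cannot model on its own.
teacher = [[0.7, 0.2, 0.1],
           [0.1, 0.8, 0.1],
           [0.2, 0.1, 0.7]]

# Student (NAR) predictions, all slots emitted in one parallel pass.
student = [[0.5, 0.3, 0.2],
           [0.3, 0.5, 0.2],
           [0.3, 0.3, 0.4]]

# Distillation loss: mean per-slot KL. Minimizing it by gradient descent
# pushes the student's parallel predictions toward the teacher's
# sequential ones, keeping NAR speed with near-AR quality.
loss = sum(kl(t, s) for t, s in zip(teacher, student)) / len(teacher)
```

In a real training loop this loss would be computed over logits with a framework like PyTorch and combined with the supervised reranking objective; the plain-Python version above just makes the matching mechanism explicit.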

Key Points
  • Solves the AR vs. NAR model trade-off for reranking using Sequential Knowledge Distillation for faster, high-quality inference.
  • Uses novel List-wise Decoupled Reranking Optimization (LDRO) to enable stable reinforcement learning for optimizing whole-page utility.
  • Proven in production at Kuaishou, improving watch time and satisfaction for 400M daily users while drastically cutting latency.

Why It Matters

This provides a blueprint for deploying high-quality generative AI in latency-sensitive, large-scale products like search and recommendation engines.