Research & Papers

Kuaishou's RGCD-Rep uses MLLMs to bridge short videos and live streams

Reasoning-guided cross-domain recommendation now serves more than 400 million daily users.

Deep Dive

A team from Kuaishou has published a paper describing RGCD-Rep (Reasoning-Guided Cross-Domain Representation Learning), a framework that uses multimodal large language models (MLLMs) to improve live stream recommendations by transferring user interest signals from short videos. The challenge is a data imbalance: live streaming has sparse user interaction data, which creates a cold-start problem, while short videos are rich in behavior signals. RGCD-Rep addresses this in two stages. First, a frozen teacher MLLM generates structured reasoning about cross-domain item relationships, and that reasoning is distilled into a lightweight student MLLM. Second, item representations are decomposed into transferable and domain-residual components, which are learned with behavioral collaboration.
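The first stage can be illustrated with a minimal sketch of sequence-level distillation, in which the student is trained to assign high probability to the tokens of the teacher's reasoning. This is an assumption about the general shape of such a loss, not the paper's actual objective; the function names, the three-token vocabulary, and the logits below are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_token_ids):
    """Mean negative log-likelihood of the teacher's tokens under the student.

    student_logits: one list of vocabulary scores per position.
    teacher_token_ids: the token the frozen teacher produced at each position.
    Hypothetical helper for illustration only.
    """
    if not teacher_token_ids or len(student_logits) != len(teacher_token_ids):
        raise ValueError("need one non-empty logit row per teacher token")
    nll = 0.0
    for logits, token_id in zip(student_logits, teacher_token_ids):
        nll -= math.log(softmax(logits)[token_id])
    return nll / len(teacher_token_ids)


# Toy data: vocabulary of 3 tokens, teacher reasoning of 2 tokens.
teacher_tokens = [2, 0]
confident_student = [[0.0, 0.0, 5.0], [5.0, 0.0, 0.0]]  # agrees with teacher
uniform_student = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]    # no preference

loss_confident = distillation_loss(confident_student, teacher_tokens)
loss_uniform = distillation_loss(uniform_student, teacher_tokens)
print(loss_confident < loss_uniform)
```

A student that agrees with the teacher's tokens gets a lower loss than one with no preference, which is the signal that pushes the lightweight model toward the teacher's reasoning while the teacher itself stays frozen.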

Offline experiments showed significant gains, and A/B tests in Kuaishou's live streaming system confirmed improvements across core metrics. The model now serves over 400 million daily users. This marks one of the first large-scale industrial deployments of MLLM-guided cross-domain recommendation, demonstrating that reasoning-driven representation learning can bridge content modalities efficiently without incurring heavy inference costs at serving time.

Key Points
  • RGCD-Rep uses a frozen teacher MLLM to generate cross-domain reasoning, distilled into a lightweight student MLLM for efficient deployment.
  • Item representations are split into transferable and domain-residual parts, enabling knowledge transfer from short videos to live streams.
  • Deployed at Kuaishou, serving 400M+ daily users with measurable gains in A/B tests across multiple business metrics.
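
The split into transferable and domain-residual parts can be pictured with a small sketch. It assumes two linear projections over a shared item embedding and matches items across domains on the transferable part only; the paper's actual architecture is not described here, and the matrices, vectors, and function names below are hypothetical and hand-picked rather than learned.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def decompose(item_vec, w_transfer, w_residual):
    """Split an item embedding into (transferable, domain-residual) parts.

    Hypothetical helper: in a real system both projections would be learned.
    """
    return matvec(w_transfer, item_vec), matvec(w_residual, item_vec)


# Hand-picked projections: the first two dimensions carry shared content,
# the last two carry domain-specific signal (an assumption for illustration).
W_TRANSFER = [[1, 0, 0, 0], [0, 1, 0, 0]]
W_RESIDUAL = [[0, 0, 1, 0], [0, 0, 0, 1]]

short_video = [1.0, 0.0, 0.9, 0.1]
live_stream = [1.0, 0.0, 0.1, 0.9]

t_video, r_video = decompose(short_video, W_TRANSFER, W_RESIDUAL)
t_live, r_live = decompose(live_stream, W_TRANSFER, W_RESIDUAL)

# Cross-domain matching uses only the transferable components.
cross_domain_score = dot(t_video, t_live)
print(t_video, t_live, cross_domain_score)
```

In this toy case the two items share the same transferable part while their residuals differ, so interest learned on the short video can carry over to the live stream without the domain-specific signal interfering.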

Why It Matters

RGCD-Rep makes live stream recommendations more effective by drawing on rich short-video behavior, addressing the cold-start problem at scale.