Research & Papers

Rec-Distill distills large-scale recommendation models into lightweight serving models, recovering over 60% of teacher gains

Teachers scale up to 24B parameters; students recover more than 60% of the teachers' performance gains in production.

Deep Dive

Large recommendation models show promising scaling laws, but industrial deployment demands lightweight models with strict latency and efficiency constraints. Rec-Distill bridges this gap by distilling massive teacher models (up to 24B parameters, 20K behavior sequence length) into efficient student models for online serving. The pipeline combines decoupled training, black-box distillation, debiasing mechanisms, and a hybrid batch-streaming architecture to handle dynamic recommendation environments.
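To make "black-box distillation" concrete: in that setting the student learns only from the teacher's output scores, not from its internal layers or weights. The summary above does not give Rec-Distill's actual objective, so the following is only a generic soft-label sketch. The function names, the binary click label, and the mixing weight `alpha` are all illustrative assumptions.

```python
import math


def bce(pred: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy between a predicted probability and a target in [0, 1]."""
    pred = min(max(pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))


def black_box_distill_loss(
    student_prob: float,
    label: float,
    teacher_prob: float,
    alpha: float = 0.5,  # hypothetical mixing weight, not taken from the paper
) -> float:
    """Mix the ground-truth loss with a loss against the teacher's output score.

    Only the teacher's predicted probability is used, which is what makes
    the setup black-box: no teacher activations or parameters are needed.
    """
    hard_loss = bce(student_prob, label)         # fit the observed label
    soft_loss = bce(student_prob, teacher_prob)  # imitate the teacher's score
    return alpha * hard_loss + (1.0 - alpha) * soft_loss


# A student that agrees with the teacher is penalized less than one that does not.
close = black_box_distill_loss(student_prob=0.8, label=1.0, teacher_prob=0.8)
far = black_box_distill_loss(student_prob=0.2, label=1.0, teacher_prob=0.8)
```

Because the teacher enters only through `teacher_prob`, its scores could in principle be computed offline and stored alongside the training data, which fits the decoupled training the pipeline describes.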

In real-world tests across multiple recommendation and advertising scenarios, the framework achieves distillation transferability exceeding 60%, meaning students recover more than 60% of the teacher's performance gains. These gains carry over to business metrics, establishing Rec-Distill as a practical path to scaling recommendation models while meeting deployment constraints. The work was submitted to arXiv on May 28, 2026 by Haoran Ding and 17 co-authors.
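One plausible reading of distillation transferability is the share of the teacher's gain over an undistilled baseline that the distilled student retains. The sketch below assumes that ratio definition, which the summary does not spell out. The metric values are invented purely for illustration and are not results from the paper.

```python
def distillation_transferability(baseline: float, student: float, teacher: float) -> float:
    """Fraction of the teacher's gain over the baseline recovered by the student.

    Assumed definition: (student - baseline) / (teacher - baseline).
    """
    teacher_gain = teacher - baseline
    if teacher_gain <= 0:
        raise ValueError("teacher must outperform the baseline for the ratio to be meaningful")
    return (student - baseline) / teacher_gain


# Hypothetical metric values (e.g. AUC), chosen only to illustrate the arithmetic.
ratio = distillation_transferability(baseline=0.700, student=0.713, teacher=0.720)
print(f"transferability = {ratio:.0%}")
```

Under this definition, a value above 60% says the lightweight student keeps most of the improvement the teacher achieved, while paying only the student's serving cost.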

Key Points
  • Teachers scaled to 24B dense parameters and 20K behavior sequence length
  • Distillation transferability >60% in the best setting across real-world platforms
  • Hybrid batch-streaming pipeline handles dynamic recommendation environments

Why It Matters

Brings scaling-law benefits of massive recommendation models to production without sacrificing latency or cost.