Rec-Distill distills large-scale recommendation models into lightweight serving models with 60%+ transferability
Teachers of up to 24B parameters; students recover more than 60% of teacher performance gains in production.
Large recommendation models show promising scaling laws, but industrial deployment demands lightweight models with strict latency and efficiency constraints. Rec-Distill bridges this gap by distilling massive teacher models (up to 24B parameters, 20K behavior sequence length) into efficient student models for online serving. The pipeline combines decoupled training, black-box distillation, debiasing mechanisms, and a hybrid batch-streaming architecture to handle dynamic recommendation environments.
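To make the black-box distillation idea concrete, here is a minimal sketch in which the student sees only the teacher's output score, never its internals. The summary does not give the actual loss used by Rec-Distill, so the function names, the binary cross-entropy form, and the mixing weight `alpha` are all illustrative assumptions.

```python
import math


def bce(p: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy between a predicted probability and a hard or soft target."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def blackbox_distill_loss(student_p: float, label: float, teacher_p: float,
                          alpha: float = 0.5) -> float:
    """Hypothetical per-example loss: hard-label term plus teacher soft-label term.

    alpha is an assumed mixing weight, not a value taken from the paper.
    Only the teacher's output score (teacher_p) is used, which is what
    makes the setup "black-box".
    """
    hard_term = bce(student_p, label)      # fit the observed click/conversion label
    soft_term = bce(student_p, teacher_p)  # match the teacher's predicted score
    return alpha * hard_term + (1.0 - alpha) * soft_term
```

Because the soft term depends only on stored teacher scores, a setup like this would let the teacher and student be trained separately, which is one plausible reading of "decoupled training" here.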
In real-world tests across multiple recommendation and advertising scenarios, the framework achieves distillation transferability exceeding 60% in its best setting, meaning students recover well over half of the teacher's performance gains. These improvements translate into measurable gains in business metrics, establishing Rec-Distill as a practical path to scaling recommendation models while meeting deployment constraints. The work was submitted to arXiv on May 28, 2026 by Haoran Ding and 17 co-authors.
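One way to read the transferability figure is as the share of the teacher's gain over a baseline that the distilled student retains. The sketch below assumes that definition. The summary does not spell out the exact formula, and the function name and AUC values are made up for illustration.

```python
def transferability(baseline: float, student: float, teacher: float) -> float:
    """Fraction of the teacher's gain over the baseline that the student recovers.

    Assumed definition: (student - baseline) / (teacher - baseline).
    """
    teacher_gain = teacher - baseline
    if teacher_gain <= 0:
        raise ValueError("teacher must outperform the baseline")
    return (student - baseline) / teacher_gain


# Illustrative numbers only (not results from the paper):
# baseline AUC 0.7500, teacher AUC 0.7600, distilled student AUC 0.7562.
ratio = transferability(baseline=0.7500, student=0.7562, teacher=0.7600)
print(f"transferability: {ratio:.0%}")
```

Under this reading, a student that matches the baseline scores 0% and one that matches the teacher scores 100%, so a value above 60% means most of the teacher's improvement survives distillation.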
- Teachers scaled to 24B dense parameters and 20K behavior sequence length
- Distillation transferability >60% in the best setting across real-world platforms
- Hybrid batch-streaming pipeline enables dynamic recommendation environments
Why It Matters
Brings scaling-law benefits of massive recommendation models to production without sacrificing latency or cost.