Netflix scales generative recommender to 1B parameters with production gains

arXiv cs.IR May 25, 2026

How Netflix scaled a generative recommender from 2M to 1B parameters

Deep Dive

In a paper published on arXiv, Netflix researchers describe scaling a generative recommendation model from 2 million to 1 billion backbone parameters (excluding embeddings and decoding) in a production title-recommendation setting. They observed task-dependent scaling behavior: some downstream tasks approached an empirical ceiling within the tested range, while others continued to benefit from added capacity. To diagnose where additional scale is useful, they propose using offset scaling-law fits, which indicate where extra capacity is likely to pay off and where it is not.
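To make the idea of an offset fit concrete, here is a minimal sketch. It assumes a lower-is-better metric and one common functional form, metric ≈ offset + a * N^(-alpha), where N is the backbone parameter count. This summary does not give the paper's exact parameterization or fitting procedure, so the function name fit_offset_power_law, the grid search over the offset, and the log-space regression are all illustrative choices rather than the authors' method.

```python
import math


def fit_offset_power_law(sizes, losses, grid_steps=2000):
    """Fit loss ~= offset + a * size**(-alpha) (an assumed form, not the paper's).

    For a fixed candidate offset, log(loss - offset) is linear in log(size),
    so ordinary least squares yields a and alpha in closed form. The offset
    itself is chosen by a simple grid search. All losses must be positive.
    """
    xs = [math.log(n) for n in sizes]
    x_mean = sum(xs) / len(xs)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    lowest = min(losses)
    best = None
    for step in range(grid_steps):
        # Candidate offsets span [0, min(losses)), keeping loss - offset positive.
        offset = lowest * step / grid_steps
        ys = [math.log(loss - offset) for loss in losses]
        y_mean = sum(ys) / len(ys)
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        slope = sxy / sxx
        a = math.exp(y_mean - slope * x_mean)
        alpha = -slope
        # Score each candidate by squared error in the original metric space.
        sse = sum(
            (offset + a * n ** (-alpha) - loss) ** 2
            for n, loss in zip(sizes, losses)
        )
        if best is None or sse < best[0]:
            best = (sse, offset, a, alpha)
    _, offset, a, alpha = best
    return offset, a, alpha
```

Under this assumed form, a fitted offset close to the values already measured suggests a task is near its ceiling, while the term a * N**(-alpha) estimates the gap that more capacity could still close at size N.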

Production constraints forced further innovations: frequent retraining over trillions of behavior tokens demanded better training and decoding efficiency; cached serving made next-token targets stale; and new titles had to be scored from semantic metadata before their collaborative ID embeddings became reliable. The team addressed these with multi-token prediction for serving-latency alignment, sampled softmax with a projected decoding head for efficient repeated training, and semantic item towers with collaborative-embedding masking for cold-start adaptation. In a one-week production-shadow evaluation over 1 million users, the 1B-backbone model achieved higher Mean Reciprocal Rank (MRR) than the 2M baseline across all reported tasks.
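For readers unfamiliar with the metric, MRR averages the reciprocal of the rank at which the correct item appears, counting a miss as zero. The sketch below uses the standard definition; the function name mean_reciprocal_rank and the list-based inputs are illustrative assumptions, and the paper's evaluation pipeline may differ in details such as candidate sets or truncation.

```python
def mean_reciprocal_rank(ranked_lists, targets):
    """Average of 1/rank of each target in its ranked list (0 if absent)."""
    if not targets:
        raise ValueError("at least one query is required")
    total = 0.0
    for ranked, target in zip(ranked_lists, targets):
        if target in ranked:
            rank = ranked.index(target) + 1  # ranks are 1-based
            total += 1.0 / rank
        # A target missing from the ranked list contributes 0.
    return total / len(targets)


# Hypothetical example: three users, one ranked title list each.
ranked = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
watched = ["a", "a", "x"]
score = mean_reciprocal_rank(ranked, watched)  # (1 + 1/2 + 0) / 3
```

Because the score depends only on the position of the first correct item, MRR rewards placing the watched title near the top of the list.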

Key Points

Scaled from 2M to 1B backbone parameters with task-dependent scaling ceilings observed via offset scaling-law fits.
Introduced multi-token prediction, sampled softmax, and semantic item towers to address serving latency, training efficiency, and cold-start problems.
In a one-week, 1M-user production-shadow evaluation, the 1B model outperformed the 2M baseline on MRR across all reported tasks.

Why It Matters

The paper offers practical scaling guidance for generative recommenders, weighing model size against real-world production constraints.
