Research & Papers

TubiFM unifies streaming discovery ranking, boosting search TVT by 3.9%

arXiv cs.IR May 25, 2026

⚡A single Llama 3.2 1B model replaces separate item, carousel, and search rankers.

Deep Dive

Tubi researchers have released a paper detailing TubiFM, a unified framework that streamlines streaming discovery by replacing separate item, carousel, and search ranking models with a single architecture. The core innovation is the 'user story', a serialized representation that converts a viewer's complete history (watch events with surface and carousel context, search queries, and session attributes) into one token sequence. By interleaving pretrained language tokens with domain-specific event tokens, TubiFM expresses heterogeneous recommendation and search tasks as prompted next-token prediction over a shared grammar.
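To make the 'user story' idea concrete, here is a minimal sketch of how such a serialization might look. It is an illustration only, not the paper's actual grammar: the event classes, the token spellings (such as `<watch>` or `<task:...>`), and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class WatchEvent:
    title_id: str   # hypothetical catalog identifier
    surface: str    # where the watch started, e.g. "home" or "search"
    carousel: str   # carousel the title was picked from


@dataclass
class SearchEvent:
    query: str      # free-text query, kept as ordinary language tokens


Event = Union[WatchEvent, SearchEvent]


def serialize_user_story(session: Dict[str, str], events: List[Event]) -> List[str]:
    """Flatten session attributes and history into one token sequence.

    Angle-bracket tokens stand in for domain-specific event tokens;
    plain words stand in for pretrained language tokens. Both spellings
    are assumptions made for this sketch.
    """
    tokens = ["<story>"]
    for key in sorted(session):  # sorted for a deterministic order
        tokens += [f"<session:{key}>", session[key]]
    for event in events:
        if isinstance(event, WatchEvent):
            tokens += [
                "<watch>",
                f"<surface:{event.surface}>",
                f"<carousel:{event.carousel}>",
                f"<item:{event.title_id}>",
            ]
        else:
            tokens += ["<search>"] + event.query.lower().split()
    return tokens


def build_prompt(story: List[str], task: str) -> List[str]:
    """Append a task token so one model can be prompted for different rankings."""
    return story + [f"<task:{task}>"]
```

With a history of one watch and one search, `build_prompt(story, "rank_items")` ends in a task token, and the model would then be asked to predict the next token, for example an item token. Under this framing, switching from item ranking to search ranking changes only the prompt suffix, not the model.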

Built on Llama 3.2 1B, TubiFM achieves strong results: in offline evaluations it outperforms specialist baselines across all three ranking tasks. Online A/B tests reveal a +3.9% improvement in search total viewing time (TVT) and +0.30% in carousel TVT, while item ranking remains statistically neutral (+0.14% TVT) against a mature production stack. Critically, TubiFM reduces p99 ranking latency from 500ms to 200ms on L40S GPUs, demonstrating that a unified model can both simplify infrastructure and improve discovery metrics.
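For readers less familiar with tail-latency metrics, the sketch below shows one common way to compute a p99 value (the nearest-rank method) and the relative reduction implied by the reported figures. The function names are hypothetical, and the article does not say how Tubi measures its percentiles.

```python
from typing import Sequence


def p99(latencies_ms: Sequence[float]) -> float:
    """Nearest-rank 99th percentile: the smallest sample such that at
    least 99% of all samples are less than or equal to it."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # ceil(0.99 * n) computed with integers to avoid float rounding
    rank = -(-99 * len(ordered) // 100)
    return ordered[rank - 1]


def relative_reduction(before_ms: float, after_ms: float) -> float:
    """Fraction of latency removed when going from before_ms to after_ms."""
    return (before_ms - after_ms) / before_ms


# Reported figures: 500 ms p99 before, 200 ms p99 after.
reduction = relative_reduction(500, 200)  # 0.6, i.e. a 60% reduction
speedup = 500 / 200                       # 2.5x faster at the tail
```

A p99 of 200 ms means that 99% of ranking requests finish within 200 ms; the slowest 1% may still take longer.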

Key Points

TubiFM uses a 'user story' token sequence that fuses watch, search, and carousel context into one input for Llama 3.2 1B.
Online A/B tests show search total viewing time up +3.9% and carousel TVT up +0.30% with a single model.
p99 ranking latency drops from 500ms to 200ms on L40S GPUs, while online metrics match or beat the production specialist models.

Why It Matters

TubiFM shows that unified ranking with LLMs can simplify serving infrastructure while improving discovery metrics.
