Research & Papers

When & How to Write for Personalized Demand-aware Query Rewriting in Video Search

New framework uses LLMs to rewrite ambiguous queries, cutting user re-searches by 2.97%.

Deep Dive

A new research paper introduces WeWrite, a novel AI framework designed to make video search engines smarter by understanding user intent. The system tackles a core problem: vague search queries like 'that funny cat video' are hard for algorithms to match without knowing a user's personal history. Traditional methods that use implicit behavioral signals often suffer from diluted data and slow feedback loops, leading to poor results.

WeWrite's technical approach is threefold. First, it solves 'When to Write' with an automated, posterior-based mining strategy that sifts through user logs to identify only the high-quality scenarios where personalization is truly necessary. Second, for 'How to Write,' it employs a hybrid training paradigm, combining Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO) to align the output style of a large language model (LLM) with the needs of the downstream video retrieval system. Finally, for practical 'Deployment,' the team designed a parallel 'Fake Recall' architecture so that the LLM-powered rewriting adds minimal latency to the search process.
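The core idea behind GRPO is that each sampled output is scored relative to its sibling samples for the same input, rather than against a separately learned value function. A minimal sketch of that group-relative advantage computation, with illustrative reward values not taken from the paper:

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantage: each sample's reward relative to its group.

    rewards: scalar rewards for a group of candidate rewrites sampled
    for the same query (e.g. a downstream-retrieval quality score).
    Advantage = (r - mean) / std, pushing the policy toward rewrites
    that beat their siblings within the group.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

# Hypothetical example: four candidate rewrites for one ambiguous query.
advs = group_relative_advantages([0.9, 0.4, 0.4, 0.3])
```

Group-relative advantages always sum to zero, so only candidates that outperform their own group receive a positive learning signal.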

The results from online A/B testing on a large-scale video platform are significant for the industry. WeWrite delivered a 1.07% increase in Click-Through Video Volume (specifically for videos watched over 10 seconds), a key engagement metric. More tellingly, it reduced the Query Reformulation Rate—how often users have to rephrase a failed search—by 2.97%, directly indicating improved user satisfaction. This work, published on arXiv, demonstrates a practical and effective blueprint for integrating personalized LLMs into real-world, latency-sensitive recommendation systems.

Key Points
  • The WeWrite framework uses an LLM trained with SFT and GRPO to rewrite ambiguous search queries based on user history.
  • Its 'Fake Recall' deployment architecture maintains low latency, crucial for real-time search systems.
  • Online A/B tests showed a 1.07% boost in long-play video clicks and a 2.97% reduction in users needing to rephrase searches.
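The 'Fake Recall' deployment above hinges on running the rewrite path concurrently with the baseline retrieval, so the added latency is bounded by the slower branch rather than the sum of both. The paper's exact mechanism isn't detailed here; this is a generic sketch of that latency-hiding pattern, with `rewrite_fn`, `retrieve_fn`, and `merge_fn` as hypothetical stand-ins:

```python
from concurrent.futures import ThreadPoolExecutor

def search_with_parallel_rewrite(query, rewrite_fn, retrieve_fn, merge_fn):
    """Run the LLM rewrite path alongside baseline retrieval in parallel.

    End-to-end latency is roughly max(baseline, rewrite + retrieval)
    instead of their sum, keeping the LLM off the critical path's tail.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        baseline = pool.submit(retrieve_fn, query)
        rewritten = pool.submit(lambda q: retrieve_fn(rewrite_fn(q)), query)
        return merge_fn(baseline.result(), rewritten.result())

# Toy usage with stand-in functions (the lambda mimics the LLM rewriter).
results = search_with_parallel_rewrite(
    "funny cat",
    rewrite_fn=lambda q: q + " compilation",
    retrieve_fn=lambda q: [f"video:{q}"],
    merge_fn=lambda a, b: a + b,
)
# results: ["video:funny cat", "video:funny cat compilation"]
```

Because the baseline branch never waits on the LLM, a slow or failed rewrite can also be dropped at merge time without degrading the original results.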

Why It Matters

This provides a proven model for tech giants to deploy personalized, LLM-powered search that actually improves core metrics without slowing systems down.