Research & Papers

LERA uses LLMs to refine ad auctions in generative chatbots with two-stage RAG

New framework uses LLM logits to score ad relevance, reducing repetitive ad insertions in chatbot responses.

Deep Dive

Advertising is a large potential revenue source for generative chatbots, but current approaches have a clear weakness. Existing retrieve-then-generate pipelines retrieve ads by text embedding similarity alone, which often leads to commercial misinterpretation and to repetitive ad insertions that degrade the user experience.

To solve this, researchers from Alibaba and Peking University introduce LERA, a two-stage auction framework that uses the LLM itself in the scoring process. The first stage applies lightweight embedding-based coarse filtering to narrow the candidate ads. The second stage prompts the LLM to produce logits over those candidates, yielding organic relevance scores that reflect deeper semantic understanding than embedding similarity alone. These scores are combined with advertiser bids under a novel critical-value payment rule that accounts for both filtering thresholds, guaranteeing truthfulness for utility-maximizing advertisers. The framework also extends to multiple ad insertions within dynamic dialogue flows. Experiments on a synthetic benchmark show that LERA substantially improves ad selection accuracy and insertion diversity while adding only controllable latency overhead.
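The two-stage selection described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the top-k cutoff, the softmax over logits, and the bid-times-relevance scoring rule are all assumptions made for the example, and `stub_llm_logits` is a hypothetical stand-in for reading candidate logits from a real model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def softmax(logits):
    """Turn raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def select_ad(query_vec, ads, llm_logits, top_k=2):
    """Pick one ad; `ads` is a list of dicts with 'name', 'vec', 'bid'."""
    # Stage 1 (coarse): keep the top_k ads by embedding similarity.
    ranked = sorted(ads, key=lambda ad: cosine(query_vec, ad["vec"]), reverse=True)
    candidates = ranked[:top_k]
    # Stage 2 (fine): relevance from LLM logits over the shortlist only.
    relevance = softmax(llm_logits(candidates))
    # Assumed scoring rule: bid multiplied by relevance.
    scores = {ad["name"]: ad["bid"] * rel for ad, rel in zip(candidates, relevance)}
    winner = max(scores, key=scores.get)
    return winner, scores

def stub_llm_logits(candidates):
    """Hypothetical stand-in: a real system would read these from the model."""
    fake = {"A": 2.0, "B": 0.0, "C": 1.0}
    return [fake[ad["name"]] for ad in candidates]

ads = [
    {"name": "A", "vec": [1.0, 0.0], "bid": 1.0},
    {"name": "B", "vec": [0.9, 0.1], "bid": 2.0},
    {"name": "C", "vec": [0.0, 1.0], "bid": 10.0},
]
winner, scores = select_ad([1.0, 0.0], ads, stub_llm_logits, top_k=2)
```

In this toy run, ad C never reaches the LLM stage despite its high bid, because its embedding is unrelated to the query. Restricting the LLM call to a short list is what keeps the added latency bounded by `top_k`.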

Key Points
  • Two-stage design: embedding-based coarse filtering followed by LLM fine-ranking using logits over candidates.
  • Critical-value payment rule ensures truthfulness by accounting for both coarse-filtering and fine-ranking thresholds.
  • Experiments on a synthetic benchmark show substantial gains in ad selection accuracy and insertion diversity, with only controllable added latency.
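To make the critical-value idea concrete, here is a hedged sketch of how a payment covering both thresholds might be computed. It is not the paper's rule. It assumes, purely for illustration, that the coarse stage ranks ads by bid times embedding similarity, that the fine stage ranks by bid times LLM relevance, and that relevance values are fixed once the shortlist is known. Under those assumptions the winner pays the smallest bid that would still pass the coarse filter and still win the fine ranking.

```python
def critical_value_payment(winner_sim, winner_rel, rivals, top_k):
    """Smallest bid at which the winner still wins (assumed scoring rules).

    winner_sim: winner's embedding similarity (> 0)
    winner_rel: winner's LLM relevance on the shortlist (> 0)
    rivals:     list of (bid, sim, rel) tuples for every other ad
    top_k:      shortlist size of the coarse stage
    """
    # Coarse threshold: the winner must beat the top_k-th best rival
    # coarse score to stay on the shortlist.
    coarse = sorted((bid * sim for bid, sim, _ in rivals), reverse=True)
    coarse_bar = coarse[top_k - 1] if len(coarse) >= top_k else 0.0
    coarse_min_bid = coarse_bar / winner_sim

    # Fine threshold: the winner must match the best fine score among
    # the top_k - 1 rivals that share the shortlist with it.
    shortlisted = sorted(rivals, key=lambda r: r[0] * r[1], reverse=True)[: top_k - 1]
    fine_bar = max((bid * rel for bid, _, rel in shortlisted), default=0.0)
    fine_min_bid = fine_bar / winner_rel

    # The critical value is whichever threshold binds.
    return max(coarse_min_bid, fine_min_bid)

rivals = [(2.0, 0.5, 0.3), (1.0, 0.8, 0.2), (3.0, 0.1, 0.0)]
pay_coarse_binds = critical_value_payment(0.5, 0.5, rivals, top_k=2)
pay_fine_binds = critical_value_payment(0.5, 0.2, rivals, top_k=2)
```

In the first call the coarse threshold is the larger of the two, so it sets the price; in the second, a lower LLM relevance makes the fine-ranking threshold bind instead. Because the payment depends only on rivals' bids and scores, not on the winner's own bid, misreporting a bid cannot lower the price paid, which is the intuition behind truthfulness.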

Why It Matters

Better ad targeting in chatbots means higher revenue for platforms and less intrusive ads for users.