Meta's Memento uses RAG to scale ad personalization to 365+ days of history
A new system from Meta achieves 1% CTR lift by treating user history like a searchable document corpus.
Get AI news that actually matters
One email a day. Zero fluff. Join 10,000+ professionals.
Meta has unveiled Memento, a personalized retrieval-augmented generation (RAG) framework designed to overcome the limitations of traditional long-history modeling in ad recommendation. The system reframes user history as a document corpus and each ad request as a query, retrieving relevant past interactions using Maximal Marginal Relevance (MMR) — a technique that balances similarity with diversity to avoid redundant or irrelevant results. Memento supports two complementary applications: Representation Memento, which retrieves historical embeddings for feature augmentation, and Data Memento, which retrieves past training examples for multipass training. Through infrastructure co-design including temporal chunking, INT8 quantization, and asynchronous serving, Memento achieves 5-10x resource efficiency compared to naive linear scaling approaches like LastN.
In live production on Facebook Feed and Reels, Memento delivers a 1% click-through rate (CTR) lift and a 1.2% conversion rate (CVR) lift while scaling personalization to 365+ days of user history — a massive leap from typical short-window models. The system processes daily requests with sub-10ms latency and yields a 0.25-0.3% Normalized Entropy gain on both click-through and conversion prediction tasks, indicating more accurate probabilistic modeling. By treating long-term user behavior as a searchable corpus, Memento effectively solves attention dilution, system efficiency, and catastrophic forgetting problems that plague naive linear scaling. This work represents a significant industrial application of RAG techniques, demonstrating that retrieval-augmented architectures can be both efficient and effective at unprecedented scale in real-world advertising systems.
- Memento uses Maximal Marginal Relevance (MMR) to retrieve diverse and relevant user history entries for ad recommendation.
- The system achieves 5-10x resource efficiency over linear scaling with sub-10ms latency via temporal chunking and INT8 quantization.
- In production, Memento delivers a 1% CTR lift on Facebook Feed/Reels and 1.2% CVR lift, scaling personalization to 365+ days of history.
Why It Matters
Meta's Memento proves that RAG can power ads personalization over a full year of data, boosting relevance without prohibitive compute costs.