Xiaohongshu's UniNote model achieves SOTA in multimodal I2I retrieval
New embedding model from Xiaohongshu beats baselines with RL fine-tuning and MRL.
Researchers from Xiaohongshu (the Chinese lifestyle platform) have unveiled UniNote, a unified embedding model designed specifically for industrial item-to-item (I2I) retrieval. I2I retrieval, which finds the catalog items most similar to a given item, is foundational for recommendation engines, content moderation, and search. Existing multimodal embedding methods struggle to balance global content representation with fine-grained local retrieval, and pipelines that decouple embedding from ranking are often inefficient. UniNote addresses these gaps with tailored retrieval strategies that handle complex, multimodal content at multiple granularities.
UniNote's key innovation is a two-stage training paradigm. The first stage uses contrastive supervised fine-tuning (SFT) to build robust base embeddings. The second stage applies reinforcement learning (RL) to refine ranking quality, aligning the model directly with content relevance instead of relying only on a fixed loss function. Deployed at Xiaohongshu and integrated with Matryoshka Representation Learning (MRL), UniNote delivered substantial improvements in retrieval quality and cost efficiency. The model also achieved state-of-the-art performance across a diverse set of I2I benchmarks. The paper has been accepted to the KDD 2026 ADS Track.
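To make the first stage concrete, here is a minimal sketch of an in-batch contrastive objective. The article does not state which loss UniNote uses, so InfoNCE is assumed here as a common choice for contrastive fine-tuning; the function names, the temperature value, and the toy vectors are all illustrative. The RL stage is not sketched because the article gives no details of its reward.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def info_nce_loss(queries, positives, temperature=0.05):
    """Mean in-batch InfoNCE loss (an assumed stand-in for the SFT objective).

    queries[i] should match positives[i]; every other positives[j] in the
    batch is treated as a negative for query i.
    """
    q = [l2_normalize(v) for v in queries]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, qi in enumerate(q):
        # Temperature-scaled cosine similarity of query i to every candidate.
        logits = [sum(a * b for a, b in zip(qi, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of the matching (diagonal) pair.
        total += log_denom - logits[i]
    return total / len(q)

# Toy batch: correctly paired items versus deliberately swapped pairs.
aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.1], [0.1, 1.0]])
swapped = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.1, 1.0], [1.0, 0.1]])
print(aligned < swapped)
```

The loss is low when each item sits closest to its own positive and high when the pairs are mismatched, which is the property a contrastive stage optimizes for.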
- UniNote uses two-stage training: contrastive SFT followed by RL to align embeddings with content relevance.
- Integrated with Matryoshka Representation Learning (MRL) for flexible, cost-efficient serving at scale.
- Achieves SOTA on multiple I2I retrieval benchmarks and is deployed on Xiaohongshu's production platform.
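The cost benefit of MRL comes from the fact that a prefix of an MRL-trained embedding remains a usable embedding on its own, so a service can index and compare fewer dimensions. The sketch below illustrates that serving pattern under stated assumptions: the note IDs, the 4-dimensional vectors, and the helper names are hypothetical, and it is not UniNote's actual serving code.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` coordinates and re-normalize to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def top_k(query, catalog, dim, k=2):
    """Rank catalog items by cosine similarity using only `dim` dimensions."""
    q = truncate_embedding(query, dim)
    scored = []
    for item_id, vec in catalog.items():
        v = truncate_embedding(vec, dim)
        scored.append((sum(a * b for a, b in zip(q, v)), item_id))
    scored.sort(reverse=True)  # highest similarity first
    return [item_id for _, item_id in scored[:k]]

# Hypothetical catalog of note embeddings (illustrative values only).
catalog = {
    "note_a": [0.9, 0.1, 0.3, 0.2],
    "note_b": [0.1, 0.9, 0.2, 0.4],
    "note_c": [0.8, 0.3, 0.1, 0.5],
}
query = [1.0, 0.0, 0.2, 0.1]

full = top_k(query, catalog, dim=4)   # full-width embeddings
cheap = top_k(query, catalog, dim=2)  # half the storage and compute
print(full, cheap)
```

In this toy case the truncated embeddings return the same neighbors as the full ones. With embeddings that were not trained with an MRL-style objective, truncation would generally degrade ranking much more.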
Why It Matters
UniNote is a production-ready embedding model that balances precision, latency, and cost for real-world recommendation systems.