CARD: Non-Uniform Quantization of Visual Semantic Unit for Generative Recommendation
A new AI method unifies textual, visual, and collaborative signals for better product suggestions.
A team of researchers from China has introduced CARD, a generative recommendation framework designed to improve how systems learn Semantic IDs (SIDs) for items such as products or content. In generative recommendation, items are typically represented as discrete SIDs, but existing methods struggle with two persistent issues: insufficient supervision across the two-stage pipeline of SID construction followed by autoregressive generation, and highly non-uniform item embeddings, which cause codeword imbalance and bias the generated recommendations. CARD addresses the first issue by introducing a visual semantic unit that merges textual, visual, and collaborative signals into a structured visual representation before encoding, enabling more holistic semantic modeling and reducing reliance on supervision signals.
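To make the SID idea concrete, here is a minimal sketch of residual quantization, the mechanism behind RQ-VAE-style Semantic IDs: each level's codebook quantizes what the previous levels left unexplained, and the sequence of chosen indices is the item's SID. The codebooks here are random stand-ins, not CARD's learned ones, and the function name is illustrative.

```python
import numpy as np

def residual_quantize(embedding, codebooks):
    """Assign a Semantic ID: one codeword index per quantization level."""
    sid = []
    residual = embedding.copy()
    for codebook in codebooks:  # each codebook has shape (num_codes, dim)
        dists = np.linalg.norm(codebook - residual, axis=1)
        idx = int(np.argmin(dists))       # nearest codeword at this level
        sid.append(idx)
        residual = residual - codebook[idx]  # next level quantizes the remainder
    return sid, residual

rng = np.random.default_rng(0)
dim, levels, num_codes = 8, 3, 16
codebooks = [rng.normal(size=(num_codes, dim)) for _ in range(levels)]
item_embedding = rng.normal(size=dim)
sid, residual = residual_quantize(item_embedding, codebooks)
print(sid)  # a 3-token discrete Semantic ID, one index per level
```

In a real system the autoregressive model then generates such SID token sequences directly, which is why imbalanced codeword usage at any level translates into generation bias.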
To handle the highly non-uniform distribution of item embeddings common in recommendation scenarios, CARD employs a non-uniform quantization framework called NU-RQ-VAE. This framework incorporates a learnable, invertible non-uniform transformation into the quantization process, mapping skewed semantic distributions into a more balanced latent space. This significantly improves codebook utilization and quantization accuracy. Experiments on multiple datasets show CARD consistently outperforms baseline methods, and the non-uniform transformation module is plug-and-play, remaining robust across different quantization schemes. The code is available on GitHub.
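The intuition behind the non-uniform transformation can be illustrated with a toy scalar example: uniformly quantizing a skewed distribution wastes most codewords, while first applying a monotone (hence invertible) map toward a uniform distribution balances codeword usage. Here a fixed empirical-CDF rank transform stands in for CARD's learnable transformation; the helper names and the entropy-based utilization metric are illustrative choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=10000)  # skewed "item embeddings"

def quantize(values, num_codes=16):
    """Uniform scalar quantization over the value range."""
    edges = np.linspace(values.min(), values.max(), num_codes + 1)[1:-1]
    return np.digitize(values, edges)

def utilization(codes, num_codes=16):
    """Normalized entropy of codeword usage: 1.0 means perfectly balanced."""
    counts = np.bincount(codes, minlength=num_codes)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum() / np.log(num_codes))

raw_codes = quantize(x)
# Monotone, invertible map toward a uniform distribution (a fixed
# empirical-CDF transform standing in for CARD's learnable one):
u = np.argsort(np.argsort(x)) / (len(x) - 1)
flat_codes = quantize(u)
print(utilization(raw_codes), utilization(flat_codes))  # second is near 1.0
```

Because the transform is invertible, the balanced latent codes can still be mapped back to the original embedding space, which is what lets the module be dropped into different quantization schemes without changing the rest of the pipeline.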
- CARD unifies textual, visual, and collaborative signals into a visual semantic unit before encoding.
- NU-RQ-VAE uses a learnable non-uniform transformation to balance skewed item embeddings.
- Outperforms baselines across multiple datasets; module is plug-and-play and robust.
Why It Matters
Better recommendations for e-commerce and content platforms via more accurate, balanced item representations.