Research & Papers

CRAB: Codebook Rebalancing for Bias Mitigation in Generative Recommendation

New post-hoc technique rebalances semantic codebooks to surface niche items, improving fairness.

Deep Dive

A team of researchers including Zezhong Fan, Ziheng Chen, and six others has published a paper on CRAB (Codebook Rebalancing for Bias Mitigation), a novel method designed to tackle a critical flaw in modern Generative Recommendation (GeneRec) systems. These AI models, which represent items as discrete semantic tokens and predict them generatively, have shown strong performance but suffer from severe popularity bias, often exacerbating the 'rich-get-richer' problem for popular items. The researchers identified two root causes: imbalanced tokenization that inherits bias from historical data, and training procedures that disproportionately favor popular tokens while neglecting semantic relationships.
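
To make the first root cause concrete, here is a minimal sketch of how token imbalance could be measured. It assumes each item is assigned a tuple of semantic tokens and that we only look at the first-level token; the names `token_load`, `gini`, `item_tokens`, and `interactions` are hypothetical and are not taken from the paper.

```python
from collections import Counter

def token_load(item_tokens, interactions):
    """Count how many interactions land on each first-level token."""
    load = Counter()
    for item in interactions:
        load[item_tokens[item][0]] += 1
    return load

def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly even)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum with 1-based ranks over the sorted values.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

# Toy data (an assumption for illustration): two popular items share token 0.
item_tokens = {"hit": (0, 1), "hit2": (0, 2), "niche": (1, 0)}
interactions = ["hit"] * 6 + ["hit2"] * 3 + ["niche"]

load = token_load(item_tokens, interactions)
imbalance = gini(load.values())
```

A Gini value near 0 means interactions are spread evenly across tokens; larger values mean a few tokens absorb most of the traffic, which is the kind of skew the article describes.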

CRAB operates as a post-hoc debiasing strategy applied to an already trained GeneRec model. It works in two steps. First, it rebalances the model's semantic codebook by splitting over-popular tokens while preserving their hierarchical semantic structure. Second, it adds a tree-structured regularizer during a subsequent training phase. This regularizer enhances semantic consistency, encouraging the model to learn more informative and distinct representations for previously neglected, unpopular tokens. The intended result is a recommendation system that keeps the performance benefits of generative recommendation while surfacing a wider diversity of content.
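
The splitting step can be pictured with a small sketch. This is an illustration under stated assumptions, not the authors' algorithm: it assumes a two-level codebook keyed by `(parent, child)`, a hypothetical load threshold `max_load`, and a simple greedy partition. The article does not specify the paper's actual splitting criterion.

```python
def split_popular_tokens(leaf_items, popularity, max_load):
    """Split overloaded leaf tokens into two siblings under the same parent.

    leaf_items: {(parent, child): [item, ...]}
    popularity: {item: interaction count}
    max_load:   hypothetical threshold on a leaf token's total popularity
    """
    # Track the next unused child index for each parent.
    next_child = {}
    for parent, child in leaf_items:
        next_child[parent] = max(next_child.get(parent, 0), child + 1)

    result = {}
    for (parent, child), items in leaf_items.items():
        load = sum(popularity[i] for i in items)
        if load <= max_load or len(items) < 2:
            result[(parent, child)] = list(items)
            continue
        # Greedy balance: place the heaviest items first, each into the
        # half that currently carries less popularity.
        halves, loads = ([], []), [0, 0]
        for item in sorted(items, key=lambda i: -popularity[i]):
            k = 0 if loads[0] <= loads[1] else 1
            halves[k].append(item)
            loads[k] += popularity[item]
        # The new token keeps the same parent, so the hierarchy is preserved.
        result[(parent, child)] = halves[0]
        result[(parent, next_child[parent])] = halves[1]
        next_child[parent] += 1
    return result

# Toy codebook (an assumption for illustration).
leaf_items = {(0, 0): ["a", "b", "c"], (0, 1): ["d"], (1, 0): ["e"]}
popularity = {"a": 50, "b": 30, "c": 20, "d": 5, "e": 3}
rebalanced = split_popular_tokens(leaf_items, popularity, max_load=60)
```

The sketch makes a single pass, so a half can still exceed `max_load`; a fuller version would repeat the split until every token falls under the threshold.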

According to the authors, experiments on real-world datasets show that CRAB mitigates popularity bias and improves both recommendation quality and fairness. Because it is applied post hoc, the method offers developers and platforms a practical way to retrofit existing GeneRec systems, so that they move beyond recommending what is already popular and help users discover novel and niche items that better match their interests.

Key Points
  • Targets Generative Recommendation (GeneRec) systems, which use semantic tokens but amplify popularity bias.
  • Uses a two-step post-hoc process: rebalancing codebooks by splitting popular tokens and applying a tree-structured regularizer.
  • Shown in experiments on real-world datasets to improve recommendation fairness and performance by surfacing long-tail items.
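
One way to picture a tree-structured regularizer is as a penalty that keeps each token's embedding close to its parent's in the codebook tree. The sketch below assumes a squared-distance penalty averaged over child tokens; the function name, the `parent_of` mapping, and the exact form of the penalty are assumptions for illustration, since the article does not give the paper's formula.

```python
def tree_regularizer(embeddings, parent_of, weight=1.0):
    """Mean squared distance between each child token and its parent.

    embeddings: {token: [float, ...]}
    parent_of:  {child_token: parent_token}
    weight:     hypothetical regularization strength
    """
    total = 0.0
    for child, parent in parent_of.items():
        total += sum(
            (c - p) ** 2
            for c, p in zip(embeddings[child], embeddings[parent])
        )
    return weight * total / max(len(parent_of), 1)

# Toy tree (an assumption for illustration): two children under one parent.
embeddings = {"root_a": [0.0, 0.0], "a1": [1.0, 0.0], "a2": [0.0, 2.0]}
parent_of = {"a1": "root_a", "a2": "root_a"}
penalty = tree_regularizer(embeddings, parent_of)
```

In training, a term like this would be added to the recommendation loss, so rarely seen tokens still receive a learning signal through their position in the tree.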

Why It Matters

Could enable fairer AI-driven discovery, helping users find niche content and easing filter bubbles on platforms like Spotify and Netflix.