Differentiable Geometric Indexing for End-to-End Generative Retrieval

New technique targets two core problems in generative retrieval and reports gains on large-scale industry datasets and an online e-commerce platform.

Deep Dive

A research team of ten authors, led by Xujing Wang, has published a paper on arXiv titled 'Differentiable Geometric Indexing for End-to-End Generative Retrieval.' The work tackles two problems in current Generative Retrieval (GR) systems, which aim to unify document indexing and search in a single model. The first is the 'Optimization Blockage': traditional discrete indexing is non-differentiable, so gradients from the retrieval objective cannot reach the index, and the model cannot learn an index optimized directly for the retrieval task. The second is the 'Geometric Conflict': standard scoring methods let popular items dominate results, overshadowing more relevant but less common 'long-tail' items.

To address these issues, the team introduces the Differentiable Geometric Indexing (DGI) framework. It employs 'Soft Teacher Forcing,' built on the Gumbel-Softmax relaxation, to create a fully differentiable pathway that aligns the indexing and retrieval processes. DGI also uses 'Isotropic Geometric Optimization,' which replaces inner-product scoring with scaled cosine similarity on a unit hypersphere. Because cosine similarity ignores vector magnitude, this change decouples an item's popularity from its semantic relevance and corrects the geometric bias.
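The two ingredients named above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the temperature (0.5), the scale (20.0), the toy codebook, and the assumption that a popular item ends up with a larger embedding norm are all hypothetical choices made for illustration. It shows only the forward computation of a Gumbel-Softmax relaxation and how scaled cosine scoring differs from inner-product scoring.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise by inverse transform; clamp u away from 0.
        u = max(rng.random(), 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    # Numerically stable softmax.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def soft_code_embedding(weights, codebook):
    """Weighted average of codebook vectors, used in place of a hard lookup."""
    dim = len(codebook[0])
    return [sum(w * vec[d] for w, vec in zip(weights, codebook))
            for d in range(dim)]


def inner_product(q, d):
    return sum(a * b for a, b in zip(q, d))


def scaled_cosine(q, d, scale):
    """Cosine similarity times a fixed scale; unaffected by vector norms."""
    norm_q = math.sqrt(inner_product(q, q))
    norm_d = math.sqrt(inner_product(d, d))
    return scale * inner_product(q, d) / (norm_q * norm_d)


# Soft index assignment: a probability vector instead of a discrete code.
rng = random.Random(0)
weights = gumbel_softmax([2.0, 0.5, -1.0], tau=0.5, rng=rng)
codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # hypothetical codebook
soft_vec = soft_code_embedding(weights, codebook)

# Scoring: toy vectors, assuming popularity shows up as a larger norm.
query = [1.0, 1.0]
niche_item = [0.9, 1.1]     # closely aligned with the query, small norm
popular_item = [6.0, 0.5]   # less aligned, large norm

print("inner product:", inner_product(query, niche_item),
      inner_product(query, popular_item))
print("scaled cosine:", scaled_cosine(query, niche_item, 20.0),
      scaled_cosine(query, popular_item, 20.0))
```

In this toy setup the inner product ranks the large-norm item first, while scaled cosine ranks the better-aligned item first. In an autodiff framework, the soft weights would let gradients from the retrieval loss reach the index logits, which a hard argmax lookup would block.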

The paper reports that DGI outperforms competitive sparse, dense, and generative retrieval baselines in extensive experiments on large-scale industry datasets and an online e-commerce platform. A key finding is DGI's greater robustness in long-tail search scenarios, where finding niche, relevant items is most challenging. The authors take this as support for the paper's core thesis: that reliable retrieval systems need both structural differentiability and geometric isotropy.

Key Points
  • Addresses 'Optimization Blockage' via Soft Teacher Forcing, built on Gumbel-Softmax, creating a fully differentiable index.
  • Addresses 'Geometric Conflict' with Isotropic Geometric Optimization, using scaled cosine similarity to counter popularity bias.
  • Reported to outperform sparse, dense, and generative baselines on large-scale industry datasets and an online e-commerce platform, with stronger long-tail robustness.

Why It Matters

DGI could make AI search more accurate and give long-tail items fairer exposure by keeping popular items from dominating results, which matters for e-commerce and enterprise retrieval.