Research & Papers

GEM: A Native Graph-based Index for Multi-Vector Retrieval

New native graph framework achieves up to 16x faster search while maintaining or improving accuracy.

Deep Dive

A research team led by Yao Tian and Zhoujin Tian has introduced GEM, a native indexing framework designed specifically for multi-vector retrieval. Unlike previous methods that adapt single-vector indexes, GEM builds a proximity graph directly over sets of vectors, preserving their fine-grained semantics. Its core component is a set-level clustering scheme that associates each vector set with only its most informative clusters, reducing redundancy without sacrificing semantic coverage. GEM then constructs local proximity graphs within these clusters and bridges them into a globally navigable structure, adding semantic shortcuts that steer the search toward relevant regions.
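The set-level clustering idea can be illustrated with a small sketch. This is not GEM's actual algorithm: the summary does not say how "most informative" is measured, so the sketch assumes a simple proxy (the clusters that capture the most vectors of a set). The function names, the centroids, and the parameter `r` are all hypothetical.

```python
from collections import Counter

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def nearest_centroid(vec, centroids):
    # Index of the centroid with the highest inner product to vec.
    return max(range(len(centroids)), key=lambda c: dot(vec, centroids[c]))

def top_clusters_for_set(vector_set, centroids, r=2):
    # Assumption: a cluster is "informative" for a set if many of the set's
    # vectors fall into it. Keep only the r most populated clusters.
    counts = Counter(nearest_centroid(v, centroids) for v in vector_set)
    return [cluster for cluster, _ in counts.most_common(r)]

# Toy data: three centroids and one document represented by six vectors.
centroids = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
doc = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.1),
       (0.1, 0.9), (0.2, 0.8), (-0.7, 0.1)]

print(top_clusters_for_set(doc, centroids, r=2))  # [0, 1]
```

Here the document touches three clusters but is indexed under only two, which is the kind of redundancy reduction the paragraph above describes.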

To handle the non-metric nature of multi-vector similarity, GEM decouples the metric used for graph construction from the final relevance score. At query time, the system launches beam search from multiple entry points and prunes paths early using cluster cues, while a quantized distance estimation technique speeds up both indexing and search. The paper, accepted by SIGMOD 2026, reports that GEM achieves up to 16x speedup over state-of-the-art methods while matching or improving accuracy across in-domain, out-of-domain, and multi-modal benchmarks. These results make multi-vector representations more practical to deploy in AI systems.
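A minimal sketch of the decoupling and multi-entry search described above, under stated assumptions. The relevance score is assumed to be the ColBERT-style MaxSim commonly used in multi-vector retrieval; the navigation metric (squared distance between mean vectors) is a stand-in chosen for illustration, not GEM's construction metric. Cluster-based pruning and quantized distances are omitted, and the graph, data, and function names are hypothetical.

```python
import heapq

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim(query, doc):
    # Assumed relevance score: each query vector takes its best match in doc.
    return sum(max(dot(q, d) for d in doc) for q in query)

def mean(vectors):
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def beam_search(query, docs, graph, entries, beam=2, k=1):
    q_mean = mean(query)

    def nav(i):
        # Stand-in navigation metric (a true metric, unlike MaxSim).
        return sq_dist(q_mean, mean(docs[i]))

    visited = set(entries)
    frontier = [(nav(i), i) for i in entries]   # several entry points
    heapq.heapify(frontier)
    best = sorted(frontier)[:beam]              # current beam, closest first
    while frontier:
        d, node = heapq.heappop(frontier)
        if len(best) >= beam and d > best[-1][0]:
            break                               # nothing closer left to expand
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                item = (nav(nb), nb)
                heapq.heappush(frontier, item)
                best = sorted(best + [item])[:beam]
    # Final ranking uses the relevance score, not the navigation metric.
    candidates = [i for _, i in best]
    return sorted(candidates, key=lambda i: maxsim(query, docs[i]), reverse=True)[:k]

docs = [
    [(1.0, 0.0), (0.9, 0.1)],
    [(0.7, 0.3), (0.6, 0.4)],
    [(0.5, 0.5), (0.4, 0.6)],
    [(0.0, 1.0), (0.3, 0.7)],
    [(-0.05, 0.95), (-0.05, 0.95)],
]
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}  # toy chain graph
query = [(0.0, 1.0), (0.1, 0.9)]

print(beam_search(query, docs, graph, entries=[0, 2], beam=2, k=1))  # [3]
```

In this toy data the navigation metric ranks document 4 closest, but MaxSim ranks document 3 first, so the graph only has to bring good candidates into the beam while the final order comes from the relevance score. MaxSim is also asymmetric here (`maxsim(query, docs[3])` differs from `maxsim(docs[3], query)`), which is one reason it cannot serve directly as a metric for graph construction.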

Key Points
  • Native graph framework achieves up to 16x speedup over state-of-the-art methods
  • Uses set-level clustering and semantic shortcuts to preserve multi-vector semantics
  • Accepted by SIGMOD 2026 and validated across multiple benchmark domains

Why It Matters

GEM enables faster semantic search without sacrificing accuracy for RAG systems and multi-modal AI applications, easing a key bottleneck in the practical deployment of multi-vector retrieval.