Research & Papers

Geodesic Semantic Search: Learning Local Riemannian Metrics for Citation Graph Retrieval

New AI system learns local Riemannian metrics on citation graphs, achieving 23% better recall than standard FAISS search.

Deep Dive

A team of researchers including Brandon Yee, Lucas Wang, Kundana Kommini, and Krishna Sharma has published a groundbreaking paper on arXiv titled 'Geodesic Semantic Search: Learning Local Riemannian Metrics for Citation Graph Retrieval.' The work introduces Geodesic Semantic Search (GSS), a novel AI-powered retrieval system designed to transform how we find relevant academic literature. Unlike standard embedding-based retrieval with tools like FAISS, which relies on a single, fixed Euclidean distance metric, GSS learns a unique, low-rank metric tensor at each node (paper) in a citation graph. This yields a local, positive-definite metric (G_i = L_i L_i^T + εI), inducing a Riemannian geometry over the entire network. Retrieval is then performed by computing shortest geodesic paths with a multi-source Dijkstra algorithm, followed by Maximal Marginal Relevance reranking and path coherence filtering. This approach allows the system to capture the nuanced, contextual relationships between papers that simple vector similarity misses.
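To make the core idea concrete, here is a minimal sketch of the two pieces described above: an edge length under a node's local low-rank metric, and a multi-source Dijkstra over the resulting weighted citation graph. The function names, the dictionary-based graph representation, and the default ε are illustrative assumptions, not the authors' actual implementation.

```python
import heapq
import numpy as np

def local_metric_distance(x_i, x_j, L_i, eps=1e-3):
    """Edge length under node i's learned metric G_i = L_i L_i^T + eps*I.

    Expands the quadratic form without materializing the d x d matrix:
    (x_i - x_j)^T G_i (x_i - x_j) = ||L_i^T diff||^2 + eps * ||diff||^2.
    L_i has shape (d, r) with rank r << d (the low-rank factor).
    """
    diff = x_i - x_j
    low_rank = L_i.T @ diff                      # project into rank-r subspace
    return np.sqrt(low_rank @ low_rank + eps * (diff @ diff))

def multi_source_dijkstra(adj, weights, sources):
    """Geodesic path lengths from a set of seed nodes (e.g. query neighbors).

    adj:     dict node -> list of neighbor nodes
    weights: dict (u, v) -> precomputed local-metric edge length
    Returns dict node -> shortest geodesic distance to any source.
    """
    dist = {s: 0.0 for s in sources}
    pq = [(0.0, s) for s in sources]
    heapq.heapify(pq)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v in adj.get(u, []):
            nd = d + weights[(u, v)]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist
```

In this formulation, ranking candidates by geodesic distance replaces the single global nearest-neighbor lookup a flat vector index would perform.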

The technical innovation translates into significant practical gains. Evaluated on a large-scale citation prediction benchmark containing 169,000 papers, GSS achieved a 23% relative improvement in Recall@20 over strong baselines like SPECTER embeddings with FAISS search. Furthermore, the team implemented a hierarchical coarse-to-fine search strategy using k-means pooling, which cuts computational cost by a factor of 4 compared to a flat search while preserving 97% of the retrieval quality. Beyond performance, GSS provides interpretable citation paths, explaining *why* a paper is relevant by showing the connecting chain of references. The authors provide theoretical analysis on when geodesic distances outperform direct similarity and have released their code and models. This work represents a major step beyond static vector databases towards dynamic, geometry-aware retrieval systems that could redefine semantic search in complex knowledge graphs.
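The Maximal Marginal Relevance reranking step mentioned above is a standard greedy diversification technique; a compact version is sketched below. The signature, the λ trade-off default, and the use of (for instance) negative geodesic distance as the relevance score are assumptions for illustration, not details from the paper.

```python
def mmr_rerank(candidates, relevance, sim, k=20, lam=0.7):
    """Maximal Marginal Relevance: greedily pick items that balance
    relevance to the query against redundancy with items already chosen.

    candidates: list of item ids
    relevance:  dict id -> relevance score (e.g. negative geodesic distance)
    sim:        callable (a, b) -> pairwise similarity between two items
    lam:        trade-off; 1.0 = pure relevance, 0.0 = pure diversity
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr_score(c):
            # redundancy = similarity to the closest already-selected item
            redundancy = max((sim(c, s) for s in selected), default=0.0)
            return lam * relevance[c] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With λ below 1, a near-duplicate of an already-selected paper is penalized, so the top-k slots go to distinct regions of the citation graph.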

Key Points
  • Achieves 23% relative improvement in Recall@20 over SPECTER+FAISS baselines on a 169K-paper benchmark.
  • Uses learned local Riemannian metrics per node, replacing global Euclidean distance with geometry-aware geodesic search.
  • Hierarchical search with k-means pooling reduces compute cost by 4x while maintaining 97% retrieval quality.
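The coarse-to-fine idea in the last point can be sketched with a toy k-means pooling index: cluster node embeddings once, then at query time probe only the nearest clusters before doing the expensive fine-grained ranking. Every name and parameter here (cluster count, `n_probe`, plain Euclidean scoring in place of the learned geodesics) is a simplifying assumption, not the authors' pipeline.

```python
import numpy as np

def build_coarse_index(embeddings, n_clusters=16, n_iter=10, seed=0):
    """Toy k-means pooling: partition node embeddings into clusters so
    fine-grained search can be restricted to the most promising ones."""
    rng = np.random.default_rng(seed)
    centroids = embeddings[rng.choice(len(embeddings), n_clusters, replace=False)]
    for _ in range(n_iter):
        # assign each point to its nearest centroid, then recompute means
        d = np.linalg.norm(embeddings[:, None] - centroids[None], axis=-1)
        assign = d.argmin(axis=1)
        for c in range(n_clusters):
            members = embeddings[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return centroids, assign

def coarse_to_fine_search(query, embeddings, centroids, assign, n_probe=2, k=5):
    """Probe the n_probe nearest clusters, then rank only their members."""
    probe = np.linalg.norm(centroids - query, axis=1).argsort()[:n_probe]
    cand = np.where(np.isin(assign, probe))[0]
    order = np.linalg.norm(embeddings[cand] - query, axis=1).argsort()[:k]
    return cand[order]
```

Searching only the probed clusters is where the reported ~4x cost reduction would come from in a scheme like this: most nodes are never scored at the fine level.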

Why It Matters

This moves retrieval beyond simple vector similarity to context-aware, interpretable search in complex knowledge graphs like academic literature.