Research & Papers

Association Is Not Similarity: Learning Corpus-Specific Associations for Multi-Hop Retrieval

A 4.2M-parameter MLP trains in 2 minutes and adds 3.7ms per query.

Deep Dive

Researchers have introduced Association-Augmented Retrieval (AAR), a lightweight transductive reranking method that addresses a key limitation of dense retrieval systems: their inability to capture associative relationships needed for multi-hop reasoning. Traditional dense retrieval ranks passages by embedding similarity to a query, but multi-hop questions require passages that are associatively linked through shared reasoning chains. AAR trains a small MLP with 4.2 million parameters to learn these associative relationships using contrastive learning on co-occurrence annotations.
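The core components described above, a small MLP that scores how strongly two passages are associated, trained with a contrastive objective so co-occurring pairs outscore random pairings, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the layer sizes, the pair encoding (concatenated embeddings), and the exact loss form are all assumptions; the paper's 4.2M-parameter model and its training details may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

class AssociationMLP:
    """Tiny MLP scoring the association between two passage embeddings.
    Sizes here are illustrative; the paper's model has 4.2M parameters."""
    def __init__(self, dim=64, hidden=128):
        self.W1 = rng.standard_normal((2 * dim, hidden)) * 0.02
        self.b1 = np.zeros(hidden)
        self.W2 = rng.standard_normal((hidden, 1)) * 0.02
        self.b2 = np.zeros(1)

    def score(self, a, b):
        # Concatenate the pair, apply one hidden ReLU layer, output a scalar.
        x = np.concatenate([a, b], axis=-1)
        h = np.maximum(x @ self.W1 + self.b1, 0.0)
        return (h @ self.W2 + self.b2)[..., 0]

def contrastive_loss(model, anchor, positive, negatives):
    """InfoNCE-style objective (assumed form): the co-occurring pair
    (anchor, positive) should outscore pairings with random negatives."""
    pos = model.score(anchor, positive)
    neg = np.array([model.score(anchor, n) for n in negatives])
    logits = np.concatenate([[pos], neg])
    # Negative log-softmax probability of the positive pair.
    return -(pos - np.log(np.sum(np.exp(logits))))
```

In practice the forward pass would live in an autodiff framework so the loss can be minimized by gradient descent; this sketch only shows the scoring and objective.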

On the HotpotQA benchmark, AAR improves passage Recall@5 from 0.831 to 0.916, an 8.5-point gain, with the most dramatic improvements on hard questions where the dense baseline fails (+28.5 points). On MuSiQue, AAR achieves a 10.1-point improvement in the transductive setting. The method adds only 3.7ms per query and trains in under two minutes on a single GPU, requiring no LLM-based indexing.

However, the inductive model trained on training-split associations showed no significant improvement on unseen validation associations, indicating the method captures corpus-specific co-occurrences rather than transferable patterns. Ablation studies confirmed this: training on semantically similar but non-associated passage pairs degraded retrieval below the baseline, while shuffling association pairs caused severe degradation. A downstream QA evaluation showed the retrieval gains translated to a 6.4-point exact-match improvement.
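The Recall@5 numbers above measure how many of a question's gold supporting passages appear in the retriever's top five results. A minimal version of that metric (identifier names are my own):

```python
def recall_at_k(ranked_ids, gold_ids, k=5):
    """Fraction of gold supporting passages found in the top-k ranked list."""
    return len(set(ranked_ids[:k]) & set(gold_ids)) / len(gold_ids)

# With two gold passages and only one in the top five, recall is 0.5:
recall_at_k(["p1", "p3", "p7", "p2", "p9", "p4"], {"p3", "p4"})  # 0.5
```

Corpus-level Recall@5 is then this value averaged over all questions, which is where figures like 0.831 versus 0.916 come from.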

Key Points
  • AAR improves HotpotQA Recall@5 from 0.831 to 0.916 (+8.5 points), with +28.5 points on hard questions
  • Method trains a 4.2M-parameter MLP in under 2 minutes on a single GPU, adding only 3.7ms per query
  • Inductive model shows no transferable gains, confirming AAR captures corpus-specific co-occurrence patterns
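One plausible way the learned association scores could rerank a dense candidate list at query time, sketched under assumptions (the seed-then-boost scheme and the weight `lam` are illustrative choices, not the paper's stated formula):

```python
import numpy as np

def rerank_with_associations(sims, assoc, lam=0.5, top=5):
    """sims:  (N,) dense query-passage similarities
    assoc: (N, N) pairwise association scores from the learned MLP
    Boost passages associated with the best first-hop passage, so the
    second hop of a reasoning chain can surface despite low similarity."""
    seed = int(np.argmax(sims))           # best first-hop candidate
    final = sims + lam * assoc[seed]      # add an association bonus
    return np.argsort(-final)[:top]
```

In this toy setup, a passage with weak query similarity but a strong association to the top hit (the multi-hop case the paper targets) climbs into the top ranks.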

Why It Matters

AAR offers a practical, low-cost way to boost multi-hop retrieval accuracy without LLM overhead.