Research & Papers

SelRoute: Query-Type-Aware Routing for Long-Term Conversational Memory Retrieval

New framework routes queries to specialized pipelines, achieving 0.800 Recall@5 with a 109M-parameter model.

Deep Dive

Researcher Matthew McKee has introduced SelRoute, a novel framework designed to improve the retrieval of past interactions from long-term conversational memory. Instead of relying on a single, large retrieval model, SelRoute intelligently classifies each incoming query and routes it to one of four specialized retrieval pipelines: lexical, semantic, hybrid, or vocabulary-enriched. This query-type-aware approach allows the system to apply the most effective search strategy for each specific question type, optimizing both accuracy and efficiency.
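The routing step can be sketched as a lightweight rule-based classifier. The patterns and pipeline assignments below are hypothetical illustrations (the paper's actual regex rules are not reproduced here); only the four pipeline names come from the source.

```python
import re

# Hypothetical routing rules; SelRoute's real regex classifier is not public here.
# Each rule maps a query pattern to one of the four pipelines named in the paper.
ROUTES = [
    (re.compile(r"\b(when|date|last time)\b", re.I), "lexical"),
    (re.compile(r"\b(why|how|explain)\b", re.I), "semantic"),
    (re.compile(r"\b(what did|who)\b", re.I), "hybrid"),
]

def route(query: str) -> str:
    """Return the pipeline name for a query; fall back to the enriched pipeline."""
    for pattern, pipeline in ROUTES:
        if pattern.search(query):
            return pipeline
    return "enriched"
```

Because routing is pure pattern matching, it adds effectively zero latency and needs no model inference, which is consistent with the framework's no-GPU design.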

On the LongMemEval_M benchmark, SelRoute achieved a Recall@5 score of 0.800 using the bge-base-en-v1.5 model (109M parameters), outperforming the previous best baseline of 0.762. Remarkably, a simple zero-ML baseline using only SQLite's FTS5 full-text search already exceeded all published baselines on ranking quality, pointing to a gap in prior lexical retrieval implementations. The system's routing logic, a regex-based classifier with 83% accuracy, proved stable across different data splits.
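That zero-ML baseline is worth making concrete. The sketch below indexes conversation turns in an in-memory SQLite FTS5 table and ranks matches with the built-in BM25 scorer; the table name and sample turns are illustrative, and it assumes an SQLite build with the FTS5 extension enabled (the default in most Python distributions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# FTS5 virtual table: one row per past conversation turn (names are illustrative).
conn.execute("CREATE VIRTUAL TABLE turns USING fts5(text)")
conn.executemany(
    "INSERT INTO turns (text) VALUES (?)",
    [
        ("I adopted a cat named Miso last spring",),
        ("We talked about hiking in the Alps",),
        ("Miso knocked over my coffee this morning",),
    ],
)

# FTS5 exposes BM25 ranking via the hidden `rank` column; lower ranks first.
rows = conn.execute(
    "SELECT text FROM turns WHERE turns MATCH ? ORDER BY rank LIMIT 5",
    ("Miso",),
).fetchall()
```

A few lines of SQL and no model weights at all; that such a baseline beat published numbers suggests earlier lexical baselines were under-tuned rather than fundamentally weak.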

The framework's strength lies in its generalization and efficiency. It was validated across eight additional benchmarks—including MSDialog, LoCoMo, and QReCC—spanning over 62,000 instances, without any benchmark-specific tuning. A key finding was an 'enrichment-embedding asymmetry,' where vocabulary expansion helps lexical search but can hurt semantic embedding search, emphasizing the need for per-pipeline decisions. Critically, SelRoute requires no GPU and performs no LLM inference at query time, making it a highly practical and resource-efficient solution for real-time conversational AI applications.
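The enrichment-embedding asymmetry implies a simple per-pipeline rule: expand the query's vocabulary for lexical search, but pass the original query to the embedding model untouched. The synonym table and function names below are hypothetical illustrations of that decision, not the paper's implementation.

```python
# Hypothetical synonym table; SelRoute's actual enrichment source is not shown here.
SYNONYMS = {"car": ["automobile", "vehicle"], "film": ["movie"]}

def enrich_for_lexical(query: str) -> str:
    """Append synonyms after each matching term, helping exact-match search."""
    expanded = []
    for term in query.split():
        expanded.append(term)
        expanded.extend(SYNONYMS.get(term.lower(), []))
    return " ".join(expanded)

def build_queries(query: str) -> dict:
    # Per-pipeline decision: enrich lexical only, since expansion can
    # hurt semantic embedding search (the asymmetry noted above).
    return {"lexical": enrich_for_lexical(query), "semantic": query}
```

Keeping the decision per-pipeline is the point: a single global "enrich or not" switch would either forfeit lexical gains or degrade embedding recall.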

Key Points
  • Achieves 0.800 Recall@5 on LongMemEval_M using a 109M-parameter model, beating the prior best baseline of 0.762.
  • Routes queries to 4 specialized pipelines (lexical/semantic/hybrid/enriched) based on an 83%-accurate classifier.
  • Validated on 8+ benchmarks (62k+ instances) with no GPU or LLM inference needed at query time.

Why It Matters

Enables more efficient, accurate long-term memory for AI assistants without expensive GPU compute, lowering operational costs.