Tenure: structured belief state beats similarity search for LLM memory
Cosine similarity scored 0.12 mean precision; alias-weighted BM25 hit 1.0, a gap of roughly 8x.
A new arXiv paper by Jeffrey Flynt challenges the dominant retrieval-augmented generation (RAG) paradigm for cross-session LLM memory, arguing that similarity search fundamentally fails when stored beliefs and queries draw on the same bounded vocabulary. Instead of embedding conversation history and retrieving by semantic similarity, the paper introduces Tenure, a local-first proxy that maintains a structured belief store with typed facts, epistemic status, versioned supersession, and hard scope isolation. The key insight is that for a single user or a small engineering team, beliefs about a shared domain are semantically proximate by construction, so vector similarity cannot reliably tell them apart.
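To make the four properties concrete, here is a minimal sketch of what such a belief store might look like. It is not Tenure's actual schema: the class names, field names, and status values (`Belief`, `BeliefStore`, `Status`, `why_it_matters`, and so on) are assumptions chosen for illustration, and the paper's real data model may differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    # Hypothetical epistemic statuses; the paper's own labels may differ.
    ASSERTED = "asserted"    # stated directly by the user
    INFERRED = "inferred"    # derived by the system
    RETRACTED = "retracted"  # explicitly withdrawn


@dataclass
class Belief:
    key: str             # stable identifier, e.g. "db.engine"
    value: str
    kind: str            # fact type, e.g. "decision" or "preference"
    status: Status
    scope: str           # isolation boundary, e.g. a project name
    why_it_matters: str  # imperative instruction to inject into the prompt
    version: int = 1
    superseded_by: Optional[int] = None


class BeliefStore:
    def __init__(self) -> None:
        # Full version history per (scope, key); the last entry is current.
        self._history: dict[tuple[str, str], list[Belief]] = {}

    def assert_belief(self, belief: Belief) -> Belief:
        versions = self._history.setdefault((belief.scope, belief.key), [])
        if versions:
            # Versioned supersession: keep the old belief, mark it replaced.
            belief.version = versions[-1].version + 1
            versions[-1].superseded_by = belief.version
        versions.append(belief)
        return belief

    def history(self, scope: str, key: str) -> list[Belief]:
        return list(self._history.get((scope, key), []))

    def current(self, scope: str) -> list[Belief]:
        # Hard scope isolation: only beliefs from the requested scope,
        # and only the latest non-retracted version of each.
        return [
            versions[-1]
            for (s, _), versions in self._history.items()
            if s == scope and versions[-1].status is not Status.RETRACTED
        ]


store = BeliefStore()
store.assert_belief(Belief("db.engine", "MySQL", "decision",
                           Status.ASSERTED, "billing", "Write SQL for MySQL."))
store.assert_belief(Belief("db.engine", "PostgreSQL", "decision",
                           Status.ASSERTED, "billing", "Write SQL for PostgreSQL."))
store.assert_belief(Belief("db.engine", "SQLite", "decision",
                           Status.ASSERTED, "mobile", "Write SQL for SQLite."))
```

In this sketch, looking up the `billing` scope returns only the PostgreSQL belief: the MySQL entry survives in the history but is marked as superseded, and the `mobile` belief never leaks across the scope boundary. A similarity search over the three near-identical sentences would have no principled way to make either distinction.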
The paper's controlled evaluation on 72 retrieval cases illustrates the gap. Cosine similarity over dense embeddings achieved a mean precision of only 0.12 and passed just 8 of the 72 cases, while an alias-weighted BM25 approach achieved a precision of 1.0 and passed all 72. The paper also shows that under multi-turn topic drift, the vector backend produces drift scores of 0.43–0.50 on noise-critical turns while BM25 maintains near-zero drift. Tenure's structured approach, which includes a "why it matters" field that converts extracted facts into imperative instructions, makes injected beliefs directly actionable rather than raw material for the model to re-derive.
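For readers unfamiliar with lexical retrieval, the following is a rough sketch of one way an alias-weighted BM25 index could work. The summary above does not specify how the paper weights aliases, so the scheme here (adding each alias token to a belief's term counts with an extra weight) is an assumption, as are the `AliasBM25` class, its parameters, and the sample beliefs. The scoring formula itself is standard Okapi BM25 with the commonly used `k1` and `b` defaults.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


class AliasBM25:
    def __init__(self, docs, aliases, k1=1.5, b=0.75, alias_weight=2.0):
        # docs: belief id -> text; aliases: belief id -> list of alias strings.
        self.k1, self.b = k1, b
        self.tf = {}
        for doc_id, text in docs.items():
            counts = Counter(tokenize(text))
            # Assumed weighting: alias tokens count as extra term frequency.
            for alias in aliases.get(doc_id, []):
                for tok in tokenize(alias):
                    counts[tok] += alias_weight
            self.tf[doc_id] = counts
        self.doc_len = {d: sum(c.values()) for d, c in self.tf.items()}
        self.avg_len = sum(self.doc_len.values()) / len(self.doc_len)
        # Document frequency: number of beliefs containing each term.
        self.df = Counter(tok for c in self.tf.values() for tok in c)
        self.n = len(docs)

    def idf(self, term):
        df = self.df.get(term, 0)
        return math.log(1 + (self.n - df + 0.5) / (df + 0.5))

    def score(self, query, doc_id):
        counts = self.tf[doc_id]
        norm = self.k1 * (1 - self.b + self.b * self.doc_len[doc_id] / self.avg_len)
        return sum(
            self.idf(t) * counts[t] * (self.k1 + 1) / (counts[t] + norm)
            for t in tokenize(query)
            if t in counts
        )

    def search(self, query, top_k=1):
        scored = [(self.score(query, d), d) for d in self.tf]
        # Beliefs with no lexical overlap are dropped rather than ranked.
        hits = [(s, d) for s, d in scored if s > 0]
        return [d for _, d in sorted(hits, reverse=True)[:top_k]]


# Hypothetical beliefs that share most of their vocabulary.
docs = {
    "b1": "the billing service stores invoices in postgresql",
    "b2": "the billing service deploys to kubernetes",
    "b3": "the billing service logs to stdout",
}
aliases = {"b1": ["postgres", "pg", "database"], "b2": ["k8s", "cluster"]}
index = AliasBM25(docs, aliases)
top = index.search("which database does billing use")
```

All three sample beliefs mention the billing service, so a shared term like "billing" gets a low inverse document frequency and contributes little, while the alias "database" appears in only one belief and decides the ranking. A query with no matching terms returns nothing at all, which is one plausible reason a lexical backend would inject less noise when the conversation drifts off topic.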
- Tenure is a local-first proxy that manages LLM memory as structured belief states with epistemic status, versioning, and scope isolation.
- Cosine similarity achieved a mean precision of 0.12 (8/72 cases); alias-weighted BM25 achieved a precision of 1.0 (72/72) on the same test set.
- Under multi-turn topic drift, the vector backend produced drift scores of 0.43–0.50 on noise-critical turns, while BM25 maintained near-zero drift.
Why It Matters
Structured belief stores could replace RAG for personal/team LLM memory, eliminating re-orientation costs in session-heavy workflows.