Engram memory engine beats full-history prompting by 10.4 accuracy points using about 8x fewer tokens
New bi-temporal memory system outperforms full-context LLM agents while slashing token usage.
A new paper from Liuyin Wang introduces Engram, an open-source memory engine designed to solve long-term memory loss in LLM agents without resorting to expensive full-history prompts. Engram uses a bi-temporal data model with two processing paths: a fast write path that appends lossless episodes without invoking an LLM, and an asynchronous path that extracts atomic facts (subject, predicate, object) to build a knowledge graph. Crucially, it invalidates outdated facts without deletion, preserving provenance and a supersession chain for every fact.
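To make the invalidate-without-delete idea concrete, here is a minimal Python sketch of a bi-temporal fact store. It is an illustration under stated assumptions, not Engram's actual schema or API: the `Fact` and `FactStore` names, the integer timestamps, the field names, and the rule that a new fact supersedes any live fact with the same subject and predicate are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fact:
    # Atomic (subject, predicate, object) triple.
    subject: str
    predicate: str
    obj: str
    valid_from: int       # assumed axis 1: when the fact became true
    recorded_at: int      # assumed axis 2: when the store learned it
    source_episode: int   # provenance: index of the originating episode
    invalidated_at: Optional[int] = None  # set on supersession; row is kept
    superseded_by: Optional[int] = None   # link in the supersession chain


class FactStore:
    def __init__(self) -> None:
        self.episodes: List[str] = []
        self.facts: List[Fact] = []

    def append_episode(self, text: str) -> int:
        # Fast write path: store the raw episode verbatim, no LLM call.
        self.episodes.append(text)
        return len(self.episodes) - 1

    def assert_fact(self, subject: str, predicate: str, obj: str,
                    valid_from: int, recorded_at: int,
                    source_episode: int) -> int:
        # Asynchronous path (extraction itself is out of scope here).
        # Assumption: predicates are single-valued, so a new fact
        # invalidates any live fact with the same subject and predicate.
        new_id = len(self.facts)
        for fact in self.facts:
            if (fact.subject == subject and fact.predicate == predicate
                    and fact.invalidated_at is None):
                fact.invalidated_at = valid_from
                fact.superseded_by = new_id
        self.facts.append(Fact(subject, predicate, obj, valid_from,
                               recorded_at, source_episode))
        return new_id

    def as_of(self, subject: str, predicate: str, t: int) -> Optional[Fact]:
        # Point-in-time lookup: the fact that was valid at time t.
        for fact in self.facts:
            if (fact.subject == subject and fact.predicate == predicate
                    and fact.valid_from <= t
                    and (fact.invalidated_at is None
                         or t < fact.invalidated_at)):
                return fact
        return None
```

Because superseded rows are only marked, never removed, a query for an earlier time still returns the older fact, and each fact can be traced back to the episode it came from.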
Engram's hybrid read path fuses dense retrieval, lexical search, graph traversal, and recency/salience signals with a point-in-time filter. On the full 500-question LongMemEval_S benchmark, Engram's lean configuration achieved 83.6% accuracy using only ~9.6k tokens, versus 73.2% for full-context baselines using 79k tokens. That is a 10.4-point improvement with roughly 8x fewer tokens (79k / 9.6k ≈ 8.2). The run also reported zero errors across the 500 queries. The paper additionally contributes a reproducible evaluation harness with built-in category-specific judges and full-context baselines, addressing common benchmarking pitfalls like truncation and home-grown judges.
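The summary does not say how the retrieval signals are combined, so the following Python sketch assumes a simple weighted sum purely for illustration. The `Candidate` type, the `fuse` function, the weight values, and the use of `recorded_at` as the point-in-time cutoff are all hypothetical and not taken from Engram.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    text: str
    recorded_at: int           # used by the point-in-time filter
    signals: Dict[str, float]  # per-signal scores, assumed to lie in [0, 1]


# Hypothetical weights; the source does not publish a fusion formula.
WEIGHTS = {
    "dense": 0.4,
    "lexical": 0.25,
    "graph": 0.2,
    "recency": 0.1,
    "salience": 0.05,
}


def fuse(candidates: List[Candidate], as_of: int, top_k: int = 2,
         weights: Dict[str, float] = WEIGHTS) -> List[Candidate]:
    # Point-in-time filter: ignore anything recorded after the query time.
    visible = [c for c in candidates if c.recorded_at <= as_of]

    def score(c: Candidate) -> float:
        # Weighted sum; a missing signal contributes zero.
        return sum(w * c.signals.get(name, 0.0)
                   for name, w in weights.items())

    # Keep only the top_k items so the prompt stays lean.
    return sorted(visible, key=score, reverse=True)[:top_k]
```

Capping the result at `top_k` is what keeps the context small in this sketch. A real system could just as well cap by token budget or use a rank-based fusion instead of a weighted sum.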
- Engram achieves 83.6% accuracy vs. 73.2% for full-context on LongMemEval_S, using ~9.6k vs. 79k tokens (about 8x fewer).
- Bi-temporal knowledge graph stores facts with provenance and supersession chains; updates invalidate without deletion.
- Hybrid read path fuses dense, lexical, graph, recency, and salience signals with a point-in-time filter for lean retrieval.
Why It Matters
Engram gives LLM agents scalable, accurate long-term memory without expensive full-history prompts, cutting token use roughly eightfold on LongMemEval_S.