Research & Papers

Engram memory engine beats full history with 10.4 points higher accuracy using 8x fewer tokens

arXiv cs.CL June 10, 2026

⚡ New bi-temporal memory system outperforms full-context LLM agents while slashing token usage.

Deep Dive

A new paper from Liuyin Wang introduces Engram, an open-source memory engine designed to solve long-term memory loss in LLM agents without resorting to expensive full-history prompts. Engram uses a bi-temporal data model with two processing paths: a fast write path that appends lossless episodes without invoking an LLM, and an asynchronous path that extracts atomic facts (subject, predicate, object) to build a knowledge graph. Crucially, it invalidates outdated facts without deletion, preserving provenance and a supersession chain for every fact.
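The invalidate-without-delete idea can be illustrated with a minimal Python sketch. This is not Engram's code: the `Fact` and `FactStore` names, their fields, and the rule that a new fact supersedes any live fact with the same subject and predicate are all assumptions made for illustration. Each fact carries both a world-time stamp (`valid_from`) and a system-time stamp (`recorded_at`), but for brevity the lookup below filters on world time only.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    source_episode: int                       # provenance: episode the fact was extracted from
    valid_from: datetime                      # when the fact became true in the world
    recorded_at: datetime                     # when the system learned it
    invalidated_at: Optional[datetime] = None
    superseded_by: Optional[int] = None       # index of the fact that replaced this one


class FactStore:
    """Hypothetical append-only store: facts are invalidated, never removed."""

    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def assert_fact(self, fact: Fact) -> int:
        """Append a fact and mark any live fact with the same subject and predicate as superseded."""
        new_id = len(self.facts)
        for old in self.facts:
            same_key = (old.subject, old.predicate) == (fact.subject, fact.predicate)
            if same_key and old.invalidated_at is None:
                old.invalidated_at = fact.valid_from
                old.superseded_by = new_id
        self.facts.append(fact)
        return new_id

    def as_of(self, subject: str, predicate: str, when: datetime) -> Optional[Fact]:
        """Return the fact that was valid at `when`, or None if nothing was valid yet."""
        for f in self.facts:
            if (f.subject, f.predicate) != (subject, predicate):
                continue
            ended = f.invalidated_at is not None and f.invalidated_at <= when
            if f.valid_from <= when and not ended:
                return f
        return None


store = FactStore()
jan, jun = datetime(2025, 1, 1), datetime(2025, 6, 1)
paris = store.assert_fact(Fact("alice", "lives_in", "Paris", 0, jan, jan))
berlin = store.assert_fact(Fact("alice", "lives_in", "Berlin", 1, jun, jun))
```

After the second write, the Paris fact is still stored and points to its successor through `superseded_by`, so a query for March 2025 returns Paris while a query for July 2025 returns Berlin.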

Engram's hybrid read path fuses dense retrieval, lexical search, graph traversal, and recency/salience signals with a point-in-time filter. On the full 500-question LongMemEval_S benchmark, Engram's lean configuration achieved 83.6% accuracy using only ~9.6k tokens, versus 73.2% for full-context baselines using 79k tokens. That is a 10.4-point improvement with roughly 8x fewer tokens, and with zero errors out of 500 queries. The paper also contributes a reproducible evaluation harness with built-in category-specific judges and full-context baselines, addressing common benchmarking pitfalls like truncation and home-grown judges.
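To make the read path more concrete, here is a rough Python sketch of one way to fuse several signals behind a point-in-time filter. The article does not say how Engram combines its signals, so the weighted sum, the weight values, and the `Candidate` and `retrieve` names are assumptions chosen for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime

# Hypothetical weights; the real fusion method and its parameters are not given in the article.
WEIGHTS = {"dense": 0.4, "lexical": 0.2, "graph": 0.2, "recency": 0.1, "salience": 0.1}


@dataclass
class Candidate:
    text: str
    valid_from: datetime
    signals: dict[str, float]   # each signal is assumed to be pre-normalized to [0, 1]


def fused_score(c: Candidate) -> float:
    """Weighted sum over the five signals; a missing signal counts as 0."""
    return sum(w * c.signals.get(name, 0.0) for name, w in WEIGHTS.items())


def retrieve(candidates: list[Candidate], as_of: datetime, k: int = 3) -> list[Candidate]:
    """Drop candidates not yet valid at `as_of`, then return the top k by fused score."""
    visible = [c for c in candidates if c.valid_from <= as_of]
    return sorted(visible, key=fused_score, reverse=True)[:k]
```

Filtering before ranking means a memory that only became valid after the query's point in time can never surface, no matter how strongly it matches. Returning only the top k candidates is what keeps the prompt small compared with replaying the full history.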

Key Points

Engram achieves 83.6% accuracy vs. 73.2% for full-context on LongMemEval_S, using 9.6k vs. 79k tokens (8x fewer).
Bi-temporal knowledge graph stores facts with provenance and supersession chains; updates invalidate without deletion.
Hybrid read path fuses dense, lexical, graph, recency, and salience signals with a point-in-time filter for lean retrieval.
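
The headline figures can be checked with a few lines of Python. The inputs are the rounded numbers quoted in the article; the variable names are illustrative. Note that the accuracy gap is 10.4 percentage points, which corresponds to a relative gain of about 14%.

```python
engram_acc, full_acc = 83.6, 73.2            # accuracy in percent, as quoted in the article
engram_tokens, full_tokens = 9_600, 79_000   # approximate token counts, as quoted

point_gain = engram_acc - full_acc            # absolute gain in percentage points
relative_gain = point_gain / full_acc * 100   # gain relative to the full-context accuracy
token_ratio = full_tokens / engram_tokens     # how many times more tokens full context uses

print(f"{point_gain:.1f} points, {relative_gain:.1f}% relative, {token_ratio:.1f}x tokens")
# prints: 10.4 points, 14.2% relative, 8.2x tokens
```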

Why It Matters

Enables scalable, accurate long-term memory for LLM agents without expensive full-history prompts, reducing token costs significantly.
