Hit 90.4% on LongMemEval-S with structured storage - no embeddings, ~half the tokens, 98% retrieval accuracy
A solo developer known as MontyOW has achieved a 90.4% score on the LongMemEval-S benchmark with c137, a structured memory system that deliberately avoids embeddings. The system uses a fixed 3-stage pipeline: retrieve, answer, and store. Stages 1 and 3 maintain lean maps of existing memory (topics, facts, ledgers), while Stage 2 processes only the relevant slice. This approach uses a median of 15k tokens per question (3k cached system prompt, 2k user model, 8k dynamic, 2k tail), with no embeddings anywhere.
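The retrieve/answer/store loop over a topic-keyed memory map can be sketched as below. This is a minimal illustration of the general idea, not c137's actual schema or code: the names (`StructuredMemory`, `store`, `retrieve`, `answer`) and the keyword-match retrieval are assumptions, and the real Stage 2 would call an LLM rather than echo the context.

```python
# Hedged sketch of a 3-stage retrieve/answer/store pipeline over
# structured (embedding-free) memory. All names are illustrative;
# they are not taken from the c137 codebase.
from dataclasses import dataclass, field


@dataclass
class StructuredMemory:
    # topic -> list of plain-text facts; no vectors anywhere
    topics: dict = field(default_factory=dict)

    def store(self, topic: str, fact: str) -> None:
        # Stage 3: file each new fact under its topic so that later
        # retrieval is a single keyed lookup (the "1-hop" idea)
        self.topics.setdefault(topic, []).append(fact)

    def retrieve(self, question: str) -> list:
        # Stage 1: scan the small topic map and pull only the slice
        # whose topic name appears in the question
        hits = []
        for topic, facts in self.topics.items():
            if topic.lower() in question.lower():
                hits.extend(facts)
        return hits


def answer(question: str, context: list) -> list:
    # Stage 2: in a real system an LLM answers from the retrieved
    # slice; here we return the slice to show the data flow
    return context


mem = StructuredMemory()
mem.store("allergies", "User is allergic to peanuts.")
mem.store("travel", "User visited Kyoto in 2023.")

ctx = mem.retrieve("What are my allergies?")
print(answer("What are my allergies?", ctx))
```

Because storage already organizes facts by topic, retrieval never needs similarity search: it only has to look up the topics the question mentions, which is why the heavy lifting happens at write time.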
The developer, who built this during their first year of university, started with embeddings and centroid clustering but found the result felt too much like a search engine, while agentic approaches with tool calling proved unreliable with weaker models. The key insight: if you store correctly, retrieval becomes a 1-hop problem. Of the 500 questions, 10 lacked the context needed to answer, and the remaining failures came from the model misusing context it had retrieved. The project includes a public bench viewer (c137.ai/research/benchmark) showing all 500 questions sorted by category, with pass/fail status, ground truth, and failures bucketed into model-fails vs retrieval-fails.
- 90.4% on LongMemEval-S with 98% retrieval accuracy using structured storage
- No embeddings used; 3-stage pipeline (retrieve, answer, store) with median 15k tokens per question
- Public bench viewer with 500 questions, pass/fail breakdown, and failure categorization
Why It Matters
Demonstrates that structured memory can outperform embedding-based retrieval on long-context AI tasks at roughly half the token cost