[R] Seeking arXiv Endorsement for cs.AI: Memento - A Fragment-Based Memory System for LLM Agents
New system treats AI memory as atomic fragments, not documents, to prevent context loss between sessions.
A new research paper titled 'Memento: Fragment-Based Asynchronous Memory Externalization for Persistent Context in Large Language Model Agents' addresses a fundamental problem: LLM agents lose all context when a session ends. The paper argues that the two common workarounds both fall short. RAG (retrieval-augmented generation) pulls in irrelevant noise because it retrieves large document chunks, while summarization loses critical details through compression. Memento, developed by JinHo-von-Choi, instead stores memory as a collection of atomic, typed 'fragments' rather than monolithic documents.
Each memory fragment is just 1-3 sentences and is categorized using a six-type taxonomy: Facts, Decisions, Errors, Preferences, Procedures, and Relations. This structure allows retrieval to target a specific kind of memory rather than a whole document. The system is biologically inspired: it uses Ebbinghaus's forgetting curve to model memory decay. Technically, it employs a three-tier hybrid retrieval stack combining Redis for speed, PostgreSQL with GIN indexing for structured queries, and pgvector's HNSW index for semantic similarity, and it merges the results with Reciprocal Rank Fusion (RRF).
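To make the pieces above concrete, here is a minimal Python sketch of a typed fragment, an Ebbinghaus-style decay function, and Reciprocal Rank Fusion. It is an illustration under stated assumptions, not Memento's actual code: the class and field names, the exponential form R = exp(-t / S), the `stability_days` parameter, and the RRF constant k = 60 (a commonly used default) are all assumptions that do not come from the paper.

```python
import math
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class FragmentType(Enum):
    """The six fragment types named in the article."""
    FACT = "fact"
    DECISION = "decision"
    ERROR = "error"
    PREFERENCE = "preference"
    PROCEDURE = "procedure"
    RELATION = "relation"


@dataclass(frozen=True)
class Fragment:
    """One atomic memory (field names are hypothetical)."""
    id: str
    type: FragmentType
    text: str          # intended to hold 1-3 sentences
    age_days: float    # assumed: days since the fragment was last used


def retention(age_days: float, stability_days: float) -> float:
    """Exponential forgetting curve R = exp(-t / S).

    A larger stability_days (assumed parameter) means slower decay.
    """
    return math.exp(-age_days / stability_days)


def rrf(rankings: List[List[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Reciprocal Rank Fusion over several ranked lists of fragment ids.

    Each list contributes 1 / (k + rank) per id, with rank starting at 1.
    Ties are broken by id so the output order is deterministic.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, frag_id in enumerate(ranking, start=1):
            scores[frag_id] = scores.get(frag_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    keyword_hits = ["a", "b", "c"]   # e.g. from a GIN-backed keyword query
    vector_hits = ["b", "c", "d"]    # e.g. from an HNSW similarity query
    for frag_id, score in rrf([keyword_hits, vector_hits]):
        print(frag_id, round(score, 5))
```

Because RRF only looks at rank positions, it can merge a keyword result list and a vector result list without having to normalize their raw scores, which is one reason it suits a hybrid stack like the one described.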
Crucially, Memento's pipeline is asynchronous: tasks like embedding generation and contradiction detection run without blocking the agent's primary operations, which keeps the agent responsive. The system is already deployed in the author's personal production environment, where it supports software engineering workflows and, by the author's account, shows a 'substantial' density improvement over standard chunk-level RAG. Formal benchmarks are still pending, so the evidence so far is qualitative, but it suggests that fragment-based memory can help agents that take actions and learn over time maintain coherent long-term context.
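The non-blocking behavior described above can be sketched with Python's `asyncio`. This is only an illustration of the general pattern (write the fragment immediately, enrich it in the background), not Memento's pipeline: `AsyncMemory`, `remember`, and the fake `embed` function are hypothetical names, and the in-memory dictionaries stand in for the real storage tiers.

```python
import asyncio
from typing import Dict, List, Tuple


async def embed(text: str) -> List[float]:
    """Hypothetical stand-in for a slow embedding call."""
    await asyncio.sleep(0.01)          # simulate network or model latency
    return [float(len(text))]          # placeholder vector, not a real embedding


class AsyncMemory:
    """Stores fragment text at once and computes embeddings in the background."""

    def __init__(self) -> None:
        self.fragments: Dict[str, str] = {}
        self.embeddings: Dict[str, List[float]] = {}
        self.queue: asyncio.Queue = asyncio.Queue()

    def remember(self, frag_id: str, text: str) -> None:
        """Synchronous write: the caller never waits on the embedding."""
        self.fragments[frag_id] = text
        self.queue.put_nowait(frag_id)

    async def worker(self) -> None:
        """Background loop that drains the queue and attaches embeddings."""
        while True:
            frag_id = await self.queue.get()
            try:
                self.embeddings[frag_id] = await embed(self.fragments[frag_id])
            finally:
                self.queue.task_done()


async def demo() -> Tuple[AsyncMemory, bool, bool]:
    memory = AsyncMemory()
    task = asyncio.create_task(memory.worker())

    memory.remember("f1", "The deploy script requires Node 20.")
    # Right after the write, the text is readable but the embedding is pending.
    readable_immediately = "f1" in memory.fragments
    embedded_immediately = "f1" in memory.embeddings

    await memory.queue.join()          # wait for background work (demo only)
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    return memory, readable_immediately, embedded_immediately


if __name__ == "__main__":
    mem, readable, embedded = asyncio.run(demo())
    print(readable, embedded, "f1" in mem.embeddings)
```

In a real agent the `queue.join()` call would not sit on the hot path; it is here only so the demo can show the embedding arriving after the write has already returned.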
- Replaces noisy RAG chunks with atomic 'fragments' (1-3 sentences) using a 6-type taxonomy (Facts, Decisions, Errors, etc.).
- Uses a 3-tier hybrid retrieval stack (Redis → PostgreSQL GIN → pgvector HNSW) with biologically-inspired memory decay.
- Asynchronous pipeline avoids blocking the agent; already deployed in the author's personal production environment for software engineering workflows, with qualitative improvements reported and formal benchmarks pending.
Why It Matters
Enables LLM agents to learn and remember across sessions, moving them from single-use tools to persistent, evolving assistants.