CALMem gives LLMs unbounded memory without model changes

arXiv cs.IR May 21, 2026

⚡ A new app-layer memory system lets AI recall every conversation turn, even after compaction.

Deep Dive

CALMem (Conversational Application-Layer Memory) is a new dual memory architecture designed to solve the fundamental limitation of fixed context windows in large language models (LLMs). Developed by researchers Rajendra Narayan Jena, Rajan Padmanabhan, and Sankar Arumugam, the system operates entirely at the application layer, meaning it requires no modifications to the underlying LLM and works with any provider. The architecture consists of two complementary memory subsystems: an episodic memory layer built on sliding-window vector embeddings that captures conversation history, and a semantic memory layer that stores structured facts written by the AI agent itself. A central component is the MOIM (Message of Injected Memory), a token-budget-adaptive injection mechanism that automatically retrieves and injects relevant past context at each turn, scaling injection depth inversely with the current context pressure.
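To make the budget-adaptive idea concrete, here is a minimal Python sketch of an injection depth that shrinks as context pressure grows. The linear rule, the `max_items` cap, and the names `injection_depth` and `build_moim` are illustrative assumptions; the article does not give the paper's actual formula, and CALMem itself is written in Rust.

```python
def injection_depth(used_tokens: int, window_tokens: int, max_items: int = 8) -> int:
    """Return how many retrieved memories to inject on this turn.

    Pressure is the fraction of the context window already in use.
    Depth falls linearly as pressure rises (an assumed rule, not the paper's).
    """
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    pressure = min(max(used_tokens / window_tokens, 0.0), 1.0)
    return int(max_items * (1.0 - pressure))


def build_moim(candidates: list[str], used_tokens: int, window_tokens: int) -> str:
    """Assemble an injected-memory message from already-ranked candidates."""
    depth = injection_depth(used_tokens, window_tokens)
    chosen = candidates[:depth]
    if not chosen:
        return ""  # under full pressure nothing is injected
    return "Relevant earlier context:\n" + "\n".join(f"- {c}" for c in chosen)
```

With an empty window the sketch injects up to `max_items` memories; at a half-full window it injects half as many; at a full window it injects none, so the prompt is left unchanged.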

A standout contribution is CALMem's support for intra-session retrieval: turns compacted away from the current session remain searchable, a gap that prior work such as MemGPT did not address. Even after the context window fills and older turns are compressed, the system can still recall information from earlier in the same conversation. Implemented in a production Rust codebase, CALMem is provider-agnostic and, when disabled, falls back to the original LLM behavior with zero overhead. The paper details the architecture, design decisions, and performance trade-offs, positioning CALMem as a practical, model-independent way to build conversational AI systems that sustain coherent, long-running dialogues.
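Intra-session retrieval can be pictured with a small Python sketch in which compaction trims the live context but leaves every turn in a searchable index. Everything here is an assumption made for illustration: the `Session` class, the word-count `embed` function standing in for a real embedding model, and compaction modeled as simply dropping the oldest turns rather than summarizing them.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Session:
    def __init__(self, max_live_turns: int = 2):
        self.max_live_turns = max_live_turns
        self.live: list[str] = []                   # turns still sent to the model
        self.index: list[tuple[str, Counter]] = []  # every turn, kept after compaction

    def add_turn(self, text: str) -> None:
        self.live.append(text)
        self.index.append((text, embed(text)))
        # Compaction (simplified): drop the oldest turns from the live context only.
        if len(self.live) > self.max_live_turns:
            self.live = self.live[-self.max_live_turns:]

    def recall(self, query: str, k: int = 1) -> list[str]:
        """Return up to k indexed turns most similar to the query."""
        q = embed(query)
        scored = [(cosine(q, vec), text) for text, vec in self.index]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [text for score, text in ranked[:k] if score > 0]
```

The design point is that `live` and `index` are separate: compaction only ever touches `live`, so a turn that has left the prompt can still be found by `recall` within the same session.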

Key Points

Dual memory: episodic (vector embeddings) and semantic (structured facts) work together for unbounded context.
MOIM injection mechanism scales retrieval depth based on available token budget, ensuring efficient context use.
Intra-session retrieval allows searching compacted turns from the current session, which the authors present as a gap left open by prior work such as MemGPT.
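As a rough picture of the dual-memory split in the first point, the Python sketch below groups turns into overlapping sliding-window chunks for the episodic side and keeps agent-written facts in a plain key-value store for the semantic side. The window size, stride, and the names `sliding_windows` and `DualMemory` are hypothetical; the article does not describe CALMem's actual data layout.

```python
from dataclasses import dataclass, field


def sliding_windows(turns: list[str], size: int = 3, stride: int = 2) -> list[str]:
    """Group consecutive turns into overlapping chunks ready for embedding."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    chunks = []
    for start in range(0, len(turns), stride):
        chunks.append(" ".join(turns[start:start + size]))
        if start + size >= len(turns):
            break  # the last chunk already reaches the final turn
    return chunks


@dataclass
class DualMemory:
    # Episodic side: chunks that a real system would embed and index.
    episodic_chunks: list[str] = field(default_factory=list)
    # Semantic side: structured facts written by the agent itself.
    semantic_facts: dict[str, str] = field(default_factory=dict)

    def record_turns(self, turns: list[str]) -> None:
        self.episodic_chunks = sliding_windows(turns)

    def write_fact(self, key: str, value: str) -> None:
        self.semantic_facts[key] = value
```

Overlapping windows mean a detail that straddles two turns still appears whole in at least one chunk, while the fact store gives exact lookups for things the agent chose to write down.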

Why It Matters

CALMem enables truly persistent, long-running conversations in AI assistants without vendor lock-in or model retraining.
