MemCoT: Test-Time Scaling through Memory-Driven Chain-of-Thought
New memory-driven chain-of-thought system transforms long-context reasoning into stateful information search.
A research team led by Haodong Lei has introduced MemCoT, a novel framework designed to enhance how large language models (LLMs) reason over massive, fragmented information. The system fundamentally redefines the reasoning process, transforming it from a static, one-shot retrieval task into an iterative, stateful information search. This approach directly tackles two critical LLM weaknesses in long contexts: severe hallucinations and catastrophic forgetting, where models lose track of information introduced earlier in the input.
MemCoT's architecture features two core components. First, a multi-view long-term memory perception module enables what the researchers call 'Zoom-In' evidence localization and 'Zoom-Out' contextual expansion. This allows the model to first pinpoint where relevant evidence resides and then reconstruct the surrounding causal structure needed for accurate reasoning. Second, a task-conditioned dual short-term memory system—composed of semantic state memory and episodic trajectory memory—records historical search decisions. This short-term memory dynamically guides query decomposition and pruning across reasoning iterations, making the search process more efficient and context-aware.
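To make the loop described above concrete, here is a minimal, hypothetical sketch of a MemCoT-style iterative search. All names (`zoom_in`, `zoom_out`, `ShortTermMemory`, `memcot_search`) and the toy keyword-overlap scoring are illustrative assumptions, not the authors' actual implementation: `zoom_in` localizes candidate evidence chunks, `zoom_out` expands them with neighboring context, and a dual short-term memory (semantic state plus an episodic trajectory of past search decisions) prunes repeated subqueries across iterations.

```python
# Hypothetical sketch of a MemCoT-style stateful search loop.
# Names and scoring are illustrative, NOT the paper's actual API.
from dataclasses import dataclass, field

@dataclass
class ShortTermMemory:
    semantic_state: dict = field(default_factory=dict)       # facts gathered so far, keyed by subquery
    episodic_trajectory: list = field(default_factory=list)  # record of past search decisions

def zoom_in(memory_bank, query, k=2):
    """'Zoom-In': localize candidate evidence via naive keyword overlap."""
    q = set(query.lower().split())
    scored = [(len(q & set(chunk.lower().split())), chunk) for chunk in memory_bank]
    scored.sort(reverse=True)
    return [chunk for score, chunk in scored if score > 0][:k]

def zoom_out(memory_bank, hits):
    """'Zoom-Out': expand each hit with its neighbors to recover surrounding context."""
    expanded = []
    for hit in hits:
        i = memory_bank.index(hit)
        expanded.extend(memory_bank[max(0, i - 1): i + 2])
    return list(dict.fromkeys(expanded))  # deduplicate while preserving order

def memcot_search(memory_bank, subqueries, max_steps=4):
    """Iterate over decomposed subqueries, recording state to guide pruning."""
    stm = ShortTermMemory()
    for sub in subqueries[:max_steps]:
        if any(sub == past for past, _ in stm.episodic_trajectory):
            continue  # prune: this subquery was already explored
        context = zoom_out(memory_bank, zoom_in(memory_bank, sub))
        stm.semantic_state[sub] = context
        stm.episodic_trajectory.append((sub, len(context)))
    return stm

# Usage on a toy fragmented "long-term memory"
bank = [
    "Alice moved to Paris in 2019",
    "She started a bakery there",
    "Bob visited Alice last summer",
    "The bakery sells croissants",
]
stm = memcot_search(bank, ["where did Alice move", "what does the bakery sell"])
```

The key design point the sketch tries to convey is statefulness: each iteration both reads and writes the short-term memory, so later subqueries can be pruned or redirected based on what earlier steps already found, rather than each retrieval being an independent single-step match.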
The empirical results are compelling. When empowered by the MemCoT framework, several open- and closed-source LLMs achieved state-of-the-art (SOTA) performance on established benchmarks like LoCoMo and LongMemEval-S. These benchmarks specifically test a model's ability to maintain coherent reasoning across long and complex narratives. The 14-page paper, submitted to ACMMM26, presents the framework as a significant step beyond traditional memory mechanisms, which often treat retrieval as a passive, single-step matching process leading to semantic dilution.
- Uses multi-view long-term memory for 'Zoom-In/Out' evidence localization and contextual expansion
- Employs a dual short-term memory system (semantic + episodic) to guide iterative query decomposition
- Achieved SOTA performance on LoCoMo and LongMemEval-S benchmarks, reducing hallucinations in long contexts
Why It Matters
Enables more reliable AI reasoning over documents and conversations, critical for legal, research, and customer support applications.