Learning to Remember: End-to-End Training of Memory Agents for Long-Context Reasoning
New reinforcement learning framework outperforms long-context LLMs and RAG on dynamic reasoning tasks in a 13-dataset evaluation.
A research team led by Kehao Zhang has introduced a new approach to a persistent challenge in AI: long-context reasoning. Their paper, 'Learning to Remember: End-to-End Training of Memory Agents for Long-Context Reasoning', presents the Unified Memory Agent (UMA), a reinforcement learning framework that trains a single policy to both manage memory and answer questions over extended information streams.
Unlike current long-context LLMs and Retrieval-Augmented Generation (RAG) systems, which process information passively, UMA actively manages memory through a unified policy. The system maintains a dual memory representation: a compact core summary for global context and a structured Memory Bank that supports explicit CRUD (create, read, update, delete) operations over key-value entries. This lets the system consolidate information proactively while the data streams in, reorganizing and updating memories as new information arrives.
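To make the dual memory idea concrete, here is a minimal sketch of a core summary paired with a key-value Memory Bank that exposes the four CRUD operations. The class name `DualMemory`, its method names, and the example keys are all hypothetical, chosen for illustration; the paper's actual interface and the learned policy that decides which operation to issue are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DualMemory:
    """Toy dual memory: a short core summary plus a key-value Memory Bank.

    Hypothetical names throughout; not the paper's implementation.
    """
    core_summary: str = ""
    bank: Dict[str, str] = field(default_factory=dict)

    def create(self, key: str, value: str) -> None:
        # Refuse to silently overwrite an existing entry.
        if key in self.bank:
            raise KeyError(f"entry already exists: {key}")
        self.bank[key] = value

    def read(self, key: str) -> Optional[str]:
        # Returns None when the key is absent.
        return self.bank.get(key)

    def update(self, key: str, value: str) -> None:
        # Only existing entries can be updated.
        if key not in self.bank:
            raise KeyError(f"no such entry: {key}")
        self.bank[key] = value

    def delete(self, key: str) -> None:
        # Deleting a missing key is a no-op.
        self.bank.pop(key, None)


memory = DualMemory(core_summary="Tracking Alice's travel plans.")
memory.create("alice.city", "Paris")
memory.update("alice.city", "Berlin")  # a later chunk contradicts the earlier one
memory.create("alice.hotel", "TBD")
memory.delete("alice.hotel")           # the entry is no longer needed
```

In this sketch the caller chooses each operation by hand. In UMA, according to the article, those choices are made by the same learned policy that answers questions.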
To evaluate long-horizon memory behavior, the researchers created Ledger-QA, a diagnostic benchmark in which answers require tracking accumulated updates rather than retrieving a single passage. Across 13 datasets spanning Ledger-QA, test-time learning, and accurate retrieval tasks, UMA substantially outperformed both long-context and RAG baselines on dynamic reasoning and learning tasks while remaining competitive on standard retrieval benchmarks. The results suggest that learned, end-to-end memory management can overcome the brittleness of current approaches on ultra-long streams that contain frequent updates and contradictions.
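The difference between tracking accumulated updates and simple retrieval can be shown with a toy ledger. The event stream, account names, and helper functions below are invented for illustration and are not taken from Ledger-QA; the sketch only shows why returning the most recent mention gives the wrong answer when the question is about a running total.

```python
from typing import Dict, List, Tuple

# Hypothetical stream of (account, delta) events, invented for this example.
events: List[Tuple[str, int]] = [
    ("rent", -1200),
    ("salary", 3000),
    ("rent", -1200),
    ("groceries", -150),
    ("salary", 3000),
]


def accumulate(stream: List[Tuple[str, int]]) -> Dict[str, int]:
    """Fold every update into a running total per account."""
    totals: Dict[str, int] = {}
    for account, delta in stream:
        totals[account] = totals.get(account, 0) + delta
    return totals


def last_mention(stream: List[Tuple[str, int]], account: str) -> int:
    """Retrieval-style lookup: return only the most recent matching event."""
    for name, delta in reversed(stream):
        if name == account:
            return delta
    raise KeyError(account)


totals = accumulate(events)
print(totals["rent"])                # -2400: the accumulated state
print(last_mention(events, "rent"))  # -1200: the latest event alone
```

A question such as "how much has been spent on rent so far?" is answered correctly only by the accumulated total, which is the kind of state a memory with update operations can maintain as the stream arrives.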
- UMA uses reinforcement learning to unify memory operations and QA in a single policy, enabling proactive information management.
- The system's dual memory combines a core summary with a structured Memory Bank that supports explicit CRUD operations for dynamic updates.
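- UMA outperformed long-context LLMs and RAG on dynamic reasoning and learning tasks in a 13-dataset evaluation, including the new Ledger-QA benchmark for continuous state tracking, while remaining competitive on standard retrieval.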
Why It Matters
Learned memory management could let AI systems handle real-time data streams and complex, evolving scenarios where passive retrieval fails.