MemQ improves LLM agent memory by linking past retrievals in DAGs
Boosts multi-step task success by up to 5.7 percentage points via structural credit propagation.
Episodic memory in LLM agents typically treats each memory as an isolated unit, ignoring how retrieving one memory enables the creation of later ones. MemQ addresses this by recording which memories were retrieved when each new memory was created, forming a provenance DAG (directed acyclic graph). It then applies TD(λ) eligibility traces to propagate credit backward through this graph with a decay factor \((\gamma\lambda)^d\), where \(d\) is DAG depth, so structural proximity replaces temporal distance. The authors formalize the setting as an Exogenous-Context MDP, which decouples the external task stream from the internal memory store.
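To make the decay rule concrete, here is a minimal Python sketch of depth-weighted credit propagation over a provenance DAG. It is an illustration under stated assumptions, not the authors' implementation: the function name `propagate_credit`, the `parents` dictionary layout, the default values of `gamma` and `lam`, and the choice to measure depth \(d\) as the shortest path from the credited memory to each ancestor are all hypothetical, and the paper may define depth or accumulate credit differently.

```python
from collections import deque


def propagate_credit(parents, source, td_error, gamma=0.9, lam=0.8):
    """Spread a TD error from `source` back to its provenance ancestors.

    parents:  dict mapping a memory id to the list of memory ids that were
              retrieved when that memory was created (hypothetical layout).
    source:   id of the memory that just received the TD error.
    Returns a dict of memory id -> credit, where each credit equals
    td_error * (gamma * lam) ** d and d is the shortest DAG distance
    from `source` (an assumption made for this sketch).
    """
    decay = gamma * lam
    depth = {source: 0}
    queue = deque([source])

    # Breadth-first walk along provenance edges, so each ancestor is
    # first reached at its shortest depth and visited only once.
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in depth:
                depth[parent] = depth[node] + 1
                queue.append(parent)

    return {mem: td_error * decay ** d for mem, d in depth.items()}
```

In this sketch a memory retrieved directly when `source` was created receives the TD error scaled by \(\gamma\lambda\), one that is two provenance hops away receives it scaled by \((\gamma\lambda)^2\), and so on, regardless of how many time steps separate the memories.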
MemQ was evaluated on six diverse benchmarks: OS interaction, function calling, code generation, multimodal reasoning, embodied reasoning, and expert-level QA. It achieved the highest success rate on all six in both generalization evaluation and runtime learning. Gains were most pronounced on multi-step tasks with deep provenance chains (up to +5.7 percentage points), and smallest on single-step classification (+0.77 pp). The paper also provides guidance on parameter selection for γ and λ, and the code will be released soon.
- Applies TD(λ) eligibility traces to propagate memory credit backward through a provenance DAG, using structural depth instead of time.
- Outperforms baselines on all six benchmarks (OS, function calling, code, multimodal, embodied, and QA), with the largest gain, +5.7 pp, on multi-step tasks.
- Formalizes the problem as an Exogenous-Context MDP, separating task stream from memory store.
Why It Matters
MemQ lets LLM agents credit the earlier memories that made later ones possible, which improves success rates most on complex, multi-step tasks.