Agent Frameworks

MemFlow: Intent-Driven Memory Orchestration for Small Language Model Agents

Training-free memory orchestration nearly doubles accuracy on long-horizon benchmarks

Deep Dive

Modern language agents need to handle long, multi-turn conversations, but small language models (SLMs) struggle with context overflow, noisy retrieval, and unreliable reasoning loops. A team of researchers from NJIT (Jiayi Chen, Yingcong Li, Guiling Wang) argues that SLM memory failures are often caused by mismatched memory operations—different query types need different retrieval strategies, evidence transformations, and context budgets that SLMs cannot self-orchestrate.

Enter MemFlow, a training-free memory orchestration framework that avoids open-ended reasoning by externalizing memory planning. At its core, a Router Agent classifies each query by intent and dispatches it to the Memory Agent, which executes one of three specialized tiers: Profile Lookup (for quick fact checks), Targeted Retrieval (for specific evidence), or Deep Reasoning (for complex multi-step questions). A dynamic, tier-aware token budget compiles the evidence, then an Answer Agent generates a response. A Validator Agent optionally retries with a heavier tier if the answer lacks support.
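To make the route-then-dispatch idea concrete, here is a minimal Python sketch of the Router Agent step. The three tier names come from the article; the keyword heuristic, handler stubs, and memory layout are hypothetical stand-ins for the underlying SLM calls, not the paper's actual implementation.

```python
# Illustrative sketch of intent routing and tier dispatch.
# Tier names are from the article; everything else is a stand-in.

def classify_intent(query: str) -> str:
    """Router Agent stand-in: map a query to one of three memory tiers."""
    q = query.lower()
    if q.startswith(("what is my", "who is my", "when is my")):
        return "profile_lookup"        # quick fact check
    if any(w in q for w in ("why", "compare", "plan", "explain")):
        return "deep_reasoning"        # complex multi-step question
    return "targeted_retrieval"        # specific evidence lookup

def profile_lookup(query, memory):
    # Direct lookup against a structured user profile.
    return [memory["profile"].get(query.lower(), "")]

def targeted_retrieval(query, memory):
    # Toy keyword match over the conversation log.
    toks = query.lower().split()
    return [m for m in memory["log"] if any(t in m.lower() for t in toks)]

def deep_reasoning(query, memory):
    # A real tier would interleave retrieval and synthesis steps;
    # here we simply hand back the full log.
    return list(memory["log"])

TIERS = {
    "profile_lookup": profile_lookup,
    "targeted_retrieval": targeted_retrieval,
    "deep_reasoning": deep_reasoning,
}

def dispatch(query, memory):
    """Classify intent, then run only the matching tier."""
    tier = classify_intent(query)
    return tier, TIERS[tier](query, memory)
```

Because the router picks exactly one handler from a fixed table, the SLM never has to free-form "choose a tool," which is the failure mode the paper attributes to open-ended reasoning loops.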

Evaluated on a frozen Qwen3-1.7B backbone across three long-horizon memory benchmarks—LongMemEval, LoCoMo, and LongBench—MemFlow achieved a nearly 2x accuracy improvement over standard full-context SLM baselines. The route-then-compile design eliminates tool-selection hallucination and keeps the context compact, making limited-capacity models far more effective in resource-constrained settings such as edge devices and real-time assistants.
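The "compile" half of route-then-compile can be sketched as a greedy packing step: each tier gets its own token budget, and ranked evidence is added until the budget runs out. The budget values and the word-count token proxy below are illustrative assumptions, not figures from the paper.

```python
# Sketch of tier-aware context compilation. Budgets and the
# word-count token proxy are illustrative, not from the paper.

TIER_BUDGETS = {
    "profile_lookup": 128,
    "targeted_retrieval": 512,
    "deep_reasoning": 2048,
}

def n_tokens(text: str) -> int:
    # Crude proxy; a real system would use the model's tokenizer.
    return len(text.split())

def compile_context(evidence: list[str], tier: str) -> str:
    """Pack evidence (assumed ranked by relevance) into the tier's budget."""
    budget = TIER_BUDGETS[tier]
    kept, used = [], 0
    for snippet in evidence:
        cost = n_tokens(snippet)
        if used + cost > budget:
            break  # stop at the first snippet that would overflow
        kept.append(snippet)
        used += cost
    return "\n".join(kept)
```

Capping the context per tier is what keeps the compiled prompt compact: a quick profile lookup never drags thousands of tokens into the window, while deep reasoning is allowed a larger slice.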

Key Points
  • MemFlow uses a Router Agent to classify query intent and dispatch to one of three specialized memory tiers (Profile Lookup, Targeted Retrieval, Deep Reasoning).
  • The training-free framework runs on a frozen Qwen3-1.7B backbone and improves accuracy by nearly 2x on LongMemEval, LoCoMo, and LongBench.
  • A Validator Agent enables retries with heavier memory tiers if the generated response lacks evidence support.
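The Validator's retry behavior amounts to an escalation loop over the three tiers named above: generate an answer, check whether the evidence supports it, and if not, rerun with the next-heavier tier. In the sketch below, the tier runner, answer generator, and support check are hypothetical stand-ins for the actual SLM calls.

```python
# Sketch of Validator-driven escalation. The run_tier, generate, and
# supported callables are stand-ins for the Memory, Answer, and
# Validator agents described in the article.

TIER_ORDER = ["profile_lookup", "targeted_retrieval", "deep_reasoning"]

def answer_with_escalation(query, run_tier, generate, supported,
                           start_tier="profile_lookup"):
    """Retry with progressively heavier tiers until the Validator accepts.

    run_tier(query, tier) -> evidence string for that tier
    generate(query, ev)   -> candidate answer (Answer Agent)
    supported(answer, ev) -> bool (Validator Agent)
    """
    answer, tier = None, start_tier
    for tier in TIER_ORDER[TIER_ORDER.index(start_tier):]:
        evidence = run_tier(query, tier)
        answer = generate(query, evidence)
        if supported(answer, evidence):
            break  # Validator accepted; stop escalating
    return answer, tier  # heaviest tier's answer if nothing passed
```

Starting from the Router's chosen tier and only escalating on failure keeps the common case cheap while still giving hard queries a path to the deep-reasoning tier.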

Why It Matters

Makes small language models viable for long-horizon agents in edge devices and cost-sensitive deployments.