Did You Check the Right Pocket? Cost-Sensitive Store Routing for Memory-Augmented Agents
New research shows AI agents can be 40% more efficient by checking only the relevant memory stores instead of all of them.
A new research paper by Madhava Gaikwad, accepted at the ICLR 2026 Workshop, tackles a critical inefficiency in modern AI agents. Titled 'Did You Check the Right Pocket? Cost-Sensitive Store Routing for Memory-Augmented Agents,' it identifies that most systems today retrieve information from all available memory stores for every query. This brute-force approach increases computational cost (token usage) and pollutes the agent's context with irrelevant information, potentially harming accuracy.
The paper formulates memory retrieval as a store-routing problem, evaluating it with coverage, exact match, and token efficiency metrics. The key finding is that an 'oracle router'—a hypothetical perfect selector—achieves higher answer accuracy while using substantially fewer context tokens compared to uniform retrieval from all stores. This demonstrates that intelligent, selective retrieval is superior, making routing decisions a 'first-class component' of agent design.
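The contrast between uniform retrieval and an oracle router can be sketched in a few lines. This is an illustrative toy, not the paper's code: the store names, per-store token counts, and the mapping of queries to stores are all assumptions made for this example.

```python
# Toy comparison: uniform retrieval from every memory store vs. an oracle
# router that queries only the store containing the answer. All numbers
# and store names below are hypothetical.

STORES = {
    "episodic":   {"tokens": 800, "holds": {"q1"}},
    "semantic":   {"tokens": 600, "holds": {"q2"}},
    "procedural": {"tokens": 700, "holds": {"q3"}},
}

def uniform_retrieval(query_id):
    """Pull context from every store: full coverage, maximum token cost."""
    tokens = sum(s["tokens"] for s in STORES.values())
    covered = any(query_id in s["holds"] for s in STORES.values())
    return covered, tokens

def oracle_routing(query_id):
    """Hypothetical perfect selector: query only the store holding the answer."""
    for store in STORES.values():
        if query_id in store["holds"]:
            return True, store["tokens"]
    return False, 0

covered_u, cost_u = uniform_retrieval("q2")
covered_o, cost_o = oracle_routing("q2")
print(f"uniform: covered={covered_u}, tokens={cost_u}")
print(f"oracle:  covered={covered_o}, tokens={cost_o}")
```

Both strategies answer the query, but the oracle spends a fraction of the context tokens, which is the gap the paper's metrics (coverage, exact match, token efficiency) are designed to expose.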
Furthermore, Gaikwad formalizes store selection as a cost-sensitive decision problem, creating a framework that explicitly trades answer accuracy against retrieval cost. This provides a principled way to design and interpret routing policies, moving beyond ad-hoc heuristics. The work strongly motivates the development of learned routing mechanisms, which could be built using smaller classifier models, to make scalable multi-store agent systems both more performant and economically viable.
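One way to read the cost-sensitive formulation is as subset selection: score each candidate set of stores by expected accuracy minus a token-cost penalty, then pick the best. The sketch below is a minimal interpretation of that framing; the accuracy estimates, token costs, and trade-off weight `lam` are hypothetical values, not figures from the paper.

```python
# Minimal sketch of a cost-sensitive store-selection objective:
# utility(S) = estimated_accuracy(S) - lam * token_cost(S).
# All numbers here are assumed for illustration.
from itertools import combinations

stores = {"episodic": 800, "semantic": 600, "procedural": 700}

# Assumed per-subset accuracy estimates (e.g. measured on a validation set);
# unlisted subsets default to 0.0.
est_accuracy = {
    frozenset(): 0.10,
    frozenset({"semantic"}): 0.62,
    frozenset({"episodic", "semantic"}): 0.66,
    frozenset(stores): 0.68,
}

def utility(subset, lam=1e-4):
    """Estimated accuracy minus lam times total retrieved tokens."""
    acc = est_accuracy.get(frozenset(subset), 0.0)
    cost = sum(stores[s] for s in subset)
    return acc - lam * cost

candidates = [c for r in range(len(stores) + 1)
              for c in combinations(stores, r)]
best = max(candidates, key=utility)
print(sorted(best), round(utility(best), 4))
```

With these numbers the single `semantic` store wins: adding more stores raises accuracy slightly but the token penalty outweighs the gain, which is exactly the trade-off a learned router would be trained to navigate.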
- Shows that uniform retrieval from all memory stores is inefficient, increasing cost and adding noise.
- An oracle router achieved higher accuracy with substantially fewer tokens, demonstrating the value of selective retrieval.
- Formalizes store selection as a cost-sensitive optimization, trading accuracy vs. cost for principled policy design.
Why It Matters
This research is a blueprint for building cheaper, faster, and more accurate AI agents that can reason over large, specialized knowledge bases.