Developer Tools

Tool-schema compression restores agentic RAG under tight context budgets

44–50% token savings let LLMs use 800+ tools in 8K context windows

Deep Dive

Agentic RAG systems that define dozens to hundreds of tools face a critical bottleneck: the tool schemas themselves consume the precious context window needed for retrieval-augmented generation. In a new paper, Furkan Sakizli presents the first systematic study of this trade-off, evaluating 14 models (1.5B to 32B local plus one frontier API model) across 6,566 controlled API calls with 28 tool definitions at context budgets of 8K, 16K, and 32K tokens. The proposed compression technique, TSCG (Tool-Schema Compression with Greedy profiles), reduces JSON schemas by 44–50% while preserving their structure.

At the tightest 8K budget, uncompressed schemas completely overflowed the context window, causing near-zero exact-match accuracy (2.6% average). Compressed schemas restored functionality with a +20.5 percentage point lift across all eight models tested; among the six models that fully enabled RAG, the lift reached +24.7 pp. At 32K tokens—where both formats fit—the difference between compressed and uncompressed was negligible (≤1 pp for four of five models), confirming the effect is purely budget-driven. External validation on HotpotQA (50 multi-hop questions) showed a +48 pp exact-match gain under the same overflow scenario. In frontier scaling tests, JSON schemas overflowed at about 494 tools, while compressed schemas remained operational beyond 800 tools. The findings establish tool-schema compression as a necessary infrastructure layer for deploying agentic RAG on constrained-context devices or APIs.

Key Points
  • TSCG compression saves 44–50% of tool schema tokens, tested on 14 models and 6,566 API calls
  • At 8K context, compressed schemas boost exact-match accuracy by +20.5 pp (from 2.6% baseline)
  • Scales to 800+ tools vs. 494 for uncompressed; validated on HotpotQA with +48 pp EM lift

Why It Matters

Enables LLMs to wield hundreds of tools in tiny context windows—critical for edge deployment and low-cost inference.