Developer Tools

Tool Forge: 99.2% context reduction for governed agentic toolchains

New research cuts agent tool context by 99.2% while reaching a Router micro-F1 of 0.901.

Deep Dive

Tool Forge treats each tool as a rich capsule containing intent, capability contracts, implementation code, dependency policy, tests, documentation, runtime validation evidence, lifecycle state, credential bindings, and routing metadata. Instead of dumping the full tool catalog into an LLM's context window, a common and token-expensive practice, Tool Forge introduces a Router that dynamically selects only the relevant tools based on the agent's natural-language request. According to early benchmarks, this session-based routing reduces the tool context payload by 99.2% compared to naive full-catalog schema exposure.
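To make the capsule-and-Router idea concrete, here is a minimal sketch in Python. It is not Tool Forge's implementation: the `ToolCapsule` class, the `route` and `payload_chars` functions, the keyword-overlap scoring, and the three sample tools are all hypothetical, chosen only to illustrate selecting a few validated tools instead of exposing every schema.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ToolCapsule:
    # Hypothetical subset of the capsule fields described above.
    name: str
    intent: str
    schema: dict
    lifecycle_state: str = "validated"
    routing_keywords: set = field(default_factory=set)


def route(request, catalog, top_k=2):
    """Return up to top_k validated capsules whose keywords overlap the request."""
    words = set(request.lower().split())
    scored = []
    for capsule in catalog:
        if capsule.lifecycle_state != "validated":
            continue  # assumed governance gate: skip tools that are not validated
        score = len(words & capsule.routing_keywords)
        if score > 0:
            scored.append((score, capsule.name, capsule))
    # Highest overlap first; ties broken by name for deterministic output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [capsule for _, _, capsule in scored[:top_k]]


def payload_chars(capsules):
    """Rough proxy for context size: characters of serialized name plus schema."""
    return sum(len(json.dumps({"name": c.name, "schema": c.schema})) for c in capsules)


# Illustrative catalog; these tools are invented for the example.
catalog = [
    ToolCapsule("csv_reader", "Read rows from a CSV file", {"path": "string"},
                routing_keywords={"csv", "read", "file"}),
    ToolCapsule("http_get", "Fetch a URL", {"url": "string"},
                routing_keywords={"http", "fetch", "url"}),
    ToolCapsule("zip_files", "Compress files", {"paths": "array"},
                lifecycle_state="draft", routing_keywords={"zip", "file", "compress"}),
]

selected = route("read the csv file", catalog)
reduction = 1 - payload_chars(selected) / payload_chars(catalog)
print([c.name for c in selected], f"payload reduction: {reduction:.1%}")
```

With a three-tool catalog the saving is modest; the reported 99.2% figure refers to the paper's own benchmarks, not to this toy example. A production router would presumably use something richer than keyword overlap.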

Across 83 benchmark cases, the Router achieved an aggregate micro-F1 of 0.901. In a 25-case end-to-end generation probe on local-tool tasks, Tool Forge generated all 25 tool bundles, reached a micro-F1 of 0.940 against deterministic acceptance checks, and passed 23 out of 25 live sandbox validations. The system is designed for enterprise environments that need governance, sandbox isolation, and credential management, a critical gap as LLM agents increasingly perform operational work such as calling APIs, manipulating files, and assembling workflows. The paper also notes remaining challenges in adversarial routing, broader API grounding, and cross-system evaluation.
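For readers unfamiliar with the metric, the sketch below shows how a micro-averaged F1 is conventionally computed: true positives, false positives, and false negatives are pooled across all cases before the score is taken. Treating each case as a set of expected versus selected tool names is an assumption made for illustration, and the three sample cases are invented; the paper's exact scoring protocol may differ.

```python
def micro_f1(cases):
    """Micro-averaged F1 over (expected, predicted) pairs of label sets."""
    tp = fp = fn = 0
    for expected, predicted in cases:
        expected, predicted = set(expected), set(predicted)
        tp += len(expected & predicted)   # correctly selected
        fp += len(predicted - expected)   # selected but not expected
        fn += len(expected - predicted)   # expected but missed
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical routing outcomes: (expected tools, tools the router selected).
cases = [
    ({"csv_reader"}, {"csv_reader"}),             # exact match
    ({"http_get", "json_parse"}, {"http_get"}),   # one tool missed
    ({"zip_files"}, {"zip_files", "csv_reader"}), # one extra tool selected
]

print(micro_f1(cases))  # 3 TP, 1 FP, 1 FN -> 6 / 8 = 0.75
```

Because counts are pooled before averaging, cases with many expected tools weigh more heavily than single-tool cases.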

Key Points
  • Tool Forge Router reduces agent tool context by 99.2% compared to naive full-catalog schema exposure
  • Achieved 0.901 micro-F1 across 83 Router benchmark cases and 0.940 micro-F1 in a 25-case end-to-end generation probe
  • Each tool is a validated capsule containing intent, contracts, implementation, tests, dependencies, lifecycle state, and credential bindings

Why It Matters

Tool Forge aims to make agentic execution in enterprises safer and more efficient by cutting tool-context token costs and enforcing governance.