Agent Frameworks

MetaForge: Self-evolving AI agent forges tools on demand, beats 16 baselines

A new framework that dynamically creates and recycles tools, achieving top accuracy across 12 benchmarks.

Deep Dive

Multimodal agents have made strides in complex reasoning through tool use, but two problems hold them back: static, predefined tool inventories fail in unseen scenarios, and indiscriminate tool calls waste resources and introduce noise. MetaForge, proposed by researchers from multiple institutions, addresses both by learning when to invoke tools and how to evolve its toolset on demand. The framework decomposes agentic behavior into four coupled stages: Decide (judging tool necessity), Retrieve (selecting suitable tools), Adapt (grounding parameters in context), and Forge (synthesizing new skills online and recycling them into the library). Together these form a continuous loop in which the agent, guided by a unified orchestration policy, answers directly, reuses an existing tool, or forges a new one.
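The control flow of that loop can be sketched in a few lines of Python. This is a toy illustration only, not MetaForge's implementation: the function names, the "task: argument" query format, and the two hard-coded tool templates are all assumptions made for the example, and simple rules stand in for the learned orchestration policy.

```python
from typing import Callable, Dict, Optional, Tuple

Tool = Callable[[str], str]


def decide(query: str) -> bool:
    # Decide: toy stand-in for the learned tool-necessity judgment.
    # Assumption: a query of the form "task: argument" needs a tool.
    return ":" in query


def retrieve(library: Dict[str, Tool], query: str) -> Optional[Tool]:
    # Retrieve: look up a tool by task name (a real system would rank candidates).
    return library.get(query.split(":", 1)[0])


def adapt(query: str) -> str:
    # Adapt: ground the tool's argument in the query context.
    return query.split(":", 1)[1].strip()


def forge(task: str) -> Optional[Tool]:
    # Forge: hypothetical stand-in for online tool synthesis.
    # Only two hard-coded templates exist in this toy version.
    templates: Dict[str, Tool] = {"upper": str.upper, "reverse": lambda s: s[::-1]}
    return templates.get(task)


def run(library: Dict[str, Tool], query: str) -> Tuple[str, str]:
    """Return (route, result), where route is 'direct', 'reuse', or 'forge'."""
    if not decide(query):
        return "direct", f"answer({query})"
    tool = retrieve(library, query)
    route = "reuse"
    if tool is None:
        task = query.split(":", 1)[0]
        tool = forge(task)
        if tool is None:
            # Nothing could be forged, so fall back to answering directly.
            return "direct", f"answer({query})"
        library[task] = tool  # recycle the forged tool into the library
        route = "forge"
    return route, tool(adapt(query))
```

Calling run twice with the same task on an initially empty library takes the forge route the first time and the reuse route the second, which is the recycling behavior the Forge stage describes.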

The entire pipeline is jointly optimized via reinforcement learning, covering invocation necessity, retrieval accuracy, execution effectiveness, and the reusability of forged skills. An explicit invocation-cost penalty discourages redundant tool calls and keeps tool use efficient. Tested across 12 diverse benchmarks, MetaForge consistently outperforms 16 baseline methods in accuracy, efficiency, and generalization. The results support a shift from static tool inventories to on-demand self-evolution, allowing agents to adapt to new tasks without manual intervention. The work is relevant to autonomous systems that must operate in dynamic, real-world environments where tool requirements are unpredictable.
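To make the role of the invocation-cost penalty concrete, here is a minimal sketch of a reward with that shape. The summary does not give the actual reward used in MetaForge, so the linear form, the function name episode_reward, and the cost_per_call value are all assumptions chosen for illustration.

```python
def episode_reward(correct: bool, num_tool_calls: int, cost_per_call: float = 0.1) -> float:
    """Toy reward: task success minus a linear penalty per tool invocation.

    Assumption: both the linear penalty and the 0.1 coefficient are
    illustrative, not values taken from the MetaForge paper.
    """
    task_reward = 1.0 if correct else 0.0
    return task_reward - cost_per_call * num_tool_calls


# Under this shaping, a correct answer reached with fewer tool calls scores
# higher, so a policy trained on it is pushed to skip redundant calls.
no_tools = episode_reward(correct=True, num_tool_calls=0)
two_tools = episode_reward(correct=True, num_tool_calls=2)
wasted = episode_reward(correct=False, num_tool_calls=3)
```

With such a reward, answering directly is preferred whenever it is just as accurate as calling a tool, while a tool call still pays off if it turns a wrong answer into a right one.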

Key Points
  • MetaForge uses a four-stage Decide-Retrieve-Adapt-Forge loop to create tools on demand and recycle them into its library
  • A unified orchestration policy decides between direct answering, tool reuse, or forging new tools, optimized via RL with a cost penalty
  • Outperforms 16 baseline methods across 12 benchmarks in accuracy, efficiency, and generalization

Why It Matters

Enables AI agents to autonomously expand their capabilities without manual tool updates, a leap toward truly adaptive autonomous systems.