Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning
Deep Dive
InfoTree is a training-time tree-search framework for tool-use agentic reinforcement learning. It formalizes rollout informativeness via submodular optimization and selects states with a UUCB selector and an Adaptive Budget Allocator. InfoTree lifts the mixed-outcome ratio from 58.1% to 76.3% with under 5% budget overhead and cuts wall-clock overhead from 14.3% to 4.8%. It outperforms six baselines across nine benchmarks spanning math reasoning, web search, and coding.
Key Points
- InfoTree formalizes rollout informativeness as a submodular optimization problem, giving greedy state selection a (1 - 1/e) approximation guarantee.
- The Adaptive Budget Allocator lifts mixed-outcome ratio from 58.1% to 76.3% with under 5% budget overhead; Speculative Expansion cuts wall-clock overhead from 14.3% to 4.8%.
- Outperforms six baselines across nine benchmarks spanning math reasoning, web-search agents, and tool-rich coding/OS tasks.
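The (1 - 1/e) guarantee above is the classic result for greedy maximization of a monotone submodular objective under a cardinality budget. Below is a minimal sketch of that greedy scheme; the `coverage` objective and the outcome sets are hypothetical stand-ins for illustration, not InfoTree's actual informativeness measure.

```python
def coverage(selected, candidates):
    """Submodular objective: number of distinct outcomes covered."""
    covered = set()
    for i in selected:
        covered |= candidates[i]
    return len(covered)

def greedy_select(candidates, budget):
    """Pick up to `budget` candidates, each step taking the one with the
    largest marginal gain. Diminishing returns (submodularity) makes this
    greedy choice come within a (1 - 1/e) factor of the optimum."""
    selected = []
    for _ in range(budget):
        best, best_gain = None, 0
        for i in range(len(candidates)):
            if i in selected:
                continue
            gain = coverage(selected + [i], candidates) - coverage(selected, candidates)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no remaining candidate adds new coverage
            break
        selected.append(best)
    return selected

# Hypothetical candidate expansions, each "covering" a set of outcomes.
candidates = [{1, 2}, {2, 3, 4}, {4, 5}, {1}]
picked = greedy_select(candidates, budget=2)
print(picked)  # -> [1, 0]: largest marginal gains first
```

The same structure applies whatever the submodular objective is: only the marginal-gain computation changes, and the greedy loop and its guarantee carry over.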
Why It Matters
InfoTree offers a principled, budget-efficient way to train tool-using AI agents, directly improving performance on reasoning and search tasks.