Research & Papers

Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning


Deep Dive

InfoTree is a training-time tree-search framework for tool-use agentic reinforcement learning. It formalizes rollout informativeness as a submodular optimization problem and selects states with a UUCB selector and an Adaptive Budget Allocator. The allocator lifts the mixed-outcome ratio from 58.1% to 76.3% with less than 5% budget overhead, while Speculative Expansion cuts wall-clock overhead from 14.3% to 4.8%. InfoTree outperforms six baselines across nine math, web-search, and coding benchmarks.
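
The (1 - 1/e) guarantee comes from greedy maximization of a monotone submodular objective. The paper's actual informativeness function is not given in this summary, so the sketch below uses a stand-in coverage objective; the state names, their "covered failure modes", and the function names are illustrative assumptions, not InfoTree's implementation:

```python
def coverage(selected, items):
    """Stand-in submodular objective: number of distinct features covered.
    InfoTree's real informativeness measure would replace this."""
    covered = set()
    for i in selected:
        covered |= items[i]
    return len(covered)

def greedy_select(items, budget):
    """Greedily pick `budget` states by marginal gain. For monotone
    submodular objectives this achieves at least (1 - 1/e) of optimal."""
    selected = []
    remaining = list(items)
    for _ in range(budget):
        best = max(remaining,
                   key=lambda i: coverage(selected + [i], items)
                               - coverage(selected, items))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical tree states, each covering a set of failure modes.
states = {
    "s1": {"timeout", "bad_tool_arg"},
    "s2": {"bad_tool_arg"},
    "s3": {"wrong_answer", "timeout"},
    "s4": {"wrong_answer", "parse_error"},
}

picked = greedy_select(states, budget=2)  # greedy takes s1, then s4
```

With a budget of two, the greedy pass takes "s1" (gain 2) and then "s4" (gain 2 given "s1"), covering all four failure modes; "s2" is never worth picking because its coverage is redundant.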

Key Points
  • InfoTree formalizes rollout informativeness as a submodular optimization problem, providing a 1-1/e approximation guarantee for greedy state selection.
  • The Adaptive Budget Allocator lifts mixed-outcome ratio from 58.1% to 76.3% with under 5% budget overhead; Speculative Expansion cuts wall-clock overhead from 14.3% to 4.8%.
  • Outperforms six baselines across nine benchmarks spanning math reasoning, web-search agents, and tool-rich coding/OS tasks.
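
The mixed-outcome ratio in the second key point matters because advantage-style RL signals vanish when every rollout for a prompt succeeds or every one fails. A minimal sketch of that idea, assuming a simple allocator that spends extra rollouts on uniform-outcome groups (the names `allocate_extra` and `mixed_ratio` and the demo data are hypothetical, not the paper's allocator):

```python
def is_mixed(outcomes):
    """A rollout group is 'mixed' if it has both successes and failures."""
    return 0 < sum(outcomes) < len(outcomes)

def mixed_ratio(groups):
    """Fraction of prompts whose rollout group is mixed."""
    return sum(is_mixed(o) for o in groups.values()) / len(groups)

def allocate_extra(groups, extra_budget, rollout_fn):
    """Spend extra rollouts only on uniform groups, where the training
    signal is zero, trying to turn them mixed."""
    for prompt, outcomes in groups.items():
        if extra_budget == 0:
            break
        if not is_mixed(outcomes):
            outcomes.append(rollout_fn(prompt))
            extra_budget -= 1
    return groups

# Hypothetical prompts with initial rollout outcomes (1 = success).
groups = {"p1": [1, 1, 1], "p2": [1, 0], "p3": [0, 0]}
# Canned rollouts for the demo; a real allocator samples the policy.
canned = {"p1": 0, "p3": 1}

before = mixed_ratio(groups)                    # 1/3: only p2 is mixed
allocate_extra(groups, 2, lambda p: canned[p])
after = mixed_ratio(groups)                     # 1.0: all groups mixed
```

The two extra rollouts go to "p1" and "p3", the uniform groups, rather than being spread evenly, which is the intuition behind lifting the mixed-outcome ratio at small budget overhead.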

Why It Matters

InfoTree offers a principled, budget-efficient way to train tool-using AI agents, directly improving performance on reasoning and search tasks.