Research & Papers

ICML 2026 paper reveals tool-use collapse in visual chain-of-thought agents

AI agents call tools less as training progresses yet keep getting more accurate, and forcing them to use tools barely helps.

Deep Dive

Visual chain-of-thought agents rely on external tools (e.g., image captioners, object detectors) to gather fine-grained evidence, but the role those tools play in complex reasoning has been underexplored. In a new ICML 2026 paper, researchers Kim, Tan, and Kim studied these agents on challenging tasks like 3D spatial reasoning and medical visual question answering, where agents must integrate local tool evidence with global context. They identified a surprising ‘tool-use collapse’: as training progresses, models gradually stop using tools while still achieving higher task accuracy. The effect of tools also proved asymmetric: completely removing tools degraded performance, but incentivizing tool use led to only marginal gains despite doubling usage rates.

The root cause, the team found, is that both vanilla training and tool-use encouragement reduce the diversity of agent rollouts (the sequences of reasoning steps and tool invocations). To counter this, they added an entropy regularization term to the training objective, which encourages broader exploration over language generation and tool choices. This approach boosted reasoning accuracy even as tool usage naturally declined, outperforming methods that forced higher tool frequency. The results held across both 3D and medical domains, suggesting that the principle of diversity over frequency generalizes. The study reframes external tools as scaffolding for training-time exploration rather than crutches for inference.
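As a rough illustration of what an entropy regularization term can look like, the sketch below adds an entropy bonus to a simple REINFORCE-style loss over one rollout. It is a minimal sketch under stated assumptions, not the paper's objective: the summary does not give the exact formulation, and the function names, the `beta` weight, and the choice of a REINFORCE-style loss are all hypothetical. Each step's logits are assumed to cover a shared action space that includes both language tokens and tool choices.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def regularized_loss(step_logits, actions, advantage, beta=0.01):
    """Hypothetical entropy-regularized policy loss for one rollout.

    step_logits: per-step logits over actions (tokens or tool calls).
    actions:     index of the action actually taken at each step.
    advantage:   scalar advantage assigned to the whole rollout.
    beta:        assumed weight of the entropy bonus.
    Returns (loss, mean_entropy).
    """
    pg_loss = 0.0
    total_entropy = 0.0
    for logits, action in zip(step_logits, actions):
        probs = softmax(logits)
        pg_loss += -advantage * math.log(probs[action])
        total_entropy += entropy(probs)
    n = len(actions)
    mean_entropy = total_entropy / n
    # Subtracting the entropy term means minimizing the loss
    # favors policies that keep their action distribution broad.
    return pg_loss / n - beta * mean_entropy, mean_entropy
```

With `beta=0` this reduces to the plain policy-gradient loss; a positive `beta` lowers the loss for policies whose per-step distributions stay spread out, which is the general mechanism by which an entropy term counteracts shrinking rollout diversity.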

Key Points
  • Tool-use collapse: agents reduce tool usage during training while accuracy continues to rise
  • Asymmetric effect: removing tools hurts performance, but incentivizing tool use yields only marginal gains
  • Entropy regularization increases rollout diversity and outperforms forced tool use, even as tool frequency declines

Why It Matters

This research challenges the assumption that more tool calls make better agents: what matters during training is the diversity of exploration, not how often tools are invoked.