Single-Agent LLMs Outperform Multi-Agent Systems on Multi-Hop Reasoning Under Equal Thinking Token Budgets
New research finds multi-agent AI's reported gains often come from using more compute, not better architecture.
A new arXiv paper from researchers Dat Tran and Douwe Kiela challenges the prevailing narrative that multi-agent LLM systems (MAS) are inherently superior for complex reasoning. The study presents an information-theoretic argument, grounded in the Data Processing Inequality, which suggests that under a fixed budget of reasoning tokens and with perfect context utilization, a single-agent system (SAS) is more information-efficient. The theory predicts that MAS become competitive only when a single agent's context window is degraded or when the multi-agent system is allowed to expend more total compute.
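The Data Processing Inequality underpinning this argument can be stated in its standard form (notation here is ours, for illustration, not necessarily the paper's): for any Markov chain of random variables,

$$X \to Y \to Z \quad\Longrightarrow\quad I(X; Z) \le I(X; Y)$$

In words, no processing of $Y$ can recover more information about $X$ than $Y$ itself carries. The relevance to MAS is that inter-agent communication forms exactly such a chain: a downstream agent conditions only on the finite message it receives, not on the upstream agent's full context, so each handoff can only preserve or lose task-relevant information, never create it.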
To test this, the researchers conducted a controlled empirical study across three major model families: Qwen3, DeepSeek-R1-Distill-Llama, and Gemini 2.5. They compared single-agent setups against various multi-agent architectures, strictly matching the total number of 'thinking' tokens each system could use. The results were clear: SAS consistently matched or outperformed MAS on multi-hop reasoning tasks when compute was normalized. The paper also includes a detailed diagnostic analysis, identifying significant artifacts in API-based budget control (notably in Gemini 2.5) and in standard evaluation benchmarks that can artificially inflate the apparent gains from using multiple agents.
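The compute-matching idea can be made concrete with a small sketch. This is a hypothetical illustration of budget accounting, not the authors' actual evaluation harness; all names and the trace format are invented for the example.

```python
# Illustrative sketch: compute-matched comparison of single- vs multi-agent runs.
# A "run" is a list of agent turns, each recording its reasoning-token usage.
# These names and numbers are hypothetical, not from the paper's harness.

def total_thinking_tokens(traces):
    """Sum the reasoning ('thinking') tokens across every agent turn in a run."""
    return sum(t["thinking_tokens"] for t in traces)

def within_budget(traces, budget):
    """A run is comparable only if its total thinking tokens fit the shared budget."""
    return total_thinking_tokens(traces) <= budget

# A single-agent run is one turn; a multi-agent run spends the budget across roles.
sas_run = [{"agent": "solo", "thinking_tokens": 4096}]
mas_run = [
    {"agent": "planner", "thinking_tokens": 1500},
    {"agent": "solver", "thinking_tokens": 1800},
    {"agent": "critic", "thinking_tokens": 1200},
]

BUDGET = 4096
print(within_budget(sas_run, BUDGET))  # True: 4096 <= 4096
print(within_budget(mas_run, BUDGET))  # False: 4500 > 4096, the MAS run used extra compute
```

The point of the sketch is the failure mode the paper flags: if evaluations compare systems per-task rather than per-token, a multi-agent pipeline like `mas_run` silently consumes more test-time compute than its single-agent baseline.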
Overall, the findings suggest that many reported advantages of multi-agent systems for tasks like reasoning are better explained by unaccounted-for increases in test-time computation and context effects, rather than any fundamental architectural benefit. The research underscores the critical importance of explicitly controlling and understanding the trade-offs between compute, context, and coordination when designing and evaluating agentic AI systems.
- Single-agent LLMs matched or beat multi-agent systems on reasoning when token budgets were equal, tested on Qwen3, DeepSeek-R1-Distill-Llama, and Gemini 2.5.
- The study identifies artifacts in API budget control and benchmarks that can inflate perceived multi-agent gains.
- The core finding suggests many multi-agent advantages come from using more compute, not better system design.
Why It Matters
For AI builders, this means optimizing a single, powerful agent may be more cost-effective than orchestrating multiple simpler ones for complex tasks.