AgentNoiseBench: Benchmarking Robustness of Tool-Using LLM Agents Under Noisy Conditions
Your AI agents are secretly fragile. A new study reveals why they crash outside the lab.
A new benchmark called AgentNoiseBench reveals a critical flaw in today's LLM-based agents: they perform well in ideal lab conditions but fail in real-world noisy environments. The study systematically injects 'user-noise' and 'tool-noise' into existing benchmarks to test agent robustness. Extensive evaluations across diverse models show consistent performance drops, exposing how current training overlooks real-world stochasticity and highlighting a major gap between benchmark scores and practical deployment success.
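To make the tool-noise idea concrete, here is a minimal sketch of how one might inject simulated failures into an agent's tool calls. This is an illustrative toy, not AgentNoiseBench's actual implementation; the wrapper, the example `lookup_weather` tool, and the error format are all hypothetical.

```python
import random


def noisy_tool(tool_fn, error_rate=0.3, rng=None):
    """Wrap a tool so a fraction of calls return a simulated failure.

    This mimics 'tool-noise': flaky APIs, timeouts, and malformed
    responses that agents encounter in deployment but rarely in
    clean benchmark environments. (Hypothetical sketch.)
    """
    rng = rng or random.Random()

    def wrapper(*args, **kwargs):
        if rng.random() < error_rate:
            # Simulated tool failure instead of a real result.
            return {"status": "error", "message": "tool timeout (simulated)"}
        return {"status": "ok", "result": tool_fn(*args, **kwargs)}

    return wrapper


def lookup_weather(city):
    """A stand-in tool the agent might call (hypothetical)."""
    return {"city": city, "temp_c": 21}


# Seeded RNG so runs are reproducible when comparing agents.
rng = random.Random(0)
tool = noisy_tool(lookup_weather, error_rate=0.5, rng=rng)
results = [tool("Paris")["status"] for _ in range(10)]
print(results)  # mix of "ok" and "error" outcomes
```

An agent evaluated against such wrapped tools must detect the error payloads and retry or recover, which is exactly the kind of behavior clean benchmarks never exercise.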
Why It Matters
This exposes a fundamental weakness in current AI agents, meaning real-world applications are far less reliable than their impressive benchmark scores suggest.