AI Planning Framework for LLM-Based Web Agents
A new taxonomy links modern LLM agents to search algorithms like BFS and DFS, enabling better failure diagnosis.
Researchers Orit Shahnovsky and Rotem Dror have published a paper introducing a formal AI planning framework to demystify the operation of LLM-based web agents. The core of their work is a novel taxonomy that maps modern agent architectures to traditional planning paradigms: Step-by-Step agents correspond to Breadth-First Search (BFS), Tree Search agents to Best-First Tree Search, and Full-Plan-in-Advance agents to Depth-First Search (DFS). This mapping provides a principled way to diagnose common system failures, such as context drift and incoherent task decomposition, which are often opaque in black-box LLM agents.
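The taxonomy can be illustrated with a toy sketch: classically, BFS, DFS, and best-first search differ only in how the frontier of candidate paths is managed (FIFO queue, LIFO stack, or priority queue), which is the sense in which each agent architecture "corresponds" to a search paradigm. The web-navigation graph, heuristic values, and `search` helper below are illustrative assumptions, not code or data from the paper.

```python
from collections import deque
import heapq

def search(graph, start, goal, strategy, heuristic=None):
    """Generic search over a toy site map; only the frontier
    discipline changes between the three paradigms."""
    if strategy == "bfs":          # Step-by-Step ~ FIFO queue
        frontier = deque([[start]])
        pop, push = frontier.popleft, frontier.append
    elif strategy == "dfs":        # Full-Plan-in-Advance ~ LIFO stack
        frontier = [[start]]
        pop, push = frontier.pop, frontier.append
    else:                          # Tree Search ~ priority queue on h(n)
        frontier = [(heuristic[start], [start])]
        pop = lambda: heapq.heappop(frontier)[1]
        push = lambda p: heapq.heappush(frontier, (heuristic[p[-1]], p))

    visited = set()
    while frontier:
        path = pop()
        node = path[-1]
        if node == goal:
            return path                      # first complete trajectory found
        if node in visited:
            continue
        visited.add(node)
        for nxt in graph.get(node, []):      # expand successor pages/actions
            push(path + [nxt])
    return None

# Hypothetical site map for a shopping task
site = {"home": ["search", "cart"], "search": ["results"],
        "results": ["cart"], "cart": ["checkout"]}
h = {"home": 3, "search": 4, "results": 2, "cart": 1, "checkout": 0}
```

For example, `search(site, "home", "checkout", "bfs")` returns the shortest trajectory `["home", "cart", "checkout"]`, while the DFS and best-first variants may commit to deeper or heuristic-preferred branches first, mirroring the behavioral differences the framework attributes to each agent type.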
To validate their framework, the researchers created a new dataset of 794 human-labeled task trajectories from the WebArena benchmark and proposed five novel evaluation metrics that assess trajectory quality beyond basic success rates. In a comparative test, they found a baseline Step-by-Step agent achieved a 38% overall success rate and aligned more closely with human reasoning, while a novel Full-Plan-in-Advance agent excelled in technical precision with 89% element accuracy. This demonstrates that different architectures have distinct strengths, and the choice of agent should be driven by specific application needs, not a one-size-fits-all approach.
- Maps three LLM agent types (Step-by-Step, Tree Search, Full-Plan) to classic search algorithms like BFS and DFS for clearer diagnosis.
- Introduces five new evaluation metrics and a dataset of 794 human-labeled web task trajectories to assess agent performance beyond success rates.
- Tests show Step-by-Step agents achieve a 38% success rate while aligning more closely with human reasoning, whereas Full-Plan agents reach 89% element accuracy, highlighting a trade-off between human-like planning and technical precision.
Why It Matters
Provides a systematic way for developers to debug and select the optimal AI agent architecture for complex, real-world web automation tasks.