AI Planning Framework for LLM-Based Web Agents
A new taxonomy links modern LLM agents to search algorithms like BFS and DFS, enabling better failure diagnosis.
Researchers Orit Shahnovsky and Rotem Dror have published a paper introducing a formal AI planning framework to demystify the operation of LLM-based web agents. The core of their work is a novel taxonomy that maps modern agent architectures to traditional planning paradigms: Step-by-Step agents correspond to Breadth-First Search (BFS), Tree Search agents to Best-First Tree Search, and Full-Plan-in-Advance agents to Depth-First Search (DFS). This mapping provides a principled way to diagnose common system failures, such as context drift and incoherent task decomposition, which are often opaque in black-box LLM agents.
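The taxonomy can be illustrated with a toy sketch: classically, BFS, DFS, and best-first search differ only in how the frontier of candidate paths is managed (FIFO queue, LIFO stack, or priority queue), which is the sense in which each agent architecture "corresponds" to a search paradigm. The web-navigation graph, heuristic values, and `search` helper below are illustrative assumptions, not code or data from the paper.

```python
from collections import deque
import heapq

def search(graph, start, goal, strategy, heuristic=None):
    """Generic search over a toy site map; only the frontier
    discipline changes between the three paradigms."""
    if strategy == "bfs":          # Step-by-Step ~ FIFO queue
        frontier = deque([[start]])
        pop, push = frontier.popleft, frontier.append
    elif strategy == "dfs":        # Full-Plan-in-Advance ~ LIFO stack
        frontier = [[start]]
        pop, push = frontier.pop, frontier.append
    else:                          # Tree Search ~ priority queue on h(n)
        frontier = [(heuristic[start], [start])]
        pop = lambda: heapq.heappop(frontier)[1]
        push = lambda p: heapq.heappush(frontier, (heuristic[p[-1]], p))

    visited = set()
    while frontier:
        path = pop()
        node = path[-1]
        if node == goal:
            return path                      # first complete trajectory found
        if node in visited:
            continue
        visited.add(node)
        for nxt in graph.get(node, []):      # expand successor pages/actions
            push(path + [nxt])
    return None

# Hypothetical site map for a shopping task
site = {"home": ["search", "cart"], "search": ["results"],
        "results": ["cart"], "cart": ["checkout"]}
h = {"home": 3, "search": 4, "results": 2, "cart": 1, "checkout": 0}
```

For example, `search(site, "home", "checkout", "bfs")` returns the shortest trajectory `["home", "cart", "checkout"]`, while the DFS and best-first variants may commit to deeper or heuristic-preferred branches first, mirroring the behavioral differences the framework attributes to each agent type.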
To validate their framework, the researchers created a new dataset of 794 human-labeled task trajectories from the WebArena benchmark and proposed five novel evaluation metrics that assess trajectory quality beyond basic success rates. In a comparative test, they found a baseline Step-by-Step agent achieved a 38% overall success rate and aligned more closely with human reasoning, while a novel Full-Plan-in-Advance agent excelled in technical precision with 89% element accuracy. This demonstrates that different architectures have distinct strengths, and the choice of agent should be driven by specific application needs, not a one-size-fits-all approach.
- Maps three LLM agent types (Step-by-Step, Tree Search, Full-Plan) to classic search algorithms like BFS and DFS for clearer diagnosis.
- Introduces five new evaluation metrics and a dataset of 794 human-labeled web task trajectories to assess agent performance beyond success rates.
- Tests show Step-by-Step agents achieve a 38% success rate while aligning more closely with human reasoning, whereas Full-Plan agents reach 89% element accuracy, highlighting a trade-off between human-like planning and technical precision.
Why It Matters
Provides a systematic way for developers to debug and select the optimal AI agent architecture for complex, real-world web automation tasks.