Developer Tools

From Plan to Action: How Well Do Agents Follow the Plan?

Study of 16,991 agent trajectories shows poor plan compliance undermines software engineering tasks.

Deep Dive

A new research paper from the University of Illinois Urbana-Champaign and IBM, titled "From Plan to Action: How Well Do Agents Follow the Plan?", provides the first systematic analysis of plan compliance in AI agents designed for software engineering. The study examined a massive dataset of 16,991 agent trajectories from the popular SWE-agent framework across four different large language models (LLMs) on the SWE-bench Verified and SWE-bench Pro benchmarks. The core finding is stark: without explicit guidance, agents default to incomplete or overfitted workflows internalized during training, leading to inconsistent and often unsuccessful task execution.

Providing a standard, well-structured plan improved issue resolution, and the researchers found that periodic reminders of the plan mitigated violations and boosted success rates. However, the study also delivered counterintuitive results: a poorly designed plan hurt performance more than providing no plan at all, and augmenting a plan with extra, task-relevant phases in the early stages could degrade performance, particularly when the additions clashed with the model's ingrained problem-solving strategy. These findings expose a critical gap in current AI agent development.
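The "periodic reminder" intervention can be pictured as re-injecting the plan into the agent's context at a fixed cadence. The sketch below is illustrative only; the plan text, `REMIND_EVERY` interval, and `build_messages` helper are assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: re-stating the plan every few agent steps so it
# stays in the model's recent context and mitigates plan drift.

PLAN = """1. Navigate to the relevant files.
2. Reproduce the reported bug.
3. Write a patch.
4. Validate the fix with tests."""

REMIND_EVERY = 5  # assumed reminder interval, in agent steps


def build_messages(history, step):
    """Assemble the message list for the next LLM call, appending a
    plan reminder whenever the step count hits the cadence."""
    messages = [{"role": "system", "content": f"Follow this plan:\n{PLAN}"}]
    messages.extend(history)
    if step > 0 and step % REMIND_EVERY == 0:
        messages.append(
            {"role": "user", "content": f"Reminder -- stay on plan:\n{PLAN}"}
        )
    return messages
```

The design choice here is to repeat the full plan rather than a summary, on the assumption (consistent with the study's finding) that what matters is keeping the agreed phases salient late in a long trajectory.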

The research highlights that simply encoding task-specific plans into models during training is insufficient. The current paradigm often leads to memorization rather than adaptive reasoning. The authors argue for a shift toward fine-tuning methods that explicitly teach models how to follow and reason with *instructed* plans dynamically. This means developing agents that can understand a strategy, adhere to its phases (for bug fixing: navigation, reproduction, patch, and validation), and adjust their actions accordingly, moving beyond rigid, pre-baked workflows. This capability is essential for truly autonomous and reliable AI agents in complex domains like software engineering.
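Measuring adherence to those phases implies some way of flagging when an agent advances past a phase and then falls back to an earlier one. A minimal sketch, assuming a trace of per-step phase labels (the detection of which phase a step belongs to is left out and would be its own problem):

```python
# Hypothetical compliance check over the bug-fixing phases named above.
# The ordering and the idea of "backward moves" as violations are
# illustrative assumptions, not the paper's actual metric.

PHASES = ["navigation", "reproduction", "patch", "validation"]


def compliance_violations(phase_trace):
    """Return indices of steps that revisit an earlier phase after the
    agent has already advanced past it in the plan order."""
    violations = []
    furthest = -1  # rank of the furthest phase reached so far
    for i, phase in enumerate(phase_trace):
        rank = PHASES.index(phase)
        if rank < furthest:
            violations.append(i)
        furthest = max(furthest, rank)
    return violations
```

For example, a trace that jumps from navigation straight to patching and only then reproduces the bug would have its reproduction step flagged as a backward move.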

Key Points
  • Analyzed 16,991 agent trajectories from SWE-agent across 4 LLMs, finding frequent plan non-compliance.
  • A bad plan hurts performance more than no plan; periodic reminders of a good plan improve success.
  • Reveals a need for new fine-tuning to teach adaptive plan-following, not workflow memorization.

Why It Matters

For reliable AI assistants in coding and beyond, we need agents that can understand and follow strategic instructions, not just recall training data.