Developer Tools

From Plan to Action: How Well Do Agents Follow the Plan?

Study of 16,991 agent trajectories shows poor plan compliance undermines software engineering tasks.

Deep Dive

A new research paper from the University of Illinois Urbana-Champaign and IBM, titled "From Plan to Action: How Well Do Agents Follow the Plan?", provides the first systematic analysis of plan compliance in AI agents designed for software engineering. The study examined a massive dataset of 16,991 agent trajectories from the popular SWE-agent framework across four different large language models (LLMs) on the SWE-bench Verified and SWE-bench Pro benchmarks. The core finding is stark: without explicit guidance, agents default to incomplete or overfitted workflows internalized during training, leading to inconsistent and often unsuccessful task execution.

Providing a standard, well-structured plan improved issue resolution, and the researchers found that periodic reminders of the plan mitigated violations and boosted success rates. However, the study also delivered counterintuitive results: a poorly designed plan hurt performance more than providing no plan at all, and augmenting a plan with extra, task-relevant phases in the early stages could degrade performance, particularly when the additions clashed with the model's ingrained problem-solving strategy. These findings expose a critical gap in current AI agent development.
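The "periodic reminder" intervention can be pictured as re-injecting the plan into the agent's context at a fixed cadence. The sketch below is illustrative only; the plan text, `REMIND_EVERY` interval, and `build_messages` helper are assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: re-stating the plan every few agent steps so it
# stays in the model's recent context and mitigates plan drift.

PLAN = """1. Navigate to the relevant files.
2. Reproduce the reported bug.
3. Write a patch.
4. Validate the fix with tests."""

REMIND_EVERY = 5  # assumed reminder interval, in agent steps


def build_messages(history, step):
    """Assemble the message list for the next LLM call, appending a
    plan reminder whenever the step count hits the cadence."""
    messages = [{"role": "system", "content": f"Follow this plan:\n{PLAN}"}]
    messages.extend(history)
    if step > 0 and step % REMIND_EVERY == 0:
        messages.append(
            {"role": "user", "content": f"Reminder -- stay on plan:\n{PLAN}"}
        )
    return messages
```

The design choice here is to repeat the full plan rather than a summary, on the assumption (consistent with the study's finding) that what matters is keeping the agreed phases salient late in a long trajectory.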

The research highlights that simply encoding task-specific plans into models during training is insufficient. The current paradigm often leads to memorization rather than adaptive reasoning. The authors argue for a shift toward fine-tuning methods that explicitly teach models how to follow and reason with *instructed* plans dynamically. This means developing agents that can understand a strategy, adhere to its phases (for bug fixing: navigation, reproduction, patch, and validation), and adjust their actions accordingly, moving beyond rigid, pre-baked workflows. This capability is essential for truly autonomous and reliable AI agents in complex domains like software engineering.
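Measuring adherence to those phases implies some way of flagging when an agent advances past a phase and then falls back to an earlier one. A minimal sketch, assuming a trace of per-step phase labels (the detection of which phase a step belongs to is left out and would be its own problem):

```python
# Hypothetical compliance check over the bug-fixing phases named above.
# The ordering and the idea of "backward moves" as violations are
# illustrative assumptions, not the paper's actual metric.

PHASES = ["navigation", "reproduction", "patch", "validation"]


def compliance_violations(phase_trace):
    """Return indices of steps that revisit an earlier phase after the
    agent has already advanced past it in the plan order."""
    violations = []
    furthest = -1  # rank of the furthest phase reached so far
    for i, phase in enumerate(phase_trace):
        rank = PHASES.index(phase)
        if rank < furthest:
            violations.append(i)
        furthest = max(furthest, rank)
    return violations
```

For example, a trace that jumps from navigation straight to patching and only then reproduces the bug would have its reproduction step flagged as a backward move.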

Key Points
  • Analyzed 16,991 agent trajectories from SWE-agent across 4 LLMs, finding frequent plan non-compliance.
  • A bad plan hurts performance more than no plan; periodic reminders of a good plan improve success.
  • Reveals a need for new fine-tuning to teach adaptive plan-following, not workflow memorization.

Why It Matters

For reliable AI assistants in coding and beyond, we need agents that can understand and follow strategic instructions, not just recall training data.