Can Coding Agents Be General Agents?
New research finds AI coding agents succeed on simple tasks but hit a wall with complex business logic.
A new research paper from authors Maksim Ivanov, Abhijay Rana, and Gokul Prabhakaran investigates a critical question in AI automation: whether coding agents, like those built on models such as GPT-4 or Claude 3, can evolve into general-purpose agents capable of end-to-end business process automation. Published on arXiv, the study moves beyond standard software engineering benchmarks to conduct a practical case study within a real-world open-core Enterprise Resource Planning (ERP) system. This environment tests an agent's ability to understand complex business domain logic—like inventory management or order processing—and translate it into functional code.
The findings reveal a significant capability gap. While the coding agent demonstrated reliable performance on simple, well-defined tasks, it consistently failed on more complex, multi-step business processes. The researchers identify the core bottleneck as the agent's inability to effectively bridge high-level domain-specific requirements with low-level code execution. This suggests that current coding agents lack the deep contextual understanding and reasoning required for true generalization beyond their training on code syntax and common programming patterns. The study calls for new evaluation frameworks and architectural approaches to overcome this hurdle.
- Study evaluated a coding agent on real-world ERP tasks, finding it fails on complex business logic.
- The core bottleneck identified is the agent's inability to bridge domain-specific requirements with executable code.
- Highlights a critical gap preventing current coding-focused AI from becoming true general-purpose automation agents.
Why It Matters
This research tempers the hype around AI agents, showing that current technology hits a hard limit when automating complex business workflows.