Developer Tools

Inside the Scaffold: A Source-Code Taxonomy of Coding Agent Architectures

New study dissects the hidden 'scaffold' code in 13 AI coding agents, revealing 5 core loop primitives and up to 37 tools per agent.

Deep Dive

Researcher Benjamin Rombaut has published a detailed architectural analysis titled 'Inside the Scaffold: A Source-Code Taxonomy of Coding Agent Architectures.' The paper moves beyond abstract capability surveys to perform a source-code-level dissection of 13 popular open-source coding agents, examining the actual 'scaffold' code—the control loops, tool definitions, and state management systems—that orchestrates the underlying LLMs like GPT-4 or Claude. This granular approach, grounded in specific file paths and line numbers, reveals that the architectures resist simple classification, with tool counts ranging from 0 to 37 and control strategies spanning fixed pipelines to Monte Carlo Tree Search.
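The scaffold components the paper dissects can be sketched in a few lines. This is a minimal illustration, not code from any agent in the study; every name here (`Tool`, `AgentState`, `read_file`, `run_tests`) is hypothetical.

```python
# Hypothetical sketch of the three "scaffold" pieces the paper examines:
# a tool registry, agent state, and the dispatch the scaffold (not the
# LLM) performs. Names and tools are illustrative only.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str           # surfaced to the LLM so it can choose a tool
    run: Callable[[str], str]  # executed by the scaffold in the environment

@dataclass
class AgentState:
    history: list = field(default_factory=list)  # transcript managed by the scaffold

def make_registry() -> dict[str, Tool]:
    # Two toy tools; agents in the study expose anywhere from 0 to 37.
    return {
        "read_file": Tool("read_file", "Read a file's contents",
                          lambda path: f"<contents of {path}>"),
        "run_tests": Tool("run_tests", "Run the project's test suite",
                          lambda _: "2 passed, 1 failed"),
    }

registry = make_registry()
state = AgentState()
# The control loop would pick a tool from the model's output; here we
# invoke one directly to show the dispatch-and-record pattern.
result = registry["run_tests"].run("")
state.history.append(("run_tests", result))
```

The point of the sketch is the division of labor: the model only names a tool, while the scaffold owns execution and state, which is exactly the layer the paper grounds in file paths and line numbers.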

The analysis organizes findings across 12 dimensions in three layers: control architecture, tool/environment interface, and resource management. It identifies five fundamental 'loop primitives'—including ReAct, generate-test-repair, and plan-execute—that serve as composable building blocks. Crucially, 11 of the 13 analyzed agents layer multiple primitives rather than relying on a single control structure. The study shows dimensions converge where external constraints are strong (like execution isolation) but diverge on open design questions like context compaction and multi-model routing.

This taxonomy provides the first reusable, code-grounded framework for comparing coding agents, from commercial products like GitHub Copilot and Cursor to their open-source counterparts. For practitioners, it demystifies why agents behave differently and offers a blueprint for designing more effective scaffolds. For researchers, it establishes a concrete foundation for studying agent behavior, moving from observing 'what' agents do to understanding 'why' based on their underlying architecture.

Key Points
  • Analyzed 13 open-source coding agents at the source-code level, examining control loops and tool interfaces.
  • Identified 5 core loop primitives (ReAct, generate-test-repair, etc.); 11 of 13 agents layer multiple primitives.
  • Found massive variation: tool counts range from 0 to 37, and context compaction uses seven distinct strategies.

Why It Matters

Gives engineers a blueprint for building better AI coding tools, and helps researchers explain agent behavior in terms of architecture rather than outputs alone.