Orchestrating Human-AI Software Delivery: A Retrospective Longitudinal Field Study of Three Software Modernization Programs
A three-year study of three modernization programs totaling roughly 460k LOC reports that AI orchestration cut modeled senior-equivalent effort by 87% and validation-stage issue load by 74%.
A retrospective longitudinal study by researchers Maximiliano Armesto and Christophe Kolb examines the impact of orchestrated AI on enterprise software delivery. Their paper analyzes Chiron, an industrial platform that coordinates humans and AI agents across four delivery stages: analysis, planning, implementation, and validation. The three-year field study covers three real-world software modernization programs: a COBOL banking migration (~30k LOC), an accounting system modernization (~400k LOC), and a .NET/Angular mortgage platform update (~30k LOC).
The study compares five delivery configurations: one traditional baseline and four successive platform versions. Under baseline staffing assumptions, portfolio totals dropped from 36.0 to 9.3 project-weeks, a 74% reduction in delivery time. Modeled raw effort fell from 1080.0 to 232.5 person-days (about 78%), while senior-equivalent effort fell from 1080.0 to 139.5 SEE-days, an 87% reduction. Validation-stage issue load decreased from 8.03 to 2.09 issues per 100 tasks (74%), and first-release coverage improved from 77.0% to 90.5%.
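The percentages quoted above can be rechecked from the before/after figures. The snippet below is a minimal sketch of that arithmetic, not the authors' analysis code; the function name `pct_reduction` and the dictionary keys are illustrative labels introduced here, and only the numeric values come from the reported results.

```python
def pct_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, as a percentage."""
    return (before - after) / before * 100.0


# (baseline, final platform version) pairs as reported in the summary.
# The key names are hypothetical labels, not terms from the paper.
metrics = {
    "project_weeks": (36.0, 9.3),
    "raw_effort_person_days": (1080.0, 232.5),
    "senior_equivalent_see_days": (1080.0, 139.5),
    "issues_per_100_tasks": (8.03, 2.09),
}

# Percentage reduction for each metric, rounded to one decimal place.
reductions = {
    name: round(pct_reduction(before, after), 1)
    for name, (before, after) in metrics.items()
}

# Coverage is already a percentage, so the change is in percentage points.
coverage_gain_points = round(90.5 - 77.0, 1)

for name, value in reductions.items():
    print(f"{name}: {value}% reduction")
print(f"first_release_coverage: +{coverage_gain_points} percentage points")
```

Run as written, this gives reductions of 74.2%, 78.5%, 87.1%, and 74.0% respectively, plus a 13.5-point coverage gain, which matches the rounded 74%, 87%, and 74% figures in the text.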
The most significant improvements came in versions V3 and V4, which introduced acceptance-criteria validation, repository-native review, and hybrid human-agent execution. These versions improved speed, coverage, and issue load at the same time, which suggests that the largest gains occur when AI is embedded in orchestrated workflows rather than deployed as isolated coding assistants. The study offers substantial field evidence that team-level AI orchestration can improve enterprise software delivery at scale.
- The Chiron platform reduced delivery time by 74% (from 36.0 to 9.3 project-weeks) across three real-world programs totaling roughly 460k LOC
- Modeled senior-equivalent engineering effort dropped 87% (from 1080.0 to 139.5 SEE-days), while first-release coverage improved from 77.0% to 90.5%
- Validation-stage issue load decreased by 74% (from 8.03 to 2.09 issues per 100 tasks), with the largest improvements in V3 and V4, which added acceptance-criteria validation
Why It Matters
The study provides field evidence that orchestrating AI across the whole delivery workflow, rather than deploying coding assistants in isolation, is what delivers large efficiency gains in enterprise software delivery.