Developer Tools

Building an Internal Coding Agent at Zup: Lessons and Open Questions

A new study finds that targeted tool design and safety guardrails improved agent reliability more than prompt engineering did.

Deep Dive

A team at Zup, led by researchers Gustavo Pinto, Pedro Eduardo de Paula Naves, Ana Paula Camargo, and Marselle Silva, has published a paper detailing their experience building an internal AI coding assistant named CodeGen. The study, "Building an Internal Coding Agent at Zup: Lessons and Open Questions," tackles the common enterprise problem where prototype AI agents fail to transition to reliable production tools. The researchers argue that the gap isn't primarily about the underlying large language model (LLM); instead, it's about the engineering systems built around it.

Their findings show that targeted tool design was a major success factor. For instance, configuring the agent to make precise string-replacement edits rather than attempting full-file rewrites significantly improved reliability and reduced errors. Furthermore, implementing layered safety guardrails—mechanisms to check and constrain the agent's actions—proved more effective for stability than extensive prompt engineering efforts. To foster adoption, the team introduced progressive human oversight modes, allowing developers to choose their level of interaction with the agent, which built organic trust without forcing compliance. The core conclusion is clear: for enterprise coding agents, the practical engineering and system design decisions are more decisive for delivering real value than simply selecting the most powerful model.
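To make the string-replacement idea concrete, here is a minimal sketch of what such an edit tool could look like. This is illustrative only, not Zup's actual implementation: the agent supplies an exact snippet to find and its replacement, and the tool rejects missing or ambiguous matches instead of rewriting the whole file.

```python
def apply_string_replacement(source: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old` with `new`, or fail loudly.

    Refusing zero-match and multi-match edits is what makes this tool
    design more reliable than a full-file rewrite: the agent cannot
    silently clobber code it did not intend to touch.
    """
    count = source.count(old)
    if count == 0:
        raise ValueError("edit rejected: target snippet not found")
    if count > 1:
        raise ValueError(f"edit rejected: target snippet is ambiguous ({count} matches)")
    return source.replace(old, new, 1)


# Hypothetical usage: patch one line of a file's contents.
code = "def greet(name):\n    print('Hello, ' + name)\n"
patched = apply_string_replacement(
    code,
    "print('Hello, ' + name)",
    "print(f'Hello, {name}')",
)
```

The key design choice is that failure is explicit: a bad edit raises an error the agent can react to, rather than producing a subtly corrupted file.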

Key Points
  • Targeted tool design, like string-replacement edits, improved reliability more than optimizing prompts for the core AI model.
  • Layered safety guardrails were critical for production readiness, acting as a more decisive factor for stability than model choice alone.
  • Progressive oversight modes (e.g., letting users choose interaction levels) drove organic developer adoption by building trust, not mandating it.

Why It Matters

This provides a practical blueprint for enterprises to move beyond AI prototypes and build reliable coding tools that developers actually adopt and trust.