Developer Tools

Human-AI Synergy in Agentic Code Review

AI agents' code suggestions are adopted significantly less often than human reviewers', and the ones that are accepted increase code complexity more.

Deep Dive

A new large-scale study provides the first major empirical evidence on the effectiveness of AI agents in collaborative code review workflows. Researchers from Queen's University and others analyzed nearly 280,000 review conversations, finding that while AI agents are integrated into the process, their suggestions are adopted into the codebase at a significantly lower rate than those from human reviewers. Over half of the unadopted AI suggestions were either factually incorrect or addressed through alternative fixes by developers, highlighting a quality gap.

When AI suggestions are adopted, they produce a larger increase in code complexity and code size than human-proposed changes. The research also revealed distinct collaboration patterns: human reviewers provide 11.8% more feedback rounds when reviewing AI-generated code, and they contribute additional context, such as testing considerations and knowledge transfer, that AI currently does not provide. This suggests AI agents function best as a scalable first-pass screening tool for defects, not as a replacement for human judgment.

The findings have immediate implications for engineering teams adopting AI-powered review tools from GitHub, GitLab, and others. The study underscores that human oversight is not just a nice-to-have but a necessity for ensuring code quality, maintaining system simplicity, and providing the nuanced, contextual feedback that AI agents cannot yet replicate. For professionals, this means strategically deploying AI to handle repetitive checks while reserving human expertise for complex logic and architectural decisions.
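To make the hybrid strategy concrete, here is a minimal sketch, not taken from the study, of how a team might route pull requests so that AI handles a first-pass screen on small, repetitive changes while complex or architectural changes go straight to a human reviewer. The size thresholds and the `ai_first_pass` / `human_review` labels are hypothetical placeholders for whatever review tooling a team actually uses; merges would still require human approval either way.

```python
# Hypothetical sketch of a hybrid review-routing policy; thresholds are illustrative.
from dataclasses import dataclass


@dataclass
class PullRequest:
    files_changed: int
    lines_changed: int
    touches_public_api: bool  # e.g. changes to exported interfaces or schemas


def route_review(pr: PullRequest) -> str:
    """Decide which reviewer sees the PR first; a human still approves the merge."""
    is_small = pr.files_changed <= 3 and pr.lines_changed <= 50
    if is_small and not pr.touches_public_api:
        return "ai_first_pass"   # AI screens for defects and repetitive issues
    return "human_review"        # complex logic and architecture go to people


if __name__ == "__main__":
    print(route_review(PullRequest(files_changed=2, lines_changed=30, touches_public_api=False)))
    print(route_review(PullRequest(files_changed=12, lines_changed=800, touches_public_api=True)))
```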

Key Points
  • AI agents' code suggestions have a 'significantly lower' adoption rate than human reviewers', with over half of unadopted suggestions being factually incorrect or superseded by developers' own fixes.
  • Adopted AI suggestions increase code complexity and code size more than adopted human suggestions, potentially impacting long-term maintainability.
  • Human reviewers provide 11.8% more feedback rounds on AI-generated code and contribute contextual feedback on testing and knowledge transfer that AI lacks.

Why It Matters

For engineering teams, the study validates a hybrid approach: use AI for scale, but rely on human expertise for final quality, context, and architectural decisions.