Opinion & Analysis

MIT, BCG & METR studies show AI productivity gains are real but radically uneven

Support novices gain 34%, while experienced developers take 19% longer with AI tools

Deep Dive

Three years into the AI productivity promise, hard evidence now paints a nuanced picture. The most cited studies (Brynjolfsson et al.'s analysis of 5,179 support agents, Dell'Acqua et al.'s BCG/HBS experiment, METR's randomized developer trial, and MIT NANDA's enterprise survey) converge on a single finding: AI gains are real but radically uneven. Novices gain far more than experts: the support agents with the least experience gained 34% in productivity, while veterans barely moved. The AI effectively encoded tacit knowledge from top performers and handed it to newcomers, compressing months of learning into a prompt. BCG consultants using GPT-4 completed 12.2% more tasks, and finished them 25% faster, when the task fell inside AI's 'jagged frontier', the set of tasks where the model excels. But on a seemingly similar task outside that frontier, AI-assisted consultants were 19 percentage points less likely to be correct, illustrating the invisible boundary between AI competence and failure. The most troubling result comes from METR: 16 experienced open-source developers, working on their own repositories, actually took 19% longer with AI tools while believing they were 20% faster. They could not perceive the productivity tax.
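To make the METR perception gap concrete, here is a rough sketch of the arithmetic in Python. The 100-minute baseline is a hypothetical figure chosen for illustration, and reading "20% faster" as "20% less time" is an assumption; neither comes from the study itself.

```python
# Hypothetical baseline: a task that takes 100 minutes without AI.
BASELINE_MINUTES = 100.0


def actual_minutes(baseline: float, slowdown: float = 0.19) -> float:
    """Time with AI if the task takes 19% longer (the measured METR figure)."""
    return baseline * (1 + slowdown)


def perceived_minutes(baseline: float, believed_speedup: float = 0.20) -> float:
    """Time developers believed they spent, assuming '20% faster'
    means 20% less time (an interpretation, not stated in the article)."""
    return baseline * (1 - believed_speedup)


actual = actual_minutes(BASELINE_MINUTES)
perceived = perceived_minutes(BASELINE_MINUTES)
gap = actual - perceived  # minutes of slowdown the developer does not notice

print(f"actual: {actual:.0f} min, perceived: {perceived:.0f} min, gap: {gap:.0f} min")
```

Under these assumptions, a 100-minute task takes 119 minutes with AI but feels like 80, leaving a 39-minute gap between experience and reality.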

At enterprise scale, the picture is equally stark. MIT's NANDA study found that 95% of GenAI pilots produced no measurable P&L impact. The bottleneck wasn't model quality but the 'learning gap': organizations left their workflows unchanged. Projects bought from vendors succeeded 67% of the time; internal builds succeeded barely a third as often. Real ROI clustered in unglamorous back-office automation, not in the sales-and-marketing tools that consumed most budgets. Together, these findings redefine productivity as a property not of AI itself but of the match between task, tool, and the human's ability to detect when the AI is wrong. The map is now clear: AI levels up the bottom, not the top, and enterprises must redesign workflows, not just layer on AI, to capture value.

Key Points
  • Customer support novices gained 34% productivity vs. near-zero for veterans (Brynjolfsson study of 5,179 agents)
  • BCG consultants using GPT-4 were 19 percentage points less likely to be correct on a task outside AI's 'jagged frontier'
  • Experienced developers took 19% longer with AI tools but believed they were 20% faster (METR trial)

Why It Matters

AI productivity gains flow mainly to novices, not experts, and 95% of enterprise pilots show no measurable P&L impact, so capturing value requires workflow redesign, not just tool addition.
