MIT Study: AI Scaffolding Can Swing Inference Efficiency by Up to 100x
Scaffolding predicts price-performance better than the model itself in new research.
A new study from MIT FutureTech researchers (Hans Gundlach, Zachary Brown, Jayson Lynch, and Neil Thompson) reveals that scaffolding—the software environment that turns an AI model into an agent—can cause up to 100x variation in inference efficiency on benchmarks. Using data from the Holistic Agent Leaderboard (HAL), the researchers found that scaffolds explain more of the variation in price-performance than the models themselves. This challenges the common assumption that the model alone determines capability. The study also highlights that scaffold effectiveness is task- and model-dependent: some models benefit greatly from a given scaffold, while others see little gain or even degradation.
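To make "explains more of the variation" concrete, here is a minimal sketch of one common way to compare two factors: the share of total variance that lies between groups (eta squared). This is not necessarily the method the researchers used, and the numbers are invented toy values, not HAL data. The names `variance_share`, `model_a`, `scaffold_x`, and `log_cost` are hypothetical; `log_cost` stands in for the base-10 log of cost per solved task, so a gap of 2.0 corresponds to a 100x difference.

```python
from collections import defaultdict
from statistics import mean


def variance_share(rows, factor, value="log_cost"):
    """Share of total variance in `value` that lies between groups of `factor`.

    This is eta squared: between-group sum of squares / total sum of squares.
    """
    values = [r[value] for r in rows]
    grand_mean = mean(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)

    # Collect the observed values for each level of the factor.
    groups = defaultdict(list)
    for r in rows:
        groups[r[factor]].append(r[value])

    between_ss = sum(
        len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values()
    )
    return between_ss / total_ss


# Invented toy data (assumption): two models crossed with two scaffolds.
rows = [
    {"model": "model_a", "scaffold": "scaffold_x", "log_cost": 0.0},
    {"model": "model_a", "scaffold": "scaffold_y", "log_cost": 2.0},
    {"model": "model_b", "scaffold": "scaffold_x", "log_cost": 0.5},
    {"model": "model_b", "scaffold": "scaffold_y", "log_cost": 2.5},
]

scaffold_share = variance_share(rows, "scaffold")
model_share = variance_share(rows, "model")
print(f"scaffold: {scaffold_share:.2f}, model: {model_share:.2f}")
```

In this toy table the scaffold accounts for most of the spread simply because the numbers were chosen that way. The point is only to show what the comparison measures, not to reproduce the study's result.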
The findings have significant implications for AI evaluation and the agent economy. If scaffolds matter as much as models, then benchmarks must control for scaffolding to be meaningful. The researchers speculate that scaffold-model interactions could drive increased industry concentration, as companies that own both models and optimized scaffolds gain a structural advantage. For AI professionals, this means that choosing the right scaffold for a specific model-task combination could be as important as selecting the model itself—a shift that may reshape deployment strategies and competitive dynamics.
- Scaffolding can cause up to 100x variation in inference efficiency across benchmarks, per HAL data.
- In the study, scaffolds explain more of the price-performance variation than the AI model itself.
- Scaffold effectiveness is highly context-dependent: the same scaffold may help some models and hinder others, depending on the task.
Why It Matters
Professionals should treat scaffolding as a strategic variable: for cost and performance, it can matter as much as the model.