Research & Papers

Routing, Cascades, and User Choice for LLMs

Your AI's speed and quality could be secretly limited to save money.

Deep Dive

A new ICLR 2026 paper uses game theory to analyze how LLM providers route user tasks between cheaper 'standard' models and more expensive 'reasoning' models. The analysis reveals a significant misalignment between provider and user incentives. In most of the settings studied, the provider's optimal routing policy turns out to be static rather than adaptive. More alarmingly, providers are sometimes incentivized to deliberately slow responses, increasing latency to cut their own serving costs, which directly reduces user utility and satisfaction.
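The two findings above can be made concrete with a toy model. This is a hedged sketch, not the paper's formalism: the routing threshold, cost curve, utility function, and all numbers below are invented for illustration. It shows a static routing rule (route by task difficulty alone, ignoring load) and why a provider whose serving cost falls with latency (e.g., via batching) prefers a slower service than its users do.

```python
def route(task_difficulty: float, threshold: float = 0.7) -> str:
    """A static routing policy: the decision depends only on the task,
    not on current load or history (hypothetical threshold)."""
    return "reasoning" if task_difficulty > threshold else "standard"

def provider_cost(latency_s: float) -> float:
    """Toy assumption: serving cost falls as allowed latency rises."""
    return 2.0 / (1.0 + latency_s)

def user_utility(quality: float, latency_s: float) -> float:
    """Toy assumption: users value quality but dislike waiting."""
    return quality - 0.5 * latency_s

def provider_profit(price: float, latency_s: float) -> float:
    return price - provider_cost(latency_s)

# Sweep candidate latencies from 0.5s to 10s and compare optima.
candidates = [0.5 * k for k in range(1, 21)]
quality, price = 8.0, 3.0

best_for_user = max(candidates, key=lambda l: user_utility(quality, l))
best_for_provider = max(candidates, key=lambda l: provider_profit(price, l))

print(best_for_user)      # users prefer the lowest latency: 0.5
print(best_for_provider)  # provider profits most at the highest: 10.0
```

Under these made-up curves the user-optimal and provider-optimal latencies sit at opposite ends of the range, which is the misalignment the paper formalizes: the provider can quietly pick the slow end and pocket the cost savings.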

Why It Matters

This exposes the hidden economic forces that could be degrading your AI experience right now.