Scaling Laws for Skills in LLM Agents: Routing Accuracy Soars to 91.7%
New research across 15 LLMs and 1,141 skills reveals logarithmic decay and 4x execution gains.
Researchers from multiple institutions have published a comprehensive study on the scaling laws of skills in LLM agent systems. Across 15 frontier LLMs, 1,141 real-world skills, and over 3 million routing or execution decisions, they identify two coupled laws. The routing law shows that single-step routing accuracy decays logarithmically with library size (R² > 0.97 for all models), with errors progressing from local skill competition to cross-family drift and capture by overly general "black-hole skills." The execution law reveals that before state realization, joint routing is approximately multiplicative, whereas correct execution can improve difficult downstream decisions by about 4x. A single parameter, the slope of the routing law's logarithmic decay, couples the two laws: the same library property controls both pre-execution collapse and downstream recoverability.
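To make the two laws concrete, here is a minimal Python sketch. It assumes the routing law takes the form accuracy ≈ a - b·ln(N), where N is the library size and b is the decay slope, and it reads "approximately multiplicative" as joint accuracy over k steps ≈ p^k for per-step accuracy p. Both forms are interpretations of the summary above rather than the paper's stated equations. The function names and the data points are hypothetical and synthetic, not results from the study.

```python
import math


def fit_log_decay(sizes, accuracies):
    """Least-squares fit of accuracy ~= a - b * ln(N).

    Returns (a, b, r2), where b is the logarithmic decay slope.
    Assumed functional form; not taken from the paper.
    """
    xs = [math.log(n) for n in sizes]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(accuracies) / count

    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))

    slope = sxy / sxx            # negative when accuracy decays
    a = mean_y - slope * mean_x  # intercept at ln(N) = 0, i.e. N = 1
    b = -slope                   # report decay slope as a positive number

    ss_res = sum((y - (a - b * x)) ** 2 for x, y in zip(xs, accuracies))
    ss_tot = sum((y - mean_y) ** 2 for y in accuracies)
    r2 = 1.0 - ss_res / ss_tot
    return a, b, r2


def joint_routing_accuracy(per_step_accuracy, steps):
    """Multiplicative approximation: every step must route correctly."""
    return per_step_accuracy ** steps


# Synthetic illustration only: points generated from a = 0.95, b = 0.05.
sizes = [10, 50, 100, 500, 1000]
accuracies = [0.95 - 0.05 * math.log(n) for n in sizes]

a, b, r2 = fit_log_decay(sizes, accuracies)
print(f"a={a:.3f}, decay slope b={b:.3f}, R^2={r2:.3f}")
print(f"3-step joint accuracy at p=0.9: {joint_routing_accuracy(0.9, 3):.3f}")
```

Under these assumptions, a larger slope b means single-step accuracy falls faster as the library grows, and the multiplicative term shows why small per-step losses compound quickly across a multi-step plan.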
The laws are not just descriptive but actionable. The authors applied law-guided optimization to a held-out set, raising routing accuracy from 71.3% to 91.7% and reducing the hijack rate from 22.4% to 4.1%. These improvements transferred directionally to downstream execution settings: on ClawBench, mean pass rate rose from 49.3% to 61.6%; on ClawMark, from 28.4% to 34.5%. The results demonstrate that agent performance depends not only on model capability, but also on the structure, granularity, and exposure policy of the skill library. This research provides a theoretical foundation for building more reliable and scalable LLM-based agent systems.
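The reported before-and-after figures can be compared on a common footing with a little arithmetic. The sketch below takes the percentages quoted above as inputs and computes the absolute change in percentage points and the change relative to the starting value. The helper names and the dictionary layout are illustrative choices, not anything from the paper.

```python
def point_change(before, after):
    """Absolute change in percentage points."""
    return after - before


def relative_change(before, after):
    """Change as a fraction of the starting value."""
    return (after - before) / before


# Figures as quoted in the article, in percent.
reported = {
    "routing accuracy": (71.3, 91.7),
    "hijack rate": (22.4, 4.1),
    "ClawBench mean pass rate": (49.3, 61.6),
    "ClawMark mean pass rate": (28.4, 34.5),
}

for name, (before, after) in reported.items():
    points = point_change(before, after)
    rel = relative_change(before, after)
    print(f"{name}: {points:+.1f} points ({rel:+.1%} relative)")
```

Viewed this way, the routing gain is about 20 points while the downstream benchmark gains are smaller in absolute terms, which matches the article's description of the transfer as directional.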
- Routing accuracy decays logarithmically with skill library size (R² > 0.97 across 15 LLMs).
- Correct execution improves difficult downstream decisions by about 4x.
- Law-guided optimization raised routing accuracy from 71.3% to 91.7% and reduced the hijack rate from 22.4% to 4.1%.
Why It Matters
Agent performance hinges on library structure, not just model capability, which is critical for scaling reliable AI systems.