Study: Only 2.3% of LLM agent skill specs give users clear expectations
New research finds that most agent skill descriptions lack the examples and output contracts that tell users what to expect.
Deep Dive
A study by Zikai Alex Wen analyzed 878 cybersecurity LLM agent skill specifications to see how well they support user comprehension. It looked for cues for four comprehension anchors: operational basis, output contract, boundary disclosure, and example demonstration. Cues for operational basis were common, but only 19.0% of specifications exhibited cues for an example task, sample, or expected outcome, and only 2.3% exhibited cues for all four anchors. The paper argues that skill specs should serve as user-facing capability disclosures, not merely as containers for executable instructions.
Key Points
- Only 19.0% of 878 cybersecurity skill specs included cues for an example task, sample, or expected outcome.
- Just 2.3% of specifications exhibited all four comprehension anchors (operational basis, output contract, boundary disclosure, example demonstration).
- Skills lacking examples forced users to inspect helper code, while example-rich specs made first local checks easier to construct.
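
To make the four anchors concrete, here is a minimal sketch of a checker that flags which anchors a spec's text appears to cover. The keyword patterns, the `find_anchors` and `has_all_anchors` helpers, and both sample specs are hypothetical illustrations; they are not the study's coding scheme, and simple keyword matching is only a rough stand-in for it.

```python
import re

# Hypothetical cue patterns, one per comprehension anchor.
# These keyword lists are illustrative assumptions, not the study's method.
ANCHOR_CUES = {
    "operational_basis": r"\b(uses|runs|calls|queries|based on)\b",
    "output_contract": r"\b(returns|outputs|produces|output format)\b",
    "boundary_disclosure": r"\b(does not|cannot|limitation|out of scope)\b",
    "example_demonstration": r"\b(example|sample|expected output)\b",
}


def find_anchors(spec_text: str) -> dict[str, bool]:
    """Report, per anchor, whether any of its cue patterns appears in the spec."""
    return {
        name: bool(re.search(pattern, spec_text, re.IGNORECASE))
        for name, pattern in ANCHOR_CUES.items()
    }


def has_all_anchors(spec_text: str) -> bool:
    """True only when every one of the four anchors has at least one cue."""
    return all(find_anchors(spec_text).values())


# Two made-up specs: one terse, one that touches all four anchors.
sparse_spec = "Scans a host for open ports."
rich_spec = (
    "Runs a port scanner against a target host. "
    "Returns a JSON list of open ports. "
    "Does not scan UDP. "
    "Example: scanning a test host is expected to list port 22."
)

if __name__ == "__main__":
    print(find_anchors(sparse_spec))
    print(has_all_anchors(rich_spec))
```

A spec like `sparse_spec` leaves the user guessing about output shape and limits, while `rich_spec` supplies enough to write a first local check against the stated example.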
Why It Matters
User safety in AI agent marketplaces depends on clear specs. This study highlights a dangerous gap in the comprehension support that skill specs give users.