Research & Papers

Study: Only 2.3% of cybersecurity LLM agent skill specs give users all four comprehension anchors

New research finds that most agent skill descriptions omit the examples and expected outcomes users need to understand what a skill will do.

Deep Dive

A study by Zikai Alex Wen analyzed 878 cybersecurity LLM agent skill specifications for cues that support user comprehension. Cues for operational basis were common, but only 19.0% of specifications exhibited cues for an example task, sample, or expected outcome, and only 2.3% exhibited cues for all four comprehension anchors. The paper argues that skill specs should serve as user-facing capability disclosures, not merely as containers for executable instructions.

Key Points
  • Only 19% of 878 cybersecurity skill specs included cues for an example task, sample, or expected outcome.
  • Just 2.3% of specifications exhibited all four comprehension anchors (operational basis, output contract, boundary disclosure, example demonstration).
  • Skills without examples left users to inspect helper code to infer behavior, while example-rich specs made first local checks easier to construct.
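The four anchors lend themselves to a simple per-spec checklist. The Python sketch below is a hypothetical illustration, not the study's coding procedure: it assumes a reviewer has already marked by hand which anchor cues appear in each spec, and it computes the share of specs showing each anchor plus the share showing all four (the statistic reported as 2.3% above). The names `SpecAudit` and `coverage` and the three toy specs are invented for illustration only.

```python
from dataclasses import dataclass

# The four comprehension anchors named in the study.
ANCHORS = ("operational_basis", "output_contract",
           "boundary_disclosure", "example_demonstration")


@dataclass(frozen=True)
class SpecAudit:
    """Hypothetical record of which anchor cues a reviewer found in one skill spec."""
    name: str
    operational_basis: bool = False
    output_contract: bool = False
    boundary_disclosure: bool = False
    example_demonstration: bool = False

    def missing(self) -> list[str]:
        # Anchors with no cue in this spec, in the canonical order.
        return [a for a in ANCHORS if not getattr(self, a)]

    def complete(self) -> bool:
        # True only when all four anchors are present.
        return not self.missing()


def coverage(audits):
    """Return (share of specs per anchor, share of specs with all four anchors)."""
    n = len(audits)
    if n == 0:
        raise ValueError("no audits to summarize")
    per_anchor = {a: sum(getattr(x, a) for x in audits) / n for a in ANCHORS}
    all_four = sum(x.complete() for x in audits) / n
    return per_anchor, all_four


# Toy data, invented for illustration (not from the study).
audits = [
    SpecAudit("port-scan", operational_basis=True),
    SpecAudit("log-triage", operational_basis=True, example_demonstration=True),
    SpecAudit("cve-lookup", True, True, True, True),
]
per_anchor, all_four = coverage(audits)
print(audits[0].missing())  # ['output_contract', 'boundary_disclosure', 'example_demonstration']
print(round(all_four, 3))   # 0.333
```

Listing the missing anchors per spec, rather than only a pass/fail flag, mirrors the study's framing: a spec author can see which disclosure (for example, an example demonstration) to add first.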

Why It Matters

User safety in AI agent marketplaces depends on clear specs. This study highlights a significant gap in how well current specs support user comprehension.