New study: LLM tool registries are biased by puffery, not facts
17,700+ trials show marketing fluff beats accuracy every time.
A new research paper, "Agent-Facing Information Design in LLM Tool Registries" by Haochuan Kevin Wang, exposes a critical flaw in how AI agents choose tools. The author likens LLM tool registries to unregulated advertising platforms: providers write free-text descriptions that agents use for selection, but no viewability standard, quality score, or outcome audit exists. Across more than 17,700 trials spanning five LLMs and ten domains, Wang found that legal puffery (subjective superlatives like "best-in-class") accounted for 100% of the optimization effect, while outright fabricated claims added zero incremental bias. Because puffery is legal, FTC deceptive advertising rules do not reach the active mechanism. Disclosure also fails structurally: system-prompt warnings produced zero measurable effect for four of five models, and behavioral ceilings leave no room for label-based correction. Superlatives emerged as the dominant single feature, with an SBC of +0.35.
Wang proposes a constructive remedy: normalizing descriptions at the registry layer, which achieves first-best welfare regardless of which model does the selecting. The concrete proposal is to separate selection-facing descriptions (structured, registry-controlled) from marketing-facing descriptions (provider-authored, shown post-selection). He also introduces an Agent Attention Quality Score to distinguish genuine capability from copywriting. This framework provides the first systematic measurement infrastructure for an otherwise opaque market, with implications for developers building agent-based systems, platform operators like OpenAI and Anthropic, and regulators concerned about AI safety and fairness. The paper is available on arXiv under the cs.IR and cs.AI categories.
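To make the separation concrete, here is a minimal Python sketch of what a registry entry with two description channels might look like. It is an illustration, not the paper's implementation: the `RegistryEntry` fields, the two view functions, the `PUFFERY_TERMS` list, and the keyword-based `strip_puffery` filter are all hypothetical names and simplifications introduced here.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical list of superlatives, chosen for illustration only.
PUFFERY_TERMS = ("best-in-class", "world-class", "industry-leading", "cutting-edge")


@dataclass(frozen=True)
class RegistryEntry:
    """One tool in a registry, with the two description channels kept apart."""
    name: str
    capabilities: tuple[str, ...]  # structured, registry-controlled
    marketing_copy: str            # provider-authored free text


def selection_view(entry: RegistryEntry) -> dict:
    """What the agent sees while choosing a tool: structured fields only."""
    return {"name": entry.name, "capabilities": sorted(entry.capabilities)}


def post_selection_view(entry: RegistryEntry) -> dict:
    """What may be shown after the choice is made: the provider's own copy."""
    return {"name": entry.name, "marketing_copy": entry.marketing_copy}


def strip_puffery(text: str) -> str:
    """Crude keyword filter standing in for real description normalization."""
    pattern = re.compile("|".join(re.escape(t) for t in PUFFERY_TERMS), re.IGNORECASE)
    cleaned = pattern.sub("", text)
    # Collapse the whitespace left behind by removed terms.
    return re.sub(r"\s+", " ", cleaned).strip()


entry = RegistryEntry(
    name="geo",
    capabilities=("reverse_geocode", "geocode"),
    marketing_copy="The best-in-class geocoder",
)
print(selection_view(entry))
print(strip_puffery(entry.marketing_copy))
```

The point of the sketch is that the provider's free text never reaches the selection step, so superlatives have nothing to act on. A real registry would need a richer capability schema and a more robust normalizer than a fixed keyword list.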
- Legal puffery (subjective superlatives) alone captures 100% of the optimization effect in LLM tool registries.
- Fabricated claims add zero incremental bias, meaning FTC enforcement is ineffective against the active mechanism.
- Proposal: separate selection-facing (registry-controlled) from marketing-facing descriptions, plus an Agent Attention Quality Score.
Why It Matters
AI agents will systematically prefer flashy marketing over real capability unless we redesign tool registries.