Developer Tools

Hermes Multi-Agent System Detects 2,450 API Documentation Smells

Your OpenAPI docs might be valid but broken for AI agents—Hermes found 2,450 issues across 600 endpoints.

Deep Dive

As organizations rush to expose REST APIs as AI-agent tools via the Model Context Protocol (MCP), a new study reveals a critical blind spot: OpenAPI documentation that is structurally valid can still be semantically broken for agent consumption. Researchers from a large industrial ecosystem tested an MCP integration across 16 production APIs comprising 600 endpoints. Early proof-of-concept experiments showed systematic failures in task planning, tool selection, and payload construction, failures initially attributed to limitations of the underlying models.

To diagnose the root cause, the team built Hermes, a multi-agent LLM system that automatically detects documentation and REST smells at the endpoint level. In a large-scale evaluation, Hermes uncovered 2,450 distinct smells, with deficiencies found in every single operation. Practitioner validation confirmed the high accuracy of detected issues, though it also revealed contextual trade-offs in deciding which smells to fix. The findings forced the organization to revise its adoption strategy, prioritizing selective endpoint adaptation, redefining documentation standards, and integrating automated documentation assessment into API governance. The paper, accepted at EASE 2026, underscores that artifact-level evaluation can reduce technological risk and guide evidence-based AI adoption.
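To make the "valid but broken" distinction concrete, here is a minimal sketch of the kind of endpoint-level check the study describes. The paper does not publish Hermes's smell taxonomy or implementation, and Hermes itself is LLM-based rather than rule-based; the operation shape, the smell names, and the heuristics below are illustrative assumptions only.

```python
# Hypothetical example: a structurally valid OpenAPI operation that still
# gives an AI agent little to plan with (vague summary, undocumented
# parameter, uninformative response description).
operation = {
    "operationId": "op123",
    "summary": "Get data",  # vague: says nothing about what data or why
    "parameters": [
        {"name": "id", "in": "query", "schema": {"type": "string"}},  # no description
    ],
    "responses": {"200": {"description": "OK"}},  # tells the agent nothing
}

# Assumed vocabulary of low-information words; not from the paper.
VAGUE_WORDS = {"get", "data", "info", "value", "stuff"}

def detect_smells(op: dict) -> list[str]:
    """Flag documentation smells that would hinder agent task planning
    and payload construction (simple rule-based stand-in for Hermes)."""
    smells = []
    summary = op.get("summary", "").strip()
    if not summary:
        smells.append("missing summary")
    elif set(summary.lower().split()) <= VAGUE_WORDS:
        smells.append("vague summary")
    for p in op.get("parameters", []):
        if not p.get("description"):
            smells.append(f"parameter '{p['name']}' lacks description")
    for code, resp in op.get("responses", {}).items():
        if resp.get("description", "").strip().upper() in {"", "OK"}:
            smells.append(f"response {code} has uninformative description")
    return smells

print(detect_smells(operation))
```

Running this flags three smells for the sample operation, even though the document would pass any OpenAPI schema validator: that gap between structural validity and semantic usefulness is exactly what the study measures at scale.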

Key Points
  • Hermes analyzed 16 production APIs with 600 endpoints, uncovering 2,450 documentation and REST smells.
  • The multi-agent LLM system detected deficiencies in every operation, despite the APIs being structurally valid.
  • Practitioner validation confirmed the smells but revealed trade-offs in remediation, leading to revised governance workflows.

Why It Matters

Proves that API docs need semantic checks for agent readiness, not just structural validation.