Semantic Early-Stopping Slashes 38% Tokens in LLM Agent Loops Without Quality Loss

arXiv cs.MA June 26, 2026

⚡ Stop wasting tokens on easy problems: cut costs while keeping quality.

Deep Dive

Sahil Shrivastava's paper 'Semantic Early-Stopping for Iterative LLM Agent Loops' tackles a common, wasteful practice: running multi-agent LLM loops (such as a Writer drafting and a Critic revising) under a fixed iteration cap. This syntactic kill-switch overspends tokens on simple inputs and truncates difficult ones. The proposed alternative halts the loop based on meaning: it tracks whether consecutive draft embeddings have stopped changing (cosine distance with a patience window) and whether answer quality has stopped improving.
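The judge-free stopping rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names, the threshold `epsilon`, and the default values for `patience` and `max_iterations` are all assumptions chosen for the example, and the embeddings are taken as plain lists of floats from whatever embedding model is in use.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; treat a zero vector as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def stop_round(
    embeddings: List[Sequence[float]],
    epsilon: float = 0.05,      # hypothetical distance threshold
    patience: int = 2,          # hypothetical number of consecutive calm rounds
    max_iterations: int = 8,    # the fixed cap remains as a fallback
) -> int:
    """Return the 1-based round at which the loop would halt.

    The loop halts once `patience` consecutive draft-to-draft distances
    fall below `epsilon`; otherwise it runs until the cap.
    """
    calm = 0
    limit = min(len(embeddings), max_iterations)
    for i in range(1, limit):
        if cosine_distance(embeddings[i - 1], embeddings[i]) < epsilon:
            calm += 1
            if calm >= patience:
                return i + 1
        else:
            calm = 0  # a large change resets the patience window
    return limit
```

In a live loop the same check would run after each new draft rather than over a stored list; the replay form shown here is convenient for comparing thresholds offline on drafts that have already been generated.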

The paper makes three contributions. First, a theoretical foundation: deterministic termination is machine-checked, while convergence is treated as an empirically tested conjecture rather than a forced Banach contraction. Second, a judge-efficient evaluation protocol that replays all stopping policies over identical drafts and caches LLM-judge calls, enabling low-cost paired comparisons. Third, empirical results on HotpotQA: a judge-free semantic stopper reduces operational tokens by 38% relative to a fixed max_iterations cap with no measurable quality loss (Delta-IS = -0.004, p=0.81). Adding a quality gate, by contrast, makes things worse because per-round judging costs dominate. An oracle that picks the best round achieves +0.115 Information Score over any practical policy (p ≈ 4e-11), suggesting the real challenge isn't when to stop but which round is best.
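The replay-and-cache idea behind the evaluation protocol can be illustrated with a short Python sketch. It is a hedged reconstruction from the description above, not the paper's code: `make_cached_judge`, `replay`, the `(example_id, round)` cache key, and the policy signature (a function from a list of drafts to a 1-based stop round) are all hypothetical names and choices made for this example.

```python
from typing import Callable, Dict, List, Tuple

Policy = Callable[[List[str]], int]  # maps an example's drafts to a 1-based stop round


def make_cached_judge(judge: Callable[[str], float]):
    """Wrap a judge so each (example, round) draft is scored at most once."""
    cache: Dict[Tuple[str, int], float] = {}
    calls = {"n": 0}  # counts real judge invocations

    def cached(example_id: str, round_idx: int, draft: str) -> float:
        key = (example_id, round_idx)
        if key not in cache:
            calls["n"] += 1
            cache[key] = judge(draft)
        return cache[key]

    return cached, calls


def replay(
    drafts: Dict[str, List[str]],
    policies: Dict[str, Policy],
    judged: Callable[[str, int, str], float],
) -> Dict[str, List[float]]:
    """Score every policy on the same stored drafts, yielding paired results."""
    results: Dict[str, List[float]] = {}
    for name, policy in policies.items():
        scores = []
        for example_id, rounds in drafts.items():
            stop = policy(rounds)
            scores.append(judged(example_id, stop, rounds[stop - 1]))
        results[name] = scores
    return results
```

Because every policy reads from the same drafts, the per-example scores line up and can be compared pairwise, and two policies that stop at the same round share a single judge call. An oracle under this setup would be a policy that scores every round through the cached judge and returns the best one, which is why it is useful as an upper bound but not as a practical stopper.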

Key Points

Semantic early-stopping uses cosine distance on embeddings to detect convergence, replacing blind iteration caps.
Judge-free variant cuts operational tokens by 38% on HotpotQA with no statistically significant change in answer quality (Delta-IS = -0.004, p=0.81).
Quality-gated variant is counter-productive; an oracle selecting the best round outperforms all practical policies by +0.115 Information Score.

Why It Matters

Smarter stopping in AI agent loops could cut inference costs for complex multi-step tasks: the judge-free stopper saved 38% of operational tokens on HotpotQA with no measurable quality loss.
