New EDRM Framework Cuts LLM Token Use by Up to 55% Through Selective Reasoning

arXiv cs.LG May 25, 2026

What if using less reasoning could make large language models both cheaper and more accurate? That is the counterintuitive promise of a new training-free framework that cuts token consumption by 41–55% while maintaining or modestly improving accuracy.

Deep Dive

A new paper from Wei Xia and colleagues tackles a fundamental question: when does chain-of-thought (CoT) reasoning actually help LLMs? The authors observe a paradox: CoT often provides marginal or even negative gains on factual and open-ended tasks while multiplying token consumption. By viewing LLM decoding as a dynamical system, they discover that early-stage entropy dynamics reliably signal whether a task benefits from reasoning. Tasks that profit from CoT exhibit consistent entropy reduction (a phase transition from high-entropy exploration to low-entropy structured reasoning), while others show unstable or increasing entropy patterns.
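To make the entropy signal concrete, here is a minimal sketch of how a per-token entropy trajectory and its trend could be computed. It is an illustration under stated assumptions, not the paper's implementation: the function names `token_entropy` and `entropy_slope`, the use of a least-squares slope as the trend measure, and the toy logits are all hypothetical.

```python
import math

def token_entropy(logits):
    """Shannon entropy (in nats) of softmax(logits) for one decoding step."""
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_slope(entropies):
    """Least-squares slope of entropy against step index (assumed trend measure)."""
    n = len(entropies)
    if n < 2:
        return 0.0                       # no trend can be estimated from one point
    mean_t = (n - 1) / 2
    mean_h = sum(entropies) / n
    cov = sum((t - mean_t) * (h - mean_h) for t, h in enumerate(entropies))
    var = sum((t - mean_t) ** 2 for t in range(n))
    return cov / var

# Toy logits over a 4-token vocabulary (made up for illustration):
# the distribution sharpens as decoding proceeds.
steps = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [3.0, 0.0, 0.0, 0.0],
    [6.0, 0.0, 0.0, 0.0],
]
trajectory = [token_entropy(s) for s in steps]
slope = entropy_slope(trajectory)
print(trajectory, slope)
```

In this toy run the entropy starts at its maximum for a uniform distribution and falls at every step, so the slope is negative. That is the kind of consistent reduction the article associates with tasks that benefit from CoT; a flat or positive slope would correspond to the unstable or increasing pattern.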

Based on this insight, the team proposes EDRM (Entropy Dynamics-based Reasoning Manifold), a lightweight, training-free routing framework. EDRM embeds early-decoding entropy trajectories into a compact manifold, enabling zero-shot deployment and instance-level adaptation. Evaluated on 15 benchmarks across 4 LLMs of varying scales, EDRM reduces token consumption by 41–55% while maintaining or improving accuracy, using as few as 50 calibration samples. At the instance level, it boosts accuracy by up to 4.7% while saving 27–45% of tokens. This work suggests that reasoning is a dynamic decoding state best invoked selectively, not by default.
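The routing idea can be sketched in a few lines. The example below is a deliberately simplified stand-in: it replaces the paper's manifold embedding with a single scalar feature (an early entropy slope) and a cutoff fitted on a handful of calibration samples. The names `fit_threshold` and `route`, the scalar feature, and the calibration values are all hypothetical and are not taken from the paper.

```python
def fit_threshold(slopes, cot_helps):
    """Pick the slope cutoff that best separates the calibration samples.

    Assumed rule: route to CoT when slope <= cutoff.
    Returns None if no calibration samples are given.
    """
    best_cut, best_correct = None, -1
    for cut in sorted(set(slopes)):
        correct = sum((s <= cut) == y for s, y in zip(slopes, cot_helps))
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut

def route(slope, cutoff):
    """Choose a decoding mode for one instance from its early entropy slope."""
    return "cot" if slope <= cutoff else "direct"

# Made-up calibration data: early entropy slopes, and whether CoT helped.
calib_slopes = [-0.40, -0.25, -0.30, 0.05, 0.10, -0.02]
calib_labels = [True, True, True, False, False, False]

cutoff = fit_threshold(calib_slopes, calib_labels)
print(cutoff, route(-0.35, cutoff), route(0.08, cutoff))
```

Nothing here updates model weights; the only fitted quantity is one cutoff, which is the sense in which a calibration-based router can still be called training-free. Token savings come from the instances sent down the "direct" path, which skip the reasoning trace entirely.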

Key Points

EDRM reduces token consumption by 41–55% across 15 benchmarks and 4 LLMs while maintaining or improving accuracy; at the instance level it improves accuracy by up to 4.7% while saving 27–45% of tokens, using only per-token entropy signals.
Unlike competing approaches (DeepMind Adaptive CoT, Microsoft ACTS), EDRM is entirely training-free and requires no task-level classifiers or ensemble sampling.
The framework's reliance on entropy computation may offset savings on short sequences, and it has not been validated on GPT-4 or multilingual tasks, limiting immediate production readiness.

Why It Matters

Selective reasoning could make LLM deployments substantially cheaper, and it challenges the assumption that more chain-of-thought always yields better results.
