Pay-per-token LLM pricing lets providers overcharge users, ICML paper shows
A new algorithm inflates token counts without detection, and it is profitable for providers.
A new paper selected for ICML 2026 exposes a fundamental flaw in the dominant pay-per-token pricing model for LLM APIs. Researchers Ander Artola Velasco, Stratis Tsirtsis, Nastaran Okati, and Manuel Gomez-Rodriguez show that providers have a financial incentive to deliberately misreport the number of tokens a model uses to generate an output, and users cannot verify whether they are being overcharged. As a proof of concept, the team developed an efficient heuristic algorithm that allows providers to significantly inflate token counts without raising suspicion. Crucially, the algorithm's operational cost is lower than the additional revenue generated from overcharging, making it profitable for dishonest providers. The experiments used models from the Llama, Gemma, and Ministral families with prompts from the LMSYS Chatbot Arena platform.
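The paper's heuristic is not reproduced here. As a minimal illustration of why a user cannot verify a reported token count, the sketch below uses a toy vocabulary (hypothetical, not any real tokenizer) in which the same output string can be split into tokens in more than one valid way. The names `VOCAB`, `is_valid_tokenization`, and `bill` are assumptions made for this example only.

```python
# Toy vocabulary (hypothetical): single characters plus a few merged pieces.
VOCAB = {"h", "e", "l", "o", "he", "ll", "hell", "hello"}


def is_valid_tokenization(tokens, text, vocab=VOCAB):
    """True if every piece is in the vocabulary and the pieces spell out the text."""
    return all(t in vocab for t in tokens) and "".join(tokens) == text


def bill(tokens, price_per_token):
    """Pay-per-token charge: the number of reported tokens times the unit price."""
    return len(tokens) * price_per_token


text = "hello"
honest = ["hello"]           # the shortest split under this toy vocabulary
inflated = ["he", "ll", "o"]  # a longer split that decodes to the same text

price = 2  # arbitrary price units per token, chosen for illustration
for name, tokens in [("honest", honest), ("inflated", inflated)]:
    print(name, is_valid_tokenization(tokens, text), bill(tokens, price))
```

Both splits decode to the same string, so a user who sees only the text and the bill has no way to tell which one the model actually produced.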
To eliminate this vulnerability, the researchers propose a simple alternative: charge an amount that grows linearly with the number of characters in the output rather than with the number of tokens in the model's internal representation. Because users can count the characters in the text they receive, a provider can no longer raise the bill by misreporting how that text was tokenized, which removes the financial motivation to misreport. Switching schemes changes the provider's profit margin per token, so the authors also give a prescription that lets providers maintain their average profit margin under character-based pricing. The paper highlights a critical blind spot in current LLM pricing and offers a practical fix that protects users from this form of overcharging. Selected as an oral presentation at ICML 2026, the work has implications for every company and developer using paid LLM APIs.
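As a rough sketch of character-based pricing, the example below converts a per-token price into a per-character rate that yields the same total revenue on a reference sample of outputs. This conversion is an assumption made for illustration, not the prescription from the paper, and the names `per_char_rate` and `bill_by_chars` are hypothetical.

```python
def per_char_rate(price_per_token, sample_outputs):
    """Per-character rate that matches total per-token revenue on a sample.

    sample_outputs is a list of (text, honest_token_count) pairs.
    Assumption: matching total revenue on a reference sample stands in for
    the paper's margin-preserving prescription, which may differ.
    """
    total_tokens = sum(count for _, count in sample_outputs)
    total_chars = sum(len(text) for text, _ in sample_outputs)
    return price_per_token * total_tokens / total_chars


def bill_by_chars(text, rate):
    """Charge depends only on the visible text, so the user can recompute it."""
    return len(text) * rate


# Hypothetical reference sample: (output text, honest token count).
sample = [("hello world", 2), ("pricing", 1)]
rate = per_char_rate(price_per_token=6, sample_outputs=sample)

token_revenue = 6 * sum(count for _, count in sample)
char_revenue = sum(bill_by_chars(text, rate) for text, _ in sample)
print(rate, token_revenue, char_revenue)
```

The charge is a function of the output string alone, so any tokenization the provider reports leads to the same bill.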
- Pay-per-token pricing gives LLM providers a financial incentive to misreport token counts, and users cannot detect it.
- Researchers developed a heuristic algorithm that inflates token counts without raising suspicion, at an operational cost lower than the extra revenue it generates.
- The fix: price tokens linearly by character count, which eliminates the overcharging incentive while preserving average profit margins.
Why It Matters
Exposes a hidden cost in LLM APIs and offers a simple pricing fix to protect users.