KVCache.ai's open-source calculator measures LLM memory costs per token
New tool supports DeepSeek V4 Flash, Qwen3, GLM, Kimi, MiniMax
Deep Dive
KVCache.ai has introduced an open-source, web-based KV Cache Size Calculator for analyzing the GPU and RAM footprint of large language models during inference. The calculator supports models including DeepSeek V4 Flash, Qwen3, GLM, Kimi, and MiniMax. Users can estimate the per-token KV cache cost for a given model and configuration, then size their hardware accordingly.
Key Points
- Supports DeepSeek V4 Flash, Qwen3, GLM, Kimi, and MiniMax with configurable batch/sequence parameters.
- Calculates per-token KV cache memory footprint to optimize GPU/RAM allocation for inference.
- Open-source web tool enables developers to compare hardware costs and avoid OOM errors before deployment.
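To make the per-token figure concrete, here is a minimal sketch of the standard KV cache arithmetic for models with ordinary multi-head or grouped-query attention. It is not the calculator's own code, and the function names and the example configuration are hypothetical. Architectures that compress the cache, such as multi-head latent attention, need a different formula.

```python
def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """KV cache bytes for one token (standard MHA/GQA attention assumed).

    The leading 2 accounts for one key and one value vector per layer.
    bytes_per_elem defaults to 2, i.e. FP16 or BF16 storage.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def kv_cache_bytes(batch_size: int, seq_len: int, **model_cfg: int) -> int:
    """Total KV cache bytes for a batch of sequences of equal length."""
    return batch_size * seq_len * kv_bytes_per_token(**model_cfg)


# Hypothetical configuration, not taken from any specific supported model.
cfg = dict(num_layers=32, num_kv_heads=8, head_dim=128)

per_token = kv_bytes_per_token(**cfg)
total = kv_cache_bytes(batch_size=4, seq_len=8192, **cfg)

print(f"per token: {per_token / 1024:.0f} KiB")  # 128 KiB
print(f"total:     {total / 2**30:.1f} GiB")     # 4.0 GiB
```

Comparing the total against the memory left on the GPU after the model weights are loaded gives a rough check on whether a given batch size and context length will fit.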
Why It Matters
The calculator helps developers size memory for a given LLM before deployment, which can cut GPU costs by avoiding over-provisioning.