Open Source

Decreased Intelligence Density in DeepSeek V4 Pro

DeepSeek V4 Pro's 'intelligence density' drops, with ~10x more tokens reportedly needed than rival models

Deep Dive

DeepSeek's latest model, V4 Pro, is facing scrutiny over a significant drop in 'intelligence density', a measure of output quality per token. According to a Reddit analysis, V4 Pro (1.6T parameters) is roughly 2.4x larger than V3.2 (0.67T parameters), yet it requires substantially more tokens to achieve comparable results. The V3.2 paper explicitly acknowledged token efficiency as a challenge, noting that the model required longer generation trajectories to match models like Gemini 3.0-Pro, and promised future work on optimizing reasoning-chain efficiency. V4 Pro, however, appears to have regressed: even its non-thinking mode uses more tokens than V3.2, and the model reportedly needs around 10x more tokens than GPT-5.4 or GPT-5.5 to reach similar performance. Assuming identical decoding speed in tokens per second (TPS), that translates to roughly 10x longer completion times for the same task.
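
To make that arithmetic concrete, here is a minimal back-of-the-envelope sketch. Only the ~10x token ratio comes from the reported analysis; the absolute token counts and the TPS figure are hypothetical values chosen for illustration.

```python
# Completion time scales linearly with output tokens when decoding
# speed (tokens per second) is held fixed. All absolute numbers below
# are hypothetical; only the ~10x ratio is from the reported analysis.

def completion_time_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to generate a response at a fixed decoding speed."""
    return output_tokens / tokens_per_second

TPS = 50.0                       # assumed identical decoding speed for both models
gpt_tokens = 1_000               # hypothetical tokens for a task on GPT-5.4/5.5
v4_pro_tokens = 10 * gpt_tokens  # ~10x more tokens reported for V4 Pro

print(f"GPT-5.4/5.5: {completion_time_seconds(gpt_tokens, TPS):.0f}s")    # 20s
print(f"V4 Pro:      {completion_time_seconds(v4_pro_tokens, TPS):.0f}s") # 200s
```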

This finding is particularly concerning for developers and enterprises relying on DeepSeek for cost-sensitive or latency-critical applications. The increased token consumption directly impacts operational costs and throughput, potentially making V4 Pro less competitive than its predecessors or rivals. While DeepSeek has not officially commented, the community is urging the company to address this regression in future updates. For now, users may need to weigh the benefits of V4 Pro's larger parameter count against its apparent inefficiency.
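
The same reasoning applies to spend: under simple per-token pricing, a 10x increase in output tokens means roughly 10x the output cost for the same task. The price below is a placeholder, not an actual DeepSeek rate.

```python
# Rough output-cost comparison under per-token pricing. The price is a
# placeholder, not a real rate; only the ~10x token ratio is reported.

PRICE_PER_M_OUTPUT_TOKENS = 1.00  # USD per million output tokens, hypothetical

def output_cost_usd(tokens: int, price_per_million: float) -> float:
    """Output-token cost under flat per-token pricing."""
    return tokens / 1_000_000 * price_per_million

baseline_tokens = 1_000            # hypothetical task on a rival model
v4_pro_tokens = 10 * baseline_tokens

print(f"baseline: ${output_cost_usd(baseline_tokens, PRICE_PER_M_OUTPUT_TOKENS):.4f}")  # $0.0010
print(f"V4 Pro:   ${output_cost_usd(v4_pro_tokens, PRICE_PER_M_OUTPUT_TOKENS):.4f}")    # $0.0100
```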

Key Points
  • DeepSeek V4 Pro (1.6T params) is ~2.4x larger than V3.2 (0.67T), yet its token efficiency has decreased.
  • V4 Pro requires ~10x more tokens than GPT-5.4/5.5 for similar output quality.
  • Even V4 Pro's non-thinking mode uses more tokens than V3.2, contradicting the efficiency goals stated in the V3.2 paper.

Why It Matters

Higher token usage means slower responses and higher costs for DeepSeek V4 Pro users.