Viral Wire

DeepSeek Launches V4 Large Language Model, Challenging US Rivals with Reduced Costs and 1M Token Context

DeepSeek's V4 cuts API costs to ¥0.025/M tokens with 95%+ cache hits...

Deep Dive

DeepSeek has launched V4, a large language model that marks a significant inflection point in AI inference economics. Building on architectural customizations for non-NVIDIA hardware and an initial 75% API price reduction, DeepSeek V4 introduces a second round of aggressive cuts. The input cache-hit tier is now priced at just ¥0.025 per million tokens, 1/10 of the standard list price on top of the earlier 75% discount. That widens the spread between cache-hit and cache-miss pricing from 1/12 to a staggering 1/120, with cache misses billed at ¥3 per million tokens. The key enabler is DeepSeek's compression of the KV cache to only 10% of V3.2's size, combined with extensive engineering work on SSD-based KV caching. This shifts storage from expensive, capacity-limited DRAM/HBM to larger, cheaper NAND-based SSDs at scale, achieving a real-world cache hit rate of over 95% in agent settings.
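The pricing arithmetic above can be sketched in a few lines. This is an illustration using only the figures reported in the article (hit tier ¥0.025/M tokens, miss tier ¥3/M tokens); the function name and blending model are our own simplification, not DeepSeek's billing logic.

```python
# Illustrative cost arithmetic from the reported V4 price tiers.
CACHE_HIT_PRICE = 0.025   # ¥ per million input tokens on a cache hit
CACHE_MISS_PRICE = 3.0    # ¥ per million input tokens on a cache miss

def blended_input_price(hit_rate: float) -> float:
    """Effective ¥ per million input tokens at a given cache-hit rate."""
    return hit_rate * CACHE_HIT_PRICE + (1 - hit_rate) * CACHE_MISS_PRICE

# The spread between tiers: a miss costs 120x a hit.
spread = CACHE_MISS_PRICE / CACHE_HIT_PRICE  # -> 120.0

# At the reported 95% real-world hit rate, the blended input price
# falls to a small fraction of the miss-tier rate:
price_at_95 = blended_input_price(0.95)  # -> ¥0.17375 per million tokens
```

At a 95% hit rate the blended price is under 6% of the cache-miss rate, which is why context reuse dominates the economics of agent workloads.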

For users, this translates to dramatically lower costs for high-frequency, agent-based workflows where context reuse is common. DeepSeek's improved SSD configuration and utilization have materially stepped up performance, making the model far more economical for enterprises deploying AI agents that require long context windows (up to 1 million tokens). The repricing also signals a broader shift in AI infrastructure, with NAND demand set to grow sharply as more models adopt SSD-based caching. This could pressure competitors like OpenAI and Anthropic to rethink their own cost structures, potentially accelerating the commoditization of large language model inference.

Key Points
  • DeepSeek V4 compresses KV cache to 10% of V3.2's size, enabling SSD-based storage
  • Cache-hit API pricing drops to ¥0.025 per million tokens (1/120 of the cache-miss rate)
  • Achieves 95%+ cache hit rate in real-world agent settings with 1M token context
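DeepSeek has not published its cache implementation, but the DRAM-over-SSD idea the points above describe can be sketched generically: a small fast tier backed by a large cheap tier, where eviction demotes entries instead of discarding them. The class below is a hypothetical minimal model, not DeepSeek's actual design; in a real system the SSD tier would hold serialized KV blocks, not an in-memory dict.

```python
from collections import OrderedDict

class TwoTierKVCache:
    """Hypothetical sketch of a DRAM-over-SSD KV cache. Entries evicted
    from the small fast tier are demoted to the large slow tier rather
    than dropped, so long-tail prefixes still count as cache hits."""

    def __init__(self, dram_capacity: int):
        self.dram_capacity = dram_capacity
        self.dram = OrderedDict()  # small, fast tier (kept in LRU order)
        self.ssd = {}              # large, cheap tier (stand-in for NAND)

    def put(self, prefix_hash: str, kv_blocks: bytes) -> None:
        self.dram[prefix_hash] = kv_blocks
        self.dram.move_to_end(prefix_hash)
        while len(self.dram) > self.dram_capacity:
            # Demote the least-recently-used entry instead of discarding it.
            old_key, old_val = self.dram.popitem(last=False)
            self.ssd[old_key] = old_val

    def get(self, prefix_hash: str):
        if prefix_hash in self.dram:
            self.dram.move_to_end(prefix_hash)
            return self.dram[prefix_hash]
        if prefix_hash in self.ssd:
            # Promote on an SSD hit: move back into the fast tier.
            self.put(prefix_hash, self.ssd.pop(prefix_hash))
            return self.dram[prefix_hash]
        return None  # true miss: prefill must recompute the KV blocks
```

The point of the demote-on-evict design is that capacity, not DRAM size, bounds the hit rate: a prefix served from SSD is still billed at the cache-hit tier, which is what makes a 95%+ hit rate feasible at 1M-token contexts.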

Why It Matters

DeepSeek V4's SSD-driven cost cuts could force AI rivals to slash prices, democratizing access to long-context agents.