DeepSeek makes 75% V4-Pro price cut permanent, escalating inference price war
DeepSeek V4-Pro now costs 11x less than Claude Opus 4.7—builders must reroute.
Get AI news that actually matters
One email a day. Zero fluff. Join 10,000+ professionals.
DeepSeek has permanently committed to a 75% price reduction for its V4-Pro API, bringing the cost to $0.435 per million input tokens and $0.87 per million output tokens. This frontier-class 1.6-trillion-parameter mixture-of-experts model with a 1-million-token context window is now roughly 11 times cheaper than Anthropic's Claude Opus 4.7 for comparable coding scores. The strategic move signals DeepSeek's bet on hardware sovereignty via Huawei's Ascend 950 supernodes expected in volume by H2 2026, allowing them to sustain this pricing floor. For builders, the routing calculus has shifted: DeepSeek becomes the default for agentic and batch workloads not constrained by data residency, putting real pressure on Anthropic and OpenAI to either cut Sonnet/GPT-5.5 Mini pricing or demonstrate stronger capability advantages.
The pricing shock coincides with a flurry of product launches. At Google I/O 2026, Gemini 3.5 Flash emerged as the new default value pick at $1.50/$9 per million tokens with 1M context, achieving 76.2% on Terminal-Bench 2.1—outperforming Gemini 3.1 Pro on agentic evals. Google also introduced Omni video model, Spark agents, and a Managed Agents API that hands builders a remote Linux box per API call, making agent infrastructure table stakes. Meanwhile, Cursor's Composer 2.5, built on a Kimi K2.5 base with extensive RL post-training, scored third on the Coding Agent Index at 10-60x lower cost than rivals. Alibaba's Qwen3.7-Max ships with 35-hour autonomy and 1M context but remains API-only at $2.50/$7.50 per M. The Chinese labs' release cadence has compressed to weekly updates, forcing Western teams to design routing logic that can hot-swap models without re-evals every Friday.
- DeepSeek V4-Pro permanently priced at $0.435/$0.87 per M tokens—11x cheaper than Claude Opus 4.7.
- Google I/O 2026 launched Gemini 3.5 Flash ($1.50/$9, 1M context, 76% Terminal-Bench) and Managed Agents API.
- Cursor Composer 2.5 and Qwen3.7-Max further lower the cost of agentic coding and long-horizon tasks.
Why It Matters
The inference price war reshapes AI economics—now agents and batch workloads can scale without breaking budgets.