DeepSeek Releases V4 Open-Weight AI Model with 1 Million-Token Context and Mixture-of-Experts Architecture
1.6 trillion parameters, only 49B active per token, and a 1M-token context window, released as open weights.
DeepSeek unveiled V4 on April 24, offering two open-weight Mixture-of-Experts (MoE) variants. The flagship deepseek-v4-pro packs 1.6 trillion total parameters with only 49 billion active per token, while the more efficient deepseek-v4-flash uses 284 billion total (13 billion active). Both default to a massive 1,000,000-token context window, enabled by DeepSeek's Sparse Attention and token-wise compression techniques. The models are immediately available through DeepSeek's public API, and third parties are already comparing pricing and throughput.
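The gap between total and active parameters follows from the MoE design: a router scores a pool of expert networks for each token and only the top-scoring few are run, so per-token compute tracks the active count rather than the full 1.6T. The sketch below is a minimal illustration of top-k routing; the expert count, expert size, and k are hypothetical values chosen so the ratio roughly matches the reported 1.6T-total / 49B-active figures, not DeepSeek's published configuration.

```python
# Minimal sketch of top-k Mixture-of-Experts routing.
# All counts below are illustrative assumptions, not DeepSeek's actual layout.
import numpy as np

def top_k_routing(router_scores: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k experts with the highest router scores."""
    return np.argsort(router_scores)[-k:]

rng = np.random.default_rng(0)

n_experts = 256          # hypothetical number of experts per MoE layer
k_active = 8             # hypothetical experts activated per token
params_per_expert = 6e9  # hypothetical parameters per expert
shared_params = 1e9      # hypothetical dense (always-active) parameters

# The router produces one score per expert for each token;
# only the top-k experts actually run on that token.
router_scores = rng.normal(size=n_experts)
active_experts = top_k_routing(router_scores, k_active)

total_params = shared_params + n_experts * params_per_expert
active_params = shared_params + k_active * params_per_expert

print(f"Experts selected for this token: {sorted(active_experts.tolist())}")
print(f"Total parameters:  {total_params / 1e12:.2f}T")   # ~1.5T with these toy numbers
print(f"Active per token:  {active_params / 1e9:.0f}B")    # ~49B with these toy numbers
```

Because only the selected experts run, inference cost scales with the active parameter count, which is why a 1.6T-parameter model can be served at roughly the cost of a ~49B dense model.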
Media reception is split. The New York Times frames the open release as a potential soft-power advantage for China, while MIT Technology Review emphasizes the practical gains in long-context handling and lower inference cost. The Economist, however, describes the launch as failing to match the disruptive impact of earlier DeepSeek releases. Meanwhile, NIST's Center for AI Standards and Innovation (CAISI) has already evaluated the Pro variant, signaling that standards bodies are adapting quickly to open frontier models. For practitioners, the real test will come from independent benchmarks and community stress tests on Hugging Face.
- Two MoE variants: Pro with 1.6T total/49B active parameters and Flash with 284B total/13B active
- Default 1,000,000-token context window enabled by DeepSeek Sparse Attention
- Open-weight release, also served through DeepSeek's public API (see the usage sketch after this list); NIST's CAISI has already published an evaluation of the Pro variant
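For readers who want to try the hosted models, the sketch below shows one way a request might look. It assumes V4 is served through the same OpenAI-compatible chat endpoint DeepSeek has used for earlier models and that the model identifiers match the variant names above; both are assumptions, not confirmed details of the V4 rollout.

```python
# Hedged example: querying deepseek-v4-pro via DeepSeek's public API,
# assuming an OpenAI-compatible chat endpoint (as with earlier DeepSeek models).
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",   # DeepSeek's existing API host; assumed unchanged for V4
    api_key="YOUR_DEEPSEEK_API_KEY",
)

# With a 1,000,000-token default context, very long inputs can go in a single
# request; here one long document is passed for summarization.
long_document = open("report.txt", encoding="utf-8").read()

response = client.chat.completions.create(
    model="deepseek-v4-pro",  # flagship variant; "deepseek-v4-flash" for the cheaper one
    messages=[
        {"role": "system", "content": "You are a concise technical summarizer."},
        {"role": "user", "content": f"Summarize the key findings:\n\n{long_document}"},
    ],
)

print(response.choices[0].message.content)
```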
Why It Matters
This open-weight, 1M-context release lowers cost and access barriers for startups and researchers, while pushing standards bodies such as NIST's CAISI toward faster safety auditing of frontier models.