Viral Wire

DeepSeek V4 Model Launched with 1-Million-Token Context and Open Weights

With a 1M-token context window and a temporary 75% API discount, DeepSeek V4 challenges closed models.

Deep Dive

DeepSeek launched DeepSeek V4 in two versions: Pro (1.6 trillion total parameters) and Flash (284 billion total parameters). Both ship with open weights and support a 1-million-token context window. According to DeepSeek’s API page, V4-Flash pricing starts at USD 0.14 per 1 million input tokens on a cache miss. Both models use a mixture-of-experts architecture, activating only a small fraction of their parameters for each token. NVIDIA’s technical overview explains that DeepSeek V4 uses hybrid attention, combining compression and selective attention, which NVIDIA says is designed to cut per-token inference FLOPs by 73% compared to DeepSeek-V3.2. The model is built for long-context tasks such as coding, document analysis, and agentic workflows.
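To put the mixture-of-experts figures and the quoted pricing in concrete terms, here is a back-of-the-envelope Python sketch. It uses only the numbers reported in this brief; it is illustrative arithmetic, not vendor code or measured data.

```python
# Back-of-the-envelope numbers from the figures reported in this article.

TRILLION = 1e12
BILLION = 1e9

models = {
    # name: (total parameters, active parameters per token)
    "V4-Pro":   (1.6 * TRILLION, 49 * BILLION),
    "V4-Flash": (284 * BILLION,  13 * BILLION),
}

for name, (total, active) in models.items():
    # Mixture-of-experts: only a small slice of the weights runs per token.
    print(f"{name}: {active / total:.1%} of parameters active per token")
# -> V4-Pro: ~3.1% active, V4-Flash: ~4.6% active

# Cost of filling the full 1M-token context on V4-Flash at the quoted
# cache-miss rate of USD 0.14 per 1 million input tokens.
price_per_million_input = 0.14
context_tokens = 1_000_000
print(f"Full-context prompt: ${context_tokens / 1e6 * price_per_million_input:.2f}")
# -> $0.14 to feed the entire 1M-token window once
```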

Key Points
  • DeepSeek V4 includes two models: Pro (1.6T total params, 49B active) and Flash (284B total, 13B active), both with 1M-token context.
  • Per NVIDIA’s technical overview, hybrid attention cuts per-token inference FLOPs by 73% and KV-cache memory by 90% compared to V3.2 (see the sketch after this list).
  • Pricing starts at USD 0.14 per 1M input tokens (cache miss) for Flash; Pro launches with a temporary 75% discount, making long-context AI affordable at scale.
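To show what a 90% KV-cache reduction means at a 1-million-token context, below is a minimal sketch using the standard dense-attention KV-cache formula. The layer count, KV-head count, and head size are hypothetical placeholders (V4’s dimensions are not given in the material above); only the 90% figure comes from the article.

```python
# Rough illustration of the reported 90% KV-cache reduction at 1M tokens.
# n_layers, n_kv_heads, and head_dim below are HYPOTHETICAL placeholder
# values, not published DeepSeek V4 specs.

def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_val=2):
    # Dense-attention KV cache: keys + values for every layer and token,
    # stored in 16-bit precision by default (bytes_per_val=2).
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_val * seq_len

baseline = kv_cache_bytes(seq_len=1_000_000, n_layers=60,
                          n_kv_heads=8, head_dim=128)  # placeholder config
reduced = baseline * (1 - 0.90)  # the reported 90% reduction vs. V3.2

GiB = 2**30
print(f"dense KV cache @ 1M tokens: {baseline / GiB:.1f} GiB")  # ~228.9 GiB
print(f"after 90% reduction:        {reduced / GiB:.1f} GiB")   # ~22.9 GiB
```

Under these placeholder dimensions, the cache for a full 1M-token context drops from hundreds of GiB to tens of GiB, which is the difference between needing a multi-GPU server and fitting on a single accelerator.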

Why It Matters

Cost-effective, open-weight, long-context AI lets enterprises deploy complex agents and large-document analysis without vendor lock-in.