DeepSeek V4 Preview Bets on Efficiency: Hybrid Attention Makes Million-Token Context Affordable
A 1.6T-parameter MoE model compresses attention memory, cutting inference costs for million-token tasks.
DeepSeek V4 arrives amid fierce competition from OpenAI's GPT-5.5 and Anthropic's Opus 4.7, but distinguishes itself with a focus on efficiency over raw scale. The preview includes two Mixture-of-Experts models: DeepSeek-V4-Pro (1.6T total parameters, 49B activated) and DeepSeek-V4-Flash (284B total, 13B activated), both supporting a 1M-token context window. The core innovation is a hybrid attention design that combines Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA). CSA compresses groups of key-value entries and selects relevant blocks, while HCA compresses more aggressively for dense attention over a shorter memory stream. This addresses the bottleneck of long-context AI, where every new token may need to reference growing histories of documents, code, or tool calls.
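The two-branch design described above can be illustrated with a toy sketch. Everything below is an assumption layered on the article's one-sentence description: the function name, the mean-pooling compression, the top-k block selection, and the simple averaging of the two branches are all hypothetical stand-ins, not DeepSeek's actual CSA/HCA implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def hybrid_attention(q, K, V, block=4, top_k=2):
    """Toy two-branch attention for a single query vector q.

    CSA-like branch: mean-pool K/V into block summaries, score blocks
    against the query, then attend densely only over the tokens inside
    the top-k selected blocks.
    HCA-like branch: attend densely over the pooled summaries themselves,
    i.e. a much shorter, heavily compressed memory stream.
    The branches are averaged; the real combination rule is not public.
    """
    T, d = K.shape
    n = T // block
    Kb = K[: n * block].reshape(n, block, d).mean(axis=1)  # block summaries
    Vb = V[: n * block].reshape(n, block, d).mean(axis=1)

    # CSA-like: pick the most relevant blocks, attend over their raw tokens
    block_scores = Kb @ q
    sel = np.argsort(block_scores)[-top_k:]
    idx = np.concatenate([np.arange(b * block, (b + 1) * block) for b in sel])
    w = softmax(K[idx] @ q / np.sqrt(d))
    csa_out = w @ V[idx]

    # HCA-like: dense attention over the compressed summary stream
    w2 = softmax(Kb @ q / np.sqrt(d))
    hca_out = w2 @ Vb

    return 0.5 * (csa_out + hca_out)
```

The cost intuition carries over even in this toy form: the selective branch touches only `top_k * block` raw tokens and the dense branch only `T / block` summaries, so neither branch scales with the full history length.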
By treating long context as a memory hierarchy problem, DeepSeek V4 reduces inference costs significantly, making million-token reasoning practical for more developers. This lowers the barrier for startups to build agents that analyze full code repositories, legal records, or multi-document filings. The technical report also suggests future hardware should optimize for computation-to-communication ratios, and V4 has been adapted for Huawei's Ascend chips. The economic impact is clear: cheaper long-context AI expands the market, enabling ambitious applications that were previously too expensive for enterprises and open-source developers alike.
- DeepSeek V4 Pro has 1.6T total parameters (49B activated) with a 1M-token context window using MoE architecture.
- Hybrid attention design (CSA + HCA) compresses memory, reducing compute and cost for long-context reasoning.
- Model optimized for Huawei Ascend 950 chips, with hardware suggestions for computation-to-communication ratios.
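The parameter figures above can be turned into a quick back-of-envelope check: in an MoE model, per-token compute scales roughly with the *activated* parameters, not the total. The numbers come from the article; treating the activated fraction as a proxy for per-token cost is a rough heuristic that ignores attention and routing overhead.

```python
# Rough proxy: fraction of weights an MoE model activates per token.
def active_fraction(total_params, activated_params):
    return activated_params / total_params

pro = active_fraction(1.6e12, 49e9)    # DeepSeek-V4-Pro: 1.6T total, 49B active
flash = active_fraction(284e9, 13e9)   # DeepSeek-V4-Flash: 284B total, 13B active

print(f"Pro activates {pro:.1%} of weights per token")     # → 3.1%
print(f"Flash activates {flash:.1%} of weights per token") # → 4.6%
```

By this crude measure, each forward pass of the Pro model touches only about 3% of its weights, which is the core of the efficiency-over-raw-scale argument the article makes.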
Why It Matters
Lower inference costs democratize long-context AI, enabling startups and enterprises to build ambitious agents and tools.