DeepSeek releases V4 Pro DSpark: faster inference with sparse attention
New open-source model achieves up to 3x faster inference using dynamic sparse computation.
Deep Dive
DeepSeek AI has released DeepSeek-V4-Pro-DSpark, an open-source large language model that uses a novel dynamic sparse attention mechanism called DSpark. The model is available on Hugging Face alongside a research paper.
Key Points
- DeepSeek-V4-Pro-DSpark uses its DSpark mechanism to dynamically prune 70% of attention heads during inference, with no reported loss in accuracy.
- Achieves 2.5–3x faster inference on long contexts (32K–128K tokens) than the dense base model, V4 Pro.
- Released as open source on Hugging Face with the full paper and code; with quantization, the model can run on a single A100 GPU.
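The article does not say how DSpark decides which heads to drop. Purely as an illustration of the general idea of dynamic head pruning, the sketch below scores each head with a cheap, input-dependent proxy, runs full attention only for the top 30% of heads (matching the 70% pruning figure), and emits zeros for the rest. The scoring rule (mean L2 norm of a head's query vectors) and the names `head_score` and `sparse_multihead` are hypothetical choices for this example, not DeepSeek's implementation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Head = Tuple[Matrix, Matrix, Matrix]  # (queries, keys, values) for one head


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Dense scaled dot-product attention for a single head (no mask)."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = softmax(scores)
        out.append([sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))])
    return out


def head_score(q: Matrix) -> float:
    """HYPOTHETICAL importance proxy: mean L2 norm of the head's query rows.
    It is cheap and depends on the current input, which makes pruning dynamic."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in q) / len(q)


def sparse_multihead(heads: List[Head], prune_ratio: float = 0.7) -> Tuple[List[Matrix], List[int]]:
    """Run attention only for the highest-scoring heads; pruned heads output zeros."""
    n_keep = max(1, round(len(heads) * (1 - prune_ratio)))
    scores = [head_score(q) for q, _, _ in heads]
    ranked = sorted(range(len(heads)), key=lambda h: scores[h], reverse=True)
    keep = sorted(ranked[:n_keep])
    keep_set = set(keep)
    outputs = []
    for h, (q, k, v) in enumerate(heads):
        if h in keep_set:
            outputs.append(attention_head(q, k, v))  # full attention compute
        else:
            outputs.append([[0.0] * len(v[0]) for _ in q])  # attention skipped
    return outputs, keep


if __name__ == "__main__":
    # 10 toy heads, 2 tokens, head dim 2; query magnitude grows with head index.
    heads = [
        ([[0.1 * (h + 1), 0.0], [0.0, 0.1 * (h + 1)]],
         [[1.0, 0.0], [0.0, 1.0]],
         [[1.0, 2.0], [3.0, 4.0]])
        for h in range(10)
    ]
    outputs, keep = sparse_multihead(heads)
    print("kept heads:", keep)  # kept heads: [7, 8, 9]
```

Because pruned heads never compute their attention score matrices, attention cost in a scheme like this scales roughly with the fraction of heads kept, minus the overhead of the scoring step; a real system would also need a scoring rule that preserves accuracy, which this toy proxy makes no attempt to guarantee.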
Why It Matters
If the results hold up, DSpark could make frontier-class 400B+ models practical on far more modest hardware, sharply lowering inference costs for developers.
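The hardware claims can be sanity-checked with simple weight-memory arithmetic. The sketch below assumes a hypothetical dense 400B-parameter model (the article does not state DSpark's parameter count or quantization scheme) and counts weights only, ignoring the KV cache and activations, which also grow with the 32K–128K contexts cited in the Key Points. The helper names are illustrative.

```python
GB = 1e9  # decimal gigabytes, as GPU memory is usually quoted


def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone: params x bits / 8 bytes, in GB."""
    return n_params * bits_per_weight / 8 / GB


def max_bits_per_weight(n_params: float, budget_gb: float) -> float:
    """Largest average bit width whose weights fit within budget_gb."""
    return budget_gb * GB * 8 / n_params


if __name__ == "__main__":
    n_params = 400e9  # hypothetical 400B-parameter dense model
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: {weight_memory_gb(n_params, bits):.0f} GB")
    # 80 GB is the larger of the two A100 memory configurations (40 GB and 80 GB)
    print(f"bits/weight to fit 80 GB: {max_bits_per_weight(n_params, 80):.1f}")
```

At 4 bits, 400B dense weights need about 200 GB, more than a single 80 GB A100 holds, so fitting such a model on one card would require roughly 1.6 bits per weight, offloading, or fewer resident parameters. Sparse attention itself mainly cuts compute rather than weight memory.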