Open Source

DeepSeek has released DeepEP V2 and TileKernels.

Two new open-source libraries promise up to 2x speedups on core GPU operations for large models.

Deep Dive

DeepSeek, the Chinese AI research lab, has open-sourced two new libraries on GitHub: DeepEP V2 and TileKernels. DeepEP V2 is an update to its expert parallelism library, designed to optimize communication between GPUs for Mixture-of-Experts (MoE) models. The new version reduces all-to-all communication latency by up to 40%, enabling more efficient scaling across hundreds of GPUs. This is critical for models like DeepSeek's own V3, which uses MoE to activate only a subset of parameters per token: every forward pass must route each token to the GPUs hosting its assigned experts, which makes the all-to-all exchange a core bottleneck.
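
To make the communication pattern concrete, here is a minimal sketch of MoE token dispatch using stock PyTorch collectives. This is not DeepEP's API; the function name, tensor shapes, and the assumption of equal token counts per rank are illustrative.

    # Minimal sketch of the MoE all-to-all dispatch pattern that libraries
    # like DeepEP optimize. Uses stock torch.distributed, not DeepEP's API;
    # names, shapes, and equal per-rank token counts are assumptions.
    import torch
    import torch.distributed as dist

    def dispatch_tokens(tokens: torch.Tensor) -> torch.Tensor:
        # tokens: [world_size * tokens_per_rank, hidden]; row block i holds
        # the tokens routed to the experts hosted on rank i.
        output = torch.empty_like(tokens)
        # Splits `tokens` into world_size equal chunks, sends chunk i to
        # rank i, and gathers the chunks addressed to this rank.
        dist.all_to_all_single(output, tokens)
        return output

    if __name__ == "__main__":
        dist.init_process_group("nccl")  # launch with torchrun, one GPU per rank
        torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
        world, hidden, tokens_per_rank = dist.get_world_size(), 1024, 8
        x = torch.randn(world * tokens_per_rank, hidden, device="cuda")
        routed = dispatch_tokens(x)  # each rank now holds its experts' tokens
        # ...run the local experts on `routed`, then an inverse all-to-all
        # (the "combine" step) returns outputs to the tokens' home ranks.

In practice the per-rank token counts are uneven and must be exchanged before the data moves; handling that ragged, latency-sensitive exchange efficiently is where dedicated communication libraries earn their speedups.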

TileKernels, meanwhile, provides custom GPU kernels that optimize dense and sparse matrix operations. Early benchmarks show up to 2x speedups on common operations such as matrix multiplication and attention, compared to standard CUDA implementations. Both libraries are freely available on GitHub, so developers can integrate them into their own AI pipelines. The releases underscore DeepSeek's strategy of contributing foundational infrastructure to the open-source community while advancing its own model capabilities.
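
For intuition on where such kernel speedups come from, the sketch below shows the classic tiling (blocking) idea behind fast matrix-multiplication kernels, written in plain NumPy for readability. It is not TileKernels' API; the tile size and function name are assumptions.

    # Illustrative sketch of the tiling idea behind fast matmul kernels.
    # Plain NumPy for readability, not TileKernels' API; TILE and the
    # function name are assumptions.
    import numpy as np

    TILE = 64  # block size; tuned per GPU in real kernels

    def tiled_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        m, k = a.shape
        k2, n = b.shape
        assert k == k2, "inner dimensions must match"
        c = np.zeros((m, n), dtype=a.dtype)
        # Compute the output one TILE x TILE block at a time so each
        # operand block is loaded once and reused across the whole tile.
        for i in range(0, m, TILE):
            for j in range(0, n, TILE):
                for p in range(0, k, TILE):
                    c[i:i+TILE, j:j+TILE] += (
                        a[i:i+TILE, p:p+TILE] @ b[p:p+TILE, j:j+TILE]
                    )
        return c

    a = np.random.rand(256, 256).astype(np.float32)
    b = np.random.rand(256, 256).astype(np.float32)
    assert np.allclose(tiled_matmul(a, b), a @ b, atol=1e-3)

On a GPU, the payoff comes from keeping each tile resident in fast on-chip memory (shared memory and registers) while it is reused, rather than re-reading operands from slow global memory for every output element.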

Key Points
  • DeepEP V2 reduces all-to-all communication latency for MoE models by up to 40%.
  • TileKernels provides custom GPU kernels with up to 2x speedup on matrix operations.
  • Both libraries are open-source on GitHub, enabling integration into any AI pipeline.

Why It Matters

These tools democratize high-performance AI infrastructure, enabling faster and cheaper inference for large-scale models.