llama.cpp Release b8781: Native DeepSeek V3.2 Support
The latest llama.cpp release enables native chat parsing for DeepSeek's 671B-parameter model across CPU, GPU, and mobile.
The ggml-org team behind the popular llama.cpp project has released a significant update, tagged b8781, adding native support for DeepSeek V3.2, one of the largest open-source language models available. The update introduces a dedicated parser and the official chat template for DeepSeek's 671B-parameter model, letting developers use the model through llama.cpp's standard chat interfaces on local hardware. The release represents a major step in making state-of-the-art AI models accessible outside of cloud environments.
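For developers who want to wire this up themselves, a model's bundled template is consumed through llama.cpp's existing chat-template API rather than any new entry point. The sketch below is illustrative, not code from the commit: the GGUF filename is a placeholder, and the function names follow recent versions of llama.h.

```cpp
#include "llama.h"

#include <cstdio>
#include <string>
#include <vector>

int main() {
    llama_backend_init();

    // Placeholder filename: any GGUF conversion of DeepSeek V3.2 would go here.
    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_model_load_from_file("deepseek-v3.2.gguf", mparams);
    if (model == nullptr) {
        return 1;
    }

    // Read the chat template embedded in the GGUF metadata; with b8781 this
    // should be DeepSeek V3.2's official template.
    const char * tmpl = llama_model_chat_template(model, /*name=*/nullptr);

    std::vector<llama_chat_message> msgs = {
        { "system", "You are a helpful assistant." },
        { "user",   "Summarize llama.cpp in one sentence." },
    };

    // Apply the template; if the buffer is too small, the call returns the
    // required size, so resize and retry once.
    std::string buf(4096, '\0');
    int32_t n = llama_chat_apply_template(tmpl, msgs.data(), msgs.size(),
                                          /*add_assistant=*/true,
                                          buf.data(), (int32_t) buf.size());
    if (n > (int32_t) buf.size()) {
        buf.resize(n);
        n = llama_chat_apply_template(tmpl, msgs.data(), msgs.size(), true,
                                      buf.data(), (int32_t) buf.size());
    }
    if (n >= 0) {
        printf("%.*s\n", n, buf.c_str());
    }

    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```

In practice most users will never call this directly: the bundled llama-cli and llama-server tools apply the embedded template automatically for chat-style requests, which is what the new DeepSeek support plugs into.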
What makes this release particularly notable is the extensive platform coverage included in the 27 different build assets. The update provides specialized builds for macOS Apple Silicon (with optional KleidiAI acceleration), Intel macOS, iOS frameworks, multiple Linux distributions with CPU, Vulkan, and ROCm 7.2 support, Windows builds with CUDA 12/13, Vulkan, SYCL, and HIP backends, plus openEuler builds optimized for Huawei's Ascend 310P and 910B AI processors. This comprehensive coverage ensures developers can deploy DeepSeek V3.2 across virtually any hardware stack.
The release artifacts carry verified GitHub signatures (GPG key ID: B5690EEEBB952194), and the change addresses community request #21785 for proper DeepSeek support. The parser specifically handles DeepSeek's formatting conventions, while the official template ensures prompts match the input structure the model was trained to expect. The release follows the growing trend of making massive models like DeepSeek's 671B-parameter architecture runnable on local hardware rather than requiring expensive cloud API calls.
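To make the local-inference point concrete, a bare-bones generation loop against llama.cpp's C API looks roughly like the following. This too is a hedged sketch rather than anything shipped in b8781: the model path is again a placeholder, greedy sampling is chosen purely for brevity, and the prompt stands in for the chat-template output shown above.

```cpp
#include "llama.h"

#include <cstdio>
#include <string>
#include <vector>

int main() {
    llama_backend_init();

    llama_model_params mp = llama_model_default_params();
    llama_model * model = llama_model_load_from_file("deepseek-v3.2.gguf", mp); // placeholder path
    if (model == nullptr) {
        return 1;
    }
    const llama_vocab * vocab = llama_model_get_vocab(model);

    llama_context_params cp = llama_context_default_params();
    cp.n_ctx = 4096;
    llama_context * ctx = llama_init_from_model(model, cp);

    // Greedy sampling: the simplest possible decoding strategy.
    llama_sampler * smpl = llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(smpl, llama_sampler_init_greedy());

    // `prompt` stands in for the chat-template output from the earlier sketch.
    std::string prompt = "...";
    std::vector<llama_token> toks(prompt.size() + 16);
    int32_t n_tok = llama_tokenize(vocab, prompt.c_str(), (int32_t) prompt.size(),
                                   toks.data(), (int32_t) toks.size(),
                                   /*add_special=*/true, /*parse_special=*/true);
    if (n_tok < 0) {
        return 1;
    }
    toks.resize(n_tok);

    // Feed the prompt, then generate one token at a time, stopping at
    // end-of-generation or after a fixed cap.
    llama_batch batch = llama_batch_get_one(toks.data(), (int32_t) toks.size());
    for (int i = 0; i < 256; i++) {
        if (llama_decode(ctx, batch) != 0) {
            break;
        }
        llama_token tok = llama_sampler_sample(smpl, ctx, -1);
        if (llama_vocab_is_eog(vocab, tok)) {
            break;
        }
        char piece[128];
        int32_t np = llama_token_to_piece(vocab, tok, piece, sizeof(piece), 0, true);
        if (np > 0) {
            fwrite(piece, 1, np, stdout);
        }
        batch = llama_batch_get_one(&tok, 1); // feed the new token back in
    }

    llama_sampler_free(smpl);
    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```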
- Adds dedicated parser and official chat template for DeepSeek V3.2 (671B parameters)
- Includes 27 platform-specific builds covering macOS, iOS, Linux, Windows, and openEuler
- Supports multiple backends including CPU, CUDA 12/13, Vulkan, ROCm 7.2, SYCL, and HIP
Why It Matters
Enables running one of the world's largest open models locally, reducing cloud dependency and giving developers more control over AI inference.