llama.cpp b8783
The latest update to the popular open-source inference engine adds support for Google's Gemma 4 model and ships 27 pre-built binaries covering a wide range of hardware configurations.
The open-source community behind llama.cpp has released version b8783, an update to the C++ inference engine that enables efficient local AI model execution. This release primarily adds support for Google's recently announced Gemma 4 model, while also addressing parsing edge cases that previously caused issues with certain model configurations. The update continues the steady maintenance that has made the project a staple for running large language models locally.
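For readers who want to try the new model support, usage should follow the familiar llama-cli pattern. A minimal sketch is below; the GGUF filename is a hypothetical placeholder, since quantized Gemma 4 conversions would be distributed separately from the release binaries:

```bash
# Run a one-off prompt against a local model.
# "gemma-4-9b-q4_k_m.gguf" is a placeholder filename, not part of this release.
./llama-cli -m gemma-4-9b-q4_k_m.gguf \
    -p "Explain what a GGUF file is in one paragraph." \
    -n 256
```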
The release includes 27 pre-built binaries covering an extensive range of hardware platforms and acceleration backends. For macOS, there are builds for both Apple Silicon (arm64) and Intel (x64), including a KleidiAI-enabled arm64 variant for improved performance on Arm CPUs. Linux users get options for CPU-only execution as well as GPU acceleration through Vulkan, ROCm 7.2, and OpenVINO. Windows support spans from basic CPU builds to specialized versions for the CUDA 12.4, CUDA 13.1, Vulkan, SYCL, and HIP backends. There are also builds for the openEuler operating system with Huawei Ascend NPU support.
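In practice, the backend is chosen by which binary you download; at runtime, GPU offload is then controlled the same way across the CUDA, Vulkan, ROCm, and HIP builds. A sketch, again with a placeholder model filename:

```bash
# -ngl sets how many model layers to offload to the GPU; a large value
# like 99 offloads everything. The downloaded binary determines whether
# this goes through CUDA, Vulkan, ROCm, or HIP.
./llama-cli -m model.gguf -ngl 99 -p "Hello" -n 64
```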
This broad compatibility matrix demonstrates llama.cpp's role as a universal inference engine that bridges the gap between cutting-edge AI models and diverse hardware ecosystems. The project continues to evolve as a critical piece of infrastructure for developers who need to deploy AI models across different environments without being locked into specific cloud services or proprietary frameworks.
- Adds support for Google's Gemma 4 model, with improved handling of parsing edge cases
- Provides 27 different build configurations across macOS, Linux, Windows, and openEuler platforms
- Includes specialized builds for CUDA 12.4/13.1, Vulkan, ROCm 7.2, OpenVINO, SYCL, and HIP acceleration backends
Why It Matters
Enables developers to run state-of-the-art models like Gemma 4 locally across diverse hardware, reducing dependency on cloud APIs and proprietary frameworks.
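As a concrete illustration of that independence, the llama-server binary included in these builds exposes an OpenAI-compatible HTTP endpoint, so existing client code can be pointed at a local instance instead of a hosted API. A minimal sketch, with a hypothetical model filename:

```bash
# Start a local server (the model filename is a placeholder).
./llama-server -m gemma-4-9b-q4_k_m.gguf --port 8080

# Query it through the OpenAI-compatible chat completions endpoint.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```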