b8753
The latest release adds official Gemma 4 template alignment and broadens hardware support across Windows, Linux, and macOS.
The open-source project llama.cpp, maintained by ggml-org, has shipped a significant update with version b8753. The release focuses on compatibility with Google's latest Gemma 4 model family, aligning llama.cpp's built-in chat template with the model's official one so that prompts are formatted exactly as the model expects, which directly improves output quality. This is a crucial step for developers who want to run Gemma 4, a powerful open-weight model from Google, locally on the lightweight and performant llama.cpp inference engine.
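For developers working against the C API directly, the change surfaces through the chat-template helpers. Below is a minimal sketch of rendering a conversation with a model's embedded template; it assumes the current llama.h API (llama_model_load_from_file, llama_model_chat_template, llama_chat_apply_template), and the file name gemma-4.gguf is a placeholder, not an official artifact.

```cpp
// template_demo.cpp - render a chat turn with a model's embedded template.
// Minimal sketch against llama.cpp's C API; "gemma-4.gguf" is hypothetical.
#include <cstdio>
#include <vector>
#include "llama.h"

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_model_load_from_file("gemma-4.gguf", mparams);
    if (!model) { fprintf(stderr, "failed to load model\n"); return 1; }

    // The template string ships inside the GGUF metadata; the b8753 change
    // is about this template matching Google's official one more closely.
    const char * tmpl = llama_model_chat_template(model, /*name=*/nullptr);
    if (!tmpl) { fprintf(stderr, "model has no embedded chat template\n"); return 1; }

    llama_chat_message msgs[] = {
        { "user", "Explain GGUF in one sentence." },
    };

    std::vector<char> buf(4096);
    // Returns the number of bytes written, or the size needed if the buffer
    // is too small; a robust caller would resize and retry in that case.
    int32_t n = llama_chat_apply_template(tmpl, msgs, 1,
                                          /*add_ass=*/true,
                                          buf.data(), (int32_t) buf.size());
    if (n > 0 && n <= (int32_t) buf.size()) {
        printf("%.*s\n", (int) n, buf.data());
    }

    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```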
Beyond model support, the update dramatically expands the range of supported hardware and compute backends. New pre-built binaries add Windows builds with HIP and SYCL backends, Linux builds with OpenVINO and updated ROCm 7.2 support, and a macOS build with Arm's KleidiAI kernels enabled for Apple Silicon. This broadens access to high-performance local AI inference, letting users run models on everything from NVIDIA and AMD GPUs to Intel integrated graphics and Apple Silicon CPUs, and makes advanced AI deployable across more diverse environments.
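One way to check which backends a given binary actually picked up is to enumerate ggml's device registry. The sketch below assumes the ggml-backend.h registry API (ggml_backend_load_all, ggml_backend_dev_count, and friends) as shipped with recent llama.cpp builds.

```cpp
// devices.cpp - list the compute devices ggml has registered at runtime.
// Sketch only; assumes the ggml backend registry API in current llama.cpp.
#include <cstdio>
#include "ggml-backend.h"

int main() {
    // Load any dynamically built backends (CUDA, HIP, SYCL, ...) if present.
    ggml_backend_load_all();

    size_t n = ggml_backend_dev_count();
    for (size_t i = 0; i < n; ++i) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        printf("%zu: %s (%s)\n", i,
               ggml_backend_dev_name(dev),
               ggml_backend_dev_description(dev));
    }
    return 0;
}
```

On a machine with one of the new binaries, the output should include the corresponding device entries, for example a HIP or SYCL GPU on Windows, or the CPU device backed by KleidiAI kernels on Apple Silicon.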
- Aligns the built-in chat template with Google's official Gemma 4 template, improving prompt formatting and output quality.
- Expands compute backend support with new Windows (HIP, SYCL), Linux (OpenVINO, ROCm 7.2), and macOS (KleidiAI) binaries.
- Enhances cross-platform accessibility, allowing models to run efficiently on a wider array of consumer and server hardware.
Why It Matters
This update lowers the barrier to running state-of-the-art models like Gemma 4 locally, enabling more developers to build and test AI applications without cloud dependencies.