llama.cpp b9504 improves build system, adds platform fixes

llama.cpp Releases June 04, 2026

The 115k-star project tidies its CMake build and ships with KleidiAI disabled on Apple Silicon

Deep Dive

llama.cpp, the open-source C/C++ implementation for running large language models locally, has released version b9504. Maintained by the ggml-org team, this project has garnered over 115,000 GitHub stars and 19,200 forks, making it one of the most popular AI inference engines. The b9504 release focuses on build system refinements and platform-specific improvements.

The key change is a CMake update (#24053) that skips the cvector-generator and export-lora tools when the CPU backend is disabled, so GPU-only configurations (e.g., CUDA, Vulkan, ROCm) no longer attempt to build them. The release also notes that KleidiAI, Arm's library of optimised micro-kernels for AI workloads on Arm CPUs, is currently disabled on macOS Apple Silicon (arm64) builds. Supported platforms include Ubuntu (x64, arm64, s390x), Windows (x64, arm64), Android arm64, and macOS on both Intel and Apple Silicon. GPU backends span CUDA 12 and 13 (Windows), Vulkan, ROCm 7.2, OpenVINO, SYCL, and HIP.
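The skip rule described above can be modelled in a few lines. This is only an illustrative sketch in Python, not the project's CMake code: the function name `tools_to_build`, the constant `SKIPPED_WITHOUT_CPU`, and the sample tool list are assumptions made for the example, and only the two skipped tool names come from the release note.

```python
# Illustrative model only; the real logic lives in llama.cpp's CMake files.
# The two names below are the tools the release note says are skipped.
SKIPPED_WITHOUT_CPU = {"cvector-generator", "export-lora"}


def tools_to_build(all_tools, cpu_backend_enabled):
    """Return the tools that would be built, preserving their order.

    Hypothetical helper: with the CPU backend enabled nothing is filtered;
    with it disabled, the two tools named in the release note are dropped.
    """
    if cpu_backend_enabled:
        return list(all_tools)
    return [tool for tool in all_tools if tool not in SKIPPED_WITHOUT_CPU]


if __name__ == "__main__":
    # Sample list chosen for illustration; it is not the project's full tool set.
    tools = ["llama-cli", "llama-server", "cvector-generator", "export-lora"]
    print(tools_to_build(tools, cpu_backend_enabled=True))
    print(tools_to_build(tools, cpu_backend_enabled=False))
```

With the CPU backend enabled the list passes through unchanged; with it disabled only the two named tools are removed and everything else is still built.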

For developers running models locally, b9504 is a maintenance release. It announces no major new features, but the build changes should make deployment smoother across diverse hardware: users can expect more reliable compilation in GPU-only environments and the usual performance for models such as Llama 3, Mistral, and Gemma.

Key Points

CMake now skips cvector-generator and export-lora when CPU backend is disabled (#24053)
KleidiAI acceleration is currently disabled on macOS Apple Silicon (arm64) builds
Supports multiple GPU backends: CUDA 12/13, Vulkan, ROCm 7.2, OpenVINO, SYCL, HIP, plus CPU-only builds

Why It Matters

llama.cpp b9504 keeps local AI inference accessible across AMD, Intel, NVIDIA, and Apple hardware, enabling privacy-friendly model deployment.
