llama.cpp b9099 updates HTTP library, expands platform support
Latest release adds cpp-httplib 0.43.4 and new build targets for ARM and CUDA...
The latest release of llama.cpp, tagged b9099, focuses on infrastructure improvements rather than new features. The primary change is an update of the bundled cpp-httplib library from an earlier version to 0.43.4, which brings bug fixes and security patches for the HTTP server component used to interact with LLM inference endpoints. This is a routine maintenance update for the widely-used C++ inference engine for Llama-family models.
More notably, the release significantly expands the matrix of prebuilt binaries. Developers can now download ready-to-run executables for macOS Apple Silicon (both with and without KleidiAI acceleration), macOS Intel x64, Linux on x64 and ARM64 (CPU and Vulkan), Linux s390x, Windows x64 and ARM64 (CPU, Vulkan, CUDA 12/13, SYCL, HIP), and Android ARM64. This reduces the friction for users on newer Apple Silicon M-series chips and Windows ARM devices (like Surface Pro X) to run local LLMs without manual compilation. The build system also includes support for openEuler with ACL optimizations for Huawei Ascend processors.
- Updated internal cpp-httplib to v0.43.4, improving HTTP server reliability and security for local LLM serving.
- Added prebuilt macOS binaries with KleidiAI acceleration for Apple Silicon and separate builds for Intel x64.
- Expanded Windows support: now includes ARM64 CPU, dual CUDA versions (12 and 13), and Vulkan/SYCL/HIP builds.
Why It Matters
Makes local LLM deployment easier across diverse hardware – from Apple Silicon to Windows ARM – with a critical HTTP library update.