llama.cpp b9441 drops with fix for MSVC ETag truncation
The 114k-star open-source LLM inference engine gets a minor but important cross-platform update.
llama.cpp, the wildly popular open-source C++ inference engine for LLaMA-family models (114k GitHub stars), released its latest incremental update: b9441. The release primarily addresses a UI bug: ETag truncation when compiling with the MSVC (Microsoft Visual C++) compiler. While minor, the fix restores proper caching behavior for users building from source on Windows and prevents redundant downloads.
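To see why a truncated ETag can lead to redundant downloads, here is a minimal sketch of an HTTP conditional request. It is not llama.cpp's code: the `respond` function and both ETag values are made up for illustration, and the assumption that the truncated value is the one sent back by the client is ours, since the release notes summarized here do not say where the truncation occurs. Real servers also handle weak validators and lists of ETags, which this sketch skips.

```python
from typing import Optional


def respond(resource_etag: str, if_none_match: Optional[str]) -> int:
    """Return the status a simplified server sends for a conditional GET.

    304 (Not Modified) is sent only when the validator supplied by the
    client matches the resource's current ETag exactly; otherwise the
    full body is sent again with 200.
    """
    if if_none_match is not None and if_none_match == resource_etag:
        return 304
    return 200


# Hypothetical values, not taken from llama.cpp.
full_etag = '"9f86d081884c7d659a2feaa0c55ad015"'
truncated_etag = full_etag[:9] + '"'  # assumed truncation, for illustration

# An intact validator lets the server skip the body.
status_intact = respond(full_etag, full_etag)

# A truncated validator never matches, so the body is re-sent every time.
status_truncated = respond(full_etag, truncated_etag)

print(status_intact, status_truncated)  # prints "304 200"
```

The point of the sketch is only that ETag matching is an exact comparison, so any truncation on either side turns every revalidation into a full transfer.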
The b9441 build matrix is extensive, covering nearly every major platform and acceleration backend. macOS users get both Apple Silicon (with optional KleidiAI support) and Intel x64 builds. Linux supports CPU, Vulkan, ROCm 7.2, OpenVINO, and SYCL FP32. Windows users can choose from CPU, Vulkan, CUDA 12.4, CUDA 13.3, and HIP. Android arm64 and iOS XCFramework builds are also included. Notably, some configurations are disabled (e.g., SYCL on Windows, openEuler builds). The release itself is signed with GitHub's verified signature, a security best practice.
- Fixes an ETag truncation bug when building with the MSVC compiler, improving caching on Windows.
- Prebuilt binaries ship for macOS (Apple Silicon and Intel), Linux (CPU, Vulkan, ROCm, OpenVINO, SYCL), Windows (CPU, CUDA 12/13, Vulkan, HIP), Android, and iOS.
- Release signed with GPG key B5690EEEBB952194 for verified authenticity.
Why It Matters
This is a small but meaningful reliability improvement for one of the most widely used open-source tools for running LLMs locally across a wide range of hardware.