llama.cpp b9165 releases with cross-platform builds and archive fix
New llama.cpp release b9165 fixes the release archive transform and ships prebuilt binaries for a wide range of platforms
The llama.cpp open-source project, known for enabling local LLM inference, has released version b9165. This minor release centers on a CI fix for the transform of the top entry in the release archive (issue #23080). It ships prebuilt binaries for a wide range of hardware backends: CPU-only builds plus GPU acceleration via CUDA, Vulkan, ROCm, OpenVINO, SYCL, and HIP. Supported platforms span Apple systems (macOS on Apple Silicon arm64 and Intel x64, plus iOS), Linux (Ubuntu on x64, arm64, and s390x, with multiple backends), Windows (x64 and arm64 CPU builds, plus CUDA 12/13, Vulkan, SYCL, and HIP), Android arm64, and openEuler (x86 and aarch64 with ACL Graph).
This release matters to developers and users who rely on llama.cpp to run large language models locally, without cloud dependencies. The fix is meant to ensure that release archives are correctly structured, which avoids problems during extraction and installation. The project also continues to offer optimized builds for diverse hardware, including KleidiAI-enabled builds for Apple Silicon and CUDA 12/13 DLLs for NVIDIA GPUs, which broadens access to local AI inference. Users can deploy LLMs on their own machines more reliably, whether for development, research, or privacy-sensitive applications.
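The article does not describe what the #23080 fix changes in detail, so the following is only a generic sketch of how a user might check that a downloaded archive has a well-formed top level before extracting it. It assumes a gzip-compressed tarball with a single root directory; the archive contents and the helper names (`top_level_entries`, `build_demo_archive`, `has_single_root`) are hypothetical and do not come from the llama.cpp project.

```python
import io
import tarfile


def top_level_entries(archive: tarfile.TarFile) -> set[str]:
    """Return the distinct first path components of all archive members."""
    roots = set()
    for name in archive.getnames():
        # Normalize a leading "./" and any surrounding slashes.
        cleaned = name.removeprefix("./").strip("/")
        if cleaned in ("", "."):
            continue
        roots.add(cleaned.split("/", 1)[0])
    return roots


def build_demo_archive(paths: list[str]) -> bytes:
    """Build a small in-memory .tar.gz with empty files (hypothetical contents)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for path in paths:
            info = tarfile.TarInfo(name=path)
            info.size = 0
            tar.addfile(info, io.BytesIO(b""))
    return buffer.getvalue()


def has_single_root(data: bytes) -> bool:
    """True if every member of the tarball lives under one top-level entry."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        return len(top_level_entries(tar)) == 1


if __name__ == "__main__":
    # Hypothetical layouts: one with a single root directory, one without.
    nested = build_demo_archive(["llama-demo/bin/llama-cli", "llama-demo/LICENSE"])
    flat = build_demo_archive(["bin/llama-cli", "LICENSE"])
    print("nested archive has single root:", has_single_root(nested))
    print("flat archive has single root:", has_single_root(flat))
```

For a real download, the same check can be run by opening the file with `tarfile.open(path)` and passing the result to `top_level_entries`; zip-based releases would need the `zipfile` module instead.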
- Fixes release archive transform bug (#23080) to improve packaging reliability.
- Provides prebuilt binaries for 30+ platform/backend combinations including CPU, CUDA, Vulkan, ROCm, OpenVINO, SYCL, HIP.
- Supports macOS on Apple Silicon (with KleidiAI) and Intel, plus iOS; Ubuntu Linux on x64/arm64/s390x; Windows x64/arm64 with multiple GPU backends; Android arm64; openEuler.
Why It Matters
Correctly packaged release archives support stable local LLM deployment across diverse hardware, reinforcing llama.cpp's role in private AI inference.