Developer Tools

llama.cpp b9360 release: env fix and expanded platform support

The popular LLM inference engine gets cleaner environment variable naming and builds for Android, ROCm, and more.

Deep Dive

The open-source llama.cpp project, known for running large language models efficiently on consumer hardware, has tagged version b9360. The key change is a fix in the `common` code (merge #23778) that renames all environment variables to use a consistent `LLAMA_ARG_` prefix. A single prefix makes it easier to tell which variables llama.cpp reads and lowers the risk of clashing with variables set by other tools. The release also broadens platform coverage.
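As a rough illustration of what a shared `LLAMA_ARG_` prefix makes possible, the sketch below builds an environment for a llama.cpp process from a dictionary of options. The helper names (`to_env_name`, `build_env`, `prefixed`) are hypothetical, and the rule of uppercasing an option and swapping dashes for underscores is an assumption made for this example, not something the release notes spell out. Check the project's documentation for the exact variable names before relying on them.

```python
import os
from typing import Mapping, Optional

# Prefix named in the b9360 release notes.
PREFIX = "LLAMA_ARG_"


def to_env_name(option: str) -> str:
    """Map an option such as '--ctx-size' to 'LLAMA_ARG_CTX_SIZE'.

    Assumption for this sketch: the variable name is the option name,
    uppercased, with dashes replaced by underscores.
    """
    return PREFIX + option.strip("-").replace("-", "_").upper()


def build_env(options: Mapping[str, object],
              base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the base environment with prefixed variables added."""
    env = dict(os.environ if base is None else base)
    for option, value in options.items():
        env[to_env_name(option)] = str(value)
    return env


def prefixed(env: Mapping[str, str]) -> dict:
    """Return only the variables that carry the LLAMA_ARG_ prefix."""
    return {k: v for k, v in env.items() if k.startswith(PREFIX)}


if __name__ == "__main__":
    # The model path is a placeholder, not a real file.
    env = build_env({"--ctx-size": 4096, "model": "/models/example.gguf"})
    for name, value in sorted(prefixed(env).items()):
        print(f"{name}={value}")
```

The resulting dictionary can be passed as the `env=` argument to `subprocess.run` when launching a llama.cpp binary. Because every variable shares one prefix, a filter like `prefixed` is enough to audit what a deployment is setting.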

Pre-built binaries are now available for macOS (Apple Silicon, with optional KleidiAI acceleration, and Intel x64); iOS (as an XCFramework); Linux (x64, arm64, s390x) with Vulkan, ROCm 7.2, OpenVINO, and SYCL variants (FP32 disabled on x64); Windows (x64 and arm64) with CUDA 12/13, Vulkan, and HIP variants; and Android arm64. Builds for openEuler are marked as disabled. This breadth lets developers deploy llama.cpp on hardware ranging from edge devices to cloud GPUs.

Key Points
  • Environment variables now standardized with `LLAMA_ARG_` prefix to avoid naming conflicts.
  • New pre-built binaries for 15+ platforms including macOS, Windows, Linux (Vulkan/ROCm/OpenVINO), and Android.
  • The b9360 release includes CUDA 12 and 13 DLLs for Windows and optional KleidiAI acceleration on macOS (Apple Silicon).

Why It Matters

Simpler configuration and broader deployment options make llama.cpp more accessible for production AI inference across devices.