Developer Tools

llama.cpp b9824 renames binaries, adds broader platform support

The latest llama.cpp release cleans up binary names and expands build targets across CPU, GPU, and mobile.

Deep Dive

The llama.cpp project released version b9824, a minor but practical update focused on binary naming and platform coverage. It renames two binaries. `rpc-server` becomes `ggml-rpc-server`, which reduces the risk of name collisions in `/usr/bin` and reflects that the server works with any ggml application. `export-graph-ops` now carries the `-test` suffix, following testing conventions. The new names make the binary suite more predictable for system integrators and CI pipelines, though scripts that invoke the old names will need updating.
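For scripts that have to work both before and after the rename described above, one option is to look for the new name first and fall back to the old one. The following is a minimal Python sketch of that idea, not something shipped with llama.cpp: the helper `find_rpc_server` and its `which` parameter are hypothetical names introduced here, and only the two binary names come from the release.

```python
import shutil
from typing import Callable, Optional, Sequence

# New name first, then the pre-rename name as a fallback.
RPC_SERVER_NAMES = ("ggml-rpc-server", "rpc-server")


def find_rpc_server(
    names: Sequence[str] = RPC_SERVER_NAMES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the path of the first candidate found on PATH, or None.

    `which` is injectable so the lookup can be tested without
    touching the real filesystem (hypothetical helper, not a llama.cpp API).
    """
    for name in names:
        path = which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    found = find_rpc_server()
    print(found if found else "no RPC server binary found on PATH")
```

Note that the fallback to the bare `rpc-server` name can match an unrelated program with the same name, which is the collision the rename is meant to avoid, so a stricter pipeline may prefer to drop the fallback once it has moved to b9824 or later.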

Beyond naming, b9824 expands the prebuilt binary matrix significantly. For macOS, Apple Silicon builds include an optional KleidiAI-enabled variant. Linux users get builds for x64, arm64, and s390x with backends such as Vulkan, ROCm 7.2, OpenVINO, and SYCL (FP32/FP16). Windows binaries cover CPU, arm64 CPU, CUDA 12/13, Vulkan, OpenVINO, SYCL, and HIP. Android developers gain arm64 CPU and OpenCL Adreno builds. With this set, developers on most mainstream desktop, server, and mobile platforms can run llama.cpp without compiling from source.
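The platform matrix above can be captured as a small lookup table, for example in a setup script that tells a user which prebuilt variants exist for their operating system. This is a rough Python sketch under stated assumptions: the labels are descriptive strings taken from the list above, not actual release asset names, the Linux entry is not exhaustive, and `variants_for` is a hypothetical helper.

```python
import platform
from typing import Dict, List, Optional

# Variant labels as described in the article; these are NOT real
# release asset file names, and the Linux list is not exhaustive.
PREBUILT_VARIANTS: Dict[str, List[str]] = {
    "macos": ["Apple Silicon", "Apple Silicon + KleidiAI"],
    "linux": ["Vulkan", "ROCm 7.2", "OpenVINO", "SYCL FP32", "SYCL FP16"],
    "windows": ["CPU", "arm64 CPU", "CUDA 12", "CUDA 13",
                "Vulkan", "OpenVINO", "SYCL", "HIP"],
    "android": ["arm64 CPU", "OpenCL Adreno"],
}

# platform.system() reports "Darwin" on macOS.
_OS_ALIASES = {"darwin": "macos"}


def variants_for(os_name: Optional[str] = None) -> List[str]:
    """Return the variant labels for an OS, or [] if none are listed.

    Defaults to the current OS as reported by platform.system().
    """
    key = (os_name or platform.system()).lower()
    key = _OS_ALIASES.get(key, key)
    return PREBUILT_VARIANTS.get(key, [])


if __name__ == "__main__":
    print(variants_for())
```

Passing the OS name explicitly, as in `variants_for("android")`, is the safer choice for cross-platform tooling, since the value reported by the running interpreter may not match the deployment target.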

Key Points
  • Renamed `rpc-server` to `ggml-rpc-server` to reduce the risk of name collisions in `/usr/bin`
  • Added KleidiAI-enabled macOS Apple Silicon build for optimized inference
  • Prebuilt binaries now cover Android arm64 (CPU + OpenCL Adreno) and Windows arm64 (CPU)

Why It Matters

Cleaner naming and broader prebuilt binaries lower friction for deploying local LLMs across diverse hardware.
