Developer Tools

llama.cpp releases b9816 with multi-platform LLM support

⚑ New release supports macOS, Linux, Windows, Android, and more platforms.

Deep Dive

The llama.cpp project, maintained by ggml-org, has released build b9816. This is the latest update to the popular open-source C/C++ implementation of large language model (LLM) inference, originally written to run Meta's LLaMA models. The headline of the release is its broad platform coverage, which makes it easier for developers and enthusiasts to run LLMs on a wide range of hardware without relying on cloud services. The build matrix covers macOS on both Apple Silicon (arm64) and Intel (x64), iOS as an XCFramework, Linux on x64, arm64, and s390x, Windows on x64 and arm64, and Android on arm64.
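As a rough illustration of how the build matrix above maps to a given machine, the following Python sketch checks whether a host's OS and CPU architecture appear in the list. The table only restates the platforms named in this article. The alias tables and the function names `match_build` and `match_host` are hypothetical helpers, not part of llama.cpp or its release tooling, and the sketch says nothing about actual download file names.

```python
import platform

# Build matrix as listed in the article: OS -> supported CPU architectures.
# iOS and Android are left out to keep the sketch to desktop/server hosts.
BUILD_MATRIX = {
    "macos": {"arm64", "x64"},
    "linux": {"x64", "arm64", "s390x"},
    "windows": {"x64", "arm64"},
}

# Hypothetical normalization tables (assumptions for this sketch):
# map the strings Python reports to the labels used in the article.
OS_ALIASES = {"darwin": "macos", "linux": "linux", "windows": "windows"}
ARCH_ALIASES = {
    "x86_64": "x64",
    "amd64": "x64",
    "arm64": "arm64",
    "aarch64": "arm64",
    "s390x": "s390x",
}


def match_build(system, machine):
    """Return (os, arch) if the pair is in the build matrix, else None."""
    os_name = OS_ALIASES.get(system.lower())
    arch = ARCH_ALIASES.get(machine.lower())
    if os_name is None or arch is None:
        return None
    if arch in BUILD_MATRIX[os_name]:
        return (os_name, arch)
    return None


def match_host():
    """Check the machine this script is running on."""
    return match_build(platform.system(), platform.machine())
```

Calling `match_host()` returns a pair such as `("linux", "x64")` when the current machine is covered by the listed builds, and `None` otherwise.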

Backend-acceleration coverage is similarly broad. The builds include GPU and accelerator backends such as Vulkan, ROCm and HIP (for AMD GPUs), OpenVINO (for Intel hardware), and SYCL. CUDA support spans DLLs for both CUDA 12 and CUDA 13. On Windows, there is also an OpenCL build for Adreno GPUs. The release notes also list an openEuler option as disabled. The release text does not detail specific performance or feature changes, so the platform and backend coverage is the headline: it lets more users run Llama models locally and make fuller use of the hardware they already have.

Key Points
  • Supports macOS (Apple Silicon & Intel), iOS, Linux (x64, arm64, s390x), Windows (x64, arm64), and Android arm64.
  • GPU backends: Vulkan, CUDA 12/13, ROCm, OpenVINO, SYCL, HIP, and OpenCL Adreno.
  • The build list also includes an openEuler target (disabled) and marks KleidiAI as disabled on macOS/iOS.

Why It Matters

Broader hardware support enables more users to run LLMs locally, improving privacy and reducing cloud costs.
