Developer Tools

llama.cpp b9374 expands local LLM support across OS and hardware

114k-star project adds CUDA 13, Vulkan, ROCm, and ARM AI acceleration…

Deep Dive

The llama.cpp project, with 114k GitHub stars and 18.9k forks, has released b9374 with extensive build improvements and expanded platform support. The update refactors the CI workflows so that CUDA, HIP, and Apple builds run separately, adds caching prefixes, and fixes concurrency issues. Key additions include macOS on Apple Silicon with KleidiAI acceleration; Ubuntu builds for x64, ARM64, and s390x, with Vulkan, ROCm 7.2, OpenVINO, and SYCL (FP32 disabled) targets; and Windows x64/ARM64 CPU builds alongside CUDA 12 and 13 DLLs, Vulkan, and HIP. Android ARM64 and an iOS XCFramework are also included.
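For readers deciding which build applies to their machine, the platform list above can be sketched as a small lookup table. This is a minimal Python illustration, not part of llama.cpp: the table is transcribed from this summary only, the names MATRIX and listed_backends are hypothetical, and the labels are descriptive rather than real release asset names. The summary does not say which backend is built for which architecture, so the sketch reports backends per operating system only.

```python
import platform

# Platform matrix transcribed from the summary above (assumption: the summary
# is complete). Labels are descriptive, NOT actual release asset file names.
MATRIX = {
    "macos": {"arches": {"arm64"}, "backends": ["KleidiAI"]},
    # "linux" stands for the Ubuntu builds named in the summary.
    "linux": {"arches": {"x64", "arm64", "s390x"},
              "backends": ["Vulkan", "ROCm 7.2", "OpenVINO", "SYCL"]},
    "windows": {"arches": {"x64", "arm64"},
                "backends": ["CPU", "CUDA 12", "CUDA 13", "Vulkan", "HIP"]},
    # The summary names no specific backend for Android.
    "android": {"arches": {"arm64"}, "backends": []},
}

# Map the strings Python reports to the labels used in MATRIX.
_OS_ALIASES = {"darwin": "macos", "linux": "linux",
               "windows": "windows", "android": "android"}
_ARCH_ALIASES = {"x86_64": "x64", "amd64": "x64",
                 "arm64": "arm64", "aarch64": "arm64", "s390x": "s390x"}


def listed_backends(system=None, machine=None):
    """Return the backends the summary lists for this OS.

    Returns None when the OS/architecture pair does not appear in the summary.
    Defaults to the values reported for the current host.
    """
    os_name = _OS_ALIASES.get((system or platform.system()).lower())
    arch = _ARCH_ALIASES.get((machine or platform.machine()).lower())
    entry = MATRIX.get(os_name)
    if entry is None or arch not in entry["arches"]:
        return None
    return list(entry["backends"])


if __name__ == "__main__":
    # Output depends on the machine running the script.
    print(listed_backends())
```

A None result only means the pair is not mentioned in this summary; the project's release page remains the authoritative list of downloads.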

The release underscores llama.cpp's role as a go-to local LLM runtime, with builds for the major GPU backends named above: CUDA, HIP/ROCm, Vulkan, and SYCL. The CI overhaul is aimed at faster, more reliable builds for contributors. With KleidiAI on Apple Silicon and CUDA 13 on Windows, developers can run current models on both consumer and enterprise hardware. This continues the project's goal of making AI inference available without cloud dependencies.

Key Points
  • llama.cpp b9374 supports macOS Apple Silicon (KleidiAI), Linux (x64/ARM64/s390x), Windows (CPU/CUDA/Vulkan/HIP), Android, and iOS.
  • CI refactored into separate workflows for CUDA, HIP, and Apple builds, with caching prefixes added and concurrency issues fixed.
  • New CUDA 13 DLLs added for Windows; Ubuntu gains ROCm 7.2, OpenVINO, and SYCL FP32 (disabled) targets.

Why It Matters

The broader build matrix lets developers run local LLMs on most mainstream desktop and mobile hardware, reducing reliance on cloud inference along with its cost and latency.