Developer Tools

llama.cpp now builds on 20+ platforms from Apple Silicon to Ascend NPUs

One CMake fix unlocks AI inference across macOS, Linux, Windows, Android, and openEuler…

Deep Dive

The latest llama.cpp build (b9285) introduces a targeted CMake change: the router app is now built only when llama.cpp is configured as the top-level (standalone) project, so it is no longer compiled when llama.cpp is pulled into another project as a subproject. This small fix accompanies a build matrix that now covers more than 20 distinct platform/accelerator combinations.
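The standalone-only guard described above can be sketched in CMake roughly as follows. This is an illustrative pattern, not the actual diff: the `LLAMA_STANDALONE` variable mirrors the name used in llama.cpp's top-level CMakeLists.txt, but how the commit uses it is an assumption here, and the `router-app` directory name is hypothetical.

```cmake
# Sketch of a standalone-only target guard (not the actual llama.cpp change).
# A project is "standalone" when it is the top-level CMake project,
# i.e. it was not pulled in through add_subdirectory() or FetchContent.
if (CMAKE_SOURCE_DIR STREQUAL CMAKE_CURRENT_SOURCE_DIR)
    set(LLAMA_STANDALONE ON)
else()
    set(LLAMA_STANDALONE OFF)
endif()

# Hypothetical directory name, used only for illustration.
# The app is added to the build only when the project is standalone.
if (LLAMA_STANDALONE)
    add_subdirectory(router-app)
endif()
```

With a guard of this shape, a parent project that calls `add_subdirectory()` on its llama.cpp checkout gets the library targets without also configuring and compiling the app, which keeps build times down and avoids pulling in dependencies the parent does not need.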

The supported configurations span the major desktop and mobile operating systems and several CPU architectures. On macOS, both Intel and Apple Silicon (including a KleidiAI-accelerated variant) are covered. Linux users get CPU-only builds for x64, arm64, and s390x, plus Vulkan, ROCm 7.2, OpenVINO, and SYCL (FP32 and FP16) builds. Windows adds CUDA 12 and CUDA 13 DLLs, HIP, and SYCL. Mobile support includes an iOS XCFramework and Android arm64. Perhaps most notably, the openEuler Linux distribution is supported with Ascend 310P and 910B NPUs using ACL Graph. This breadth makes llama.cpp one of the most portable frameworks for running large language models locally.

Key Points
  • Change #23521 (shipped in build b9285) restricts the router app build to standalone mode, so it is no longer compiled when llama.cpp is embedded as a subproject.
  • The build matrix now includes 20+ configurations: macOS on Apple Silicon (with a KleidiAI variant) and Intel, CUDA 12/13, Vulkan, ROCm 7.2, OpenVINO, SYCL, HIP, and Ascend NPUs.
  • New platform support covers iOS XCFramework, Android arm64, and openEuler with Ascend 310P/910B.

Why It Matters

Developers can now deploy LLMs on a wide range of hardware, from personal laptops to edge devices and data-center NPUs, using prebuilt configurations from a single codebase.