Developer Tools

b9015

New release of llama.cpp cleans up Vulkan, adds macOS Intel and Linux s390x builds

Deep Dive

ggml-org has released llama.cpp b9015, a minor but useful update to the popular open-source C++ library for running large language models locally. The release removes the dead GGML_VK_MAX_NODES definition from the Vulkan backend, where it was defined but no longer referenced anywhere, eliminating a source of confusion for contributors. The cleanup is part of ongoing work to modernize the Vulkan backend, which is crucial for running LLMs on a wide range of GPUs without proprietary dependencies.
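
For context, the removed symbol was a compile-time constant in the Vulkan backend source. The sketch below shows what such a dead definition looks like; the value and surrounding comments are illustrative assumptions, not taken from the release notes.

    // ggml-vulkan.cpp (before b9015) -- illustrative sketch
    // The macro was defined here but referenced nowhere else in the
    // backend, so deleting it changes no behavior; it only removes noise.
    #define GGML_VK_MAX_NODES 8192   // hypothetical value; the macro was unused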

Beyond the Vulkan cleanup, this release significantly expands build coverage across operating systems and hardware. New prebuilt binaries are available for macOS Intel (x64), Linux s390x, Windows with CUDA 12 and 13, and openEuler (x86 and aarch64 with ACL Graph). Prebuilt binaries for macOS Apple Silicon, Linux (x64, arm64, Vulkan, ROCm, SYCL), Android arm64, and other existing configurations remain available. This breadth of platform coverage makes llama.cpp one of the most versatile tools for running AI models locally, whether on consumer desktops, servers, or edge devices.
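
For platforms without a prebuilt binary, or to pin a specific backend, llama.cpp builds from source with CMake. A minimal sketch, assuming a working compiler and the relevant backend SDKs are installed (backend flags as documented in the project's build guide):

    # CPU-only build (works everywhere, including s390x and Intel Macs)
    cmake -B build
    cmake --build build --config Release

    # Vulkan backend, for broad GPU support without proprietary dependencies
    cmake -B build -DGGML_VULKAN=ON
    cmake --build build --config Release

    # CUDA backend, matching the new Windows CUDA 12/13 binaries
    cmake -B build -DGGML_CUDA=ON
    cmake --build build --config Release

Each invocation produces the same command-line tools (llama-cli, llama-server); only the compiled backend differs.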

Key Points
  • Removed dead GGML_VK_MAX_NODES definition to clean up Vulkan backend code
  • Added new build targets: macOS Intel (x64), Linux s390x, Windows CUDA 12 and 13, openEuler x86 & aarch64
  • Maintains existing support for Apple Silicon, Linux, Android, Windows CPU, and GPU backends (Vulkan, ROCm, SYCL)

Why It Matters

Local AI inference becomes more accessible across diverse hardware, from aging Intel Macs and s390x servers to modern consumer and datacenter GPUs.