llama.cpp b9063
New release of the popular open-source LLM runtime improves OpenCL debugging capabilities.
llama.cpp, the leading open-source library for efficient LLM inference on consumer hardware, has released version b9063. With over 109k GitHub stars and 17.9k forks, the project enables running models like Llama, Mistral, and GPT-2 locally on a wide range of devices. The latest release introduces an OpenCL opfilter regex for debugging, letting developers filter which operations are inspected during OpenCL kernel execution, a useful aid when diagnosing and tuning performance on the OpenCL GPU backend.
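The release notes excerpted here don't spell out how the regex is supplied, so the following is only a rough sketch: it assumes the filter is passed through an environment variable, and the name GGML_OPENCL_OP_FILTER is a hypothetical placeholder rather than a documented setting; the model path is also a placeholder. The operation names matched (MUL_MAT, SOFT_MAX) are real ggml operation names.

```bash
# Hypothetical sketch: GGML_OPENCL_OP_FILTER is a placeholder variable name,
# shown only to illustrate where a debugging regex would slot into a run.
# The regex limits debug inspection to matrix-multiply and softmax operations.
GGML_OPENCL_OP_FILTER='MUL_MAT|SOFT_MAX' \
  ./llama-cli -m ./models/model.gguf -p "test prompt" -n 16
```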
This release expands platform support significantly, offering pre-built binaries for:

- macOS: Apple Silicon (with and without KleidiAI), Intel x64, and an iOS XCFramework
- Linux: x64, arm64, and s390x CPU builds, plus Vulkan, ROCm 7.2, OpenVINO, and SYCL (FP32/FP16) variants
- Windows: x64 and arm64 CPU, CUDA 12/13, Vulkan, SYCL, and HIP
- Android: arm64 CPU
- openEuler: x86 and aarch64 with Ascend 310p/910b ACL Graph

This breadth ensures developers can deploy local AI inference across diverse hardware setups, from laptops to servers.
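Whichever binary is chosen, basic command-line usage is the same. A minimal sketch follows; the model file name is a placeholder, and -ngl only has an effect on GPU-enabled builds.

```bash
# Run a local GGUF model with llama-cli from a pre-built release.
# -m   path to the model file (placeholder name below)
# -p   prompt text; -n  number of tokens to generate
# -ngl number of layers to offload to the GPU (GPU-enabled builds only)
./llama-cli -m ./models/llama-3-8b-instruct-Q4_K_M.gguf \
  -p "Explain KV caching in one paragraph." -n 128 -ngl 99
```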
- Adds OpenCL opfilter regex for debugging GPU kernel operations
- Project has 109k stars, 17.9k forks on GitHub
- Supports macOS, Linux, Windows, Android, and openEuler with multiple GPU backends (CUDA, Vulkan, ROCm, SYCL, HIP)
Why It Matters
Lets developers fine-tune OpenCL performance for local LLM inference across diverse hardware.