llama.cpp b8119
The popular open-source inference engine ships Vulkan builds, expanding GPU compatibility beyond CUDA.
The open-source community behind llama.cpp, the high-performance C/C++ inference engine originally built for Meta's Llama models and now supporting a wide range of LLM architectures, has released version b8119. The release ships Vulkan-enabled Windows binaries, letting users run models on AMD and Intel GPUs in addition to NVIDIA hardware via CUDA. It also addresses a Hexagon build issue (#19444) and provides pre-built binaries across platforms: macOS (Apple Silicon and Intel), Linux (CPU and Vulkan), Windows (CPU, CUDA 12-13, Vulkan, SYCL, and HIP), and openEuler with Huawei Ascend support. With 95.5k GitHub stars, llama.cpp enables efficient local AI inference on consumer hardware, and this release further broadens the hardware it runs on.
- Adds Vulkan GPU support for Windows, enabling AMD and Intel GPU acceleration alongside NVIDIA CUDA
- Fixes Hexagon build issues and provides pre-built binaries for 10+ platform/architecture combinations
- Maintains llama.cpp's position as the leading open-source inference engine with 95.5k GitHub stars
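For readers who prefer building from source rather than using the pre-built binaries, the Vulkan backend described above is selected at configure time. This is a minimal sketch assuming a standard CMake toolchain and a Vulkan SDK on the system; the model path is a placeholder, not a file shipped with the release.

```shell
# Fetch the source (ggml-org is the project's current GitHub organization).
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp

# Configure with the Vulkan backend enabled and build in Release mode.
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release

# Run inference, offloading all layers to the Vulkan GPU (-ngl 99).
# ./models/model.gguf is a placeholder for any GGUF model you have locally.
./build/bin/llama-cli -m ./models/model.gguf -ngl 99 -p "Hello"
```

The same binaries fall back to CPU execution if no Vulkan-capable device is found, so one build can serve both GPU and CPU machines.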
Why It Matters
Expands affordable local AI inference to more hardware, reducing dependency on expensive NVIDIA GPUs for Windows users.