llama.cpp b9357 fixes Vulkan queue selection on AMD UMA, ships builds for a wide range of platforms
New release patches Vulkan queue handling for AMD unified memory devices.
llama.cpp, the widely adopted open-source C++ library for running large language models locally, has released build b9357. The update changes the Vulkan backend so that it no longer prefers the transfer queue on AMD UMA (Unified Memory Architecture) devices. On these systems, such as recent AMD APUs and other integrated graphics, the CPU and GPU share the same memory, and the fix is intended to improve stability and performance there. The release is signed and verified.
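To make the idea concrete, here is a minimal C++ sketch of queue-family selection that skips the dedicated transfer queue on a UMA device. It is not taken from the llama.cpp source: the `QueueFlags` bits, `QueueFamily`, `find_family`, `choose_queues`, and the `is_uma` flag are all hypothetical stand-ins, and a real backend would read this information from the Vulkan API instead. The assumption illustrated is that on shared-memory hardware, transfers can simply run on the compute queue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-ins for Vulkan queue capability bits. Real code would
// use VkQueueFlagBits from <vulkan/vulkan.h>.
enum QueueFlags : std::uint32_t {
    kGraphics = 1u << 0,
    kCompute  = 1u << 1,
    kTransfer = 1u << 2,
};

// Hypothetical, simplified description of one queue family.
struct QueueFamily {
    std::uint32_t flags;
};

// Index of the first family that has every `required` bit and no `avoid` bit.
std::optional<std::size_t> find_family(const std::vector<QueueFamily>& families,
                                       std::uint32_t required,
                                       std::uint32_t avoid) {
    for (std::size_t i = 0; i < families.size(); ++i) {
        const std::uint32_t f = families[i].flags;
        if ((f & required) == required && (f & avoid) == 0) {
            return i;
        }
    }
    return std::nullopt;
}

struct QueueChoice {
    std::size_t compute;
    std::size_t transfer;
};

// Picks queue families for compute and transfer work.
// Assumption for this sketch: on a UMA device the transfer-only family is
// not preferred, so transfers share the compute family.
QueueChoice choose_queues(const std::vector<QueueFamily>& families, bool is_uma) {
    assert(!families.empty());

    // Prefer a compute family without graphics, else any compute family.
    auto compute = find_family(families, kCompute, kGraphics);
    if (!compute) {
        compute = find_family(families, kCompute, 0);
    }
    assert(compute.has_value());  // this sketch requires a compute-capable family

    if (is_uma) {
        return {*compute, *compute};
    }

    // On non-UMA devices, prefer a transfer-only family when one exists.
    auto transfer = find_family(families, kTransfer, kCompute | kGraphics);
    return {*compute, transfer.value_or(*compute)};
}
```

Calling `choose_queues(families, true)` returns the same index for both roles, while `choose_queues(families, false)` keeps the dedicated transfer family when the device exposes one.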
b9357 ships prebuilt binaries for a long list of platforms: macOS Apple Silicon (both standard and KleidiAI-optimized), macOS Intel, iOS XCFramework, Ubuntu x64 and arm64 (CPU, Vulkan, ROCm 7.2, OpenVINO), Windows x64 and arm64 (CPU, CUDA 12/13, Vulkan, HIP), and Android arm64. That breadth reflects llama.cpp's position as one of the most widely used options for on-device LLM inference. With 113k stars and 18.8k forks, the project continues to evolve, letting developers run models such as LLaMA and Mistral on a wide range of devices.
- Vulkan fix stops the backend from preferring the transfer queue on AMD UMA devices, improving compatibility
- Ships prebuilt binaries for 15+ build targets across macOS, Windows, Linux, iOS, and Android, with GPU backends including CUDA, ROCm, Vulkan, OpenVINO, and SYCL
- Project has 113k stars and 18.8k forks, reflecting strong community adoption
Why It Matters
The fix enables smoother local AI inference on heterogeneous hardware, especially AMD unified memory systems.