llama.cpp b9428 expands platform support for local LLM inference
New release fixes s390x builds and enables multi-threaded builds for the iOS XCFramework.
The open-source project llama.cpp, which enables local inference of large language models on consumer hardware, has tagged release b9428. This incremental update focuses on platform compatibility and build fixes. Key changes include a fix for the s390x (IBM Z) release job and multi-threaded builds for the iOS XCFramework, improving performance on Apple devices. The release also ships new UI assets and continues to support a wide range of backends, including CPU, CUDA, Vulkan, ROCm, OpenVINO, and SYCL, across Linux, Windows, macOS, Android, and openEuler.
For developers running LLMs locally, b9428 should make builds smoother on less common architectures such as s390x, which matters in enterprise Linux environments. The multi-threaded iOS build means better performance for on-device models on iPhones and iPads. This is not a major feature release, but it reflects the project's continued attention to reliability and broad hardware support. With 114k GitHub stars, llama.cpp remains a widely used tool for running models such as Llama, Mistral, and GPT-2 locally, and this release makes deployment across diverse setups a little easier.
- Fixed s390x release job for IBM Z architecture compatibility.
- Enabled multi-threaded builds for the iOS XCFramework, improving performance.
- Supports 23+ platform/target combinations, spanning backends such as CPU, CUDA, Vulkan, and ROCm.
Why It Matters
Broader platform support means more developers can run LLMs efficiently on diverse hardware, from servers to iPhones.