b9050
New patch ensures GPU backends initialize properly on all platforms.
The latest llama.cpp release, b9050, is a targeted patch that addresses a critical oversight in backend initialization. The commit, authored by Adrien Gallouët of Hugging Face, adds a missing call to `ggml_backend_load_all()`, the function responsible for enumerating and loading all available GPU compute backends. Without this call, backends such as CUDA, Vulkan, ROCm, or SYCL may not be recognized at runtime, causing llama.cpp to silently fall back to CPU-only inference or fail with unexpected errors.
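To see what this call does in practice, here is a minimal sketch (not part of the release itself, and assuming the `ggml-backend.h` header from a recent llama.cpp checkout) that loads the dynamically built backends explicitly and then lists every device ggml registered:

```c
// Minimal sketch: explicitly load all dynamic ggml backends, then
// enumerate the devices they registered. Useful to verify that
// CUDA/Vulkan/ROCm/SYCL are actually visible at runtime.
#include <stdio.h>
#include "ggml-backend.h"

int main(void) {
    // The call this patch restores: scans for and loads every
    // available dynamic backend library (CUDA, Vulkan, ROCm, SYCL, CPU, ...).
    ggml_backend_load_all();

    // List each device the loaded backends registered.
    size_t n = ggml_backend_dev_count();
    for (size_t i = 0; i < n; i++) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        printf("device %zu: %s (%s)\n", i,
               ggml_backend_dev_name(dev),
               ggml_backend_dev_description(dev));
    }
    return 0;
}
```

On a build affected by the bug, a listing like this would typically show only the CPU device, which matches the CPU-only fallback described above.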
This release continues llama.cpp's tradition of broad platform support. Pre-compiled binaries are available for macOS (Apple Silicon, Intel, KleidiAI), Linux (x64, arm64, s390x, with Vulkan, ROCm 7.2, OpenVINO, SYCL), Windows (x64, arm64, with CUDA 12/13, Vulkan, SYCL, HIP), Android (arm64), and openEuler (x86, aarch64). For users running local LLMs with custom setups, this is a low-risk, high-impact update that ensures smoother multi-backend operation.
- Fixes missing `ggml_backend_load_all()` call to properly initialize GPU backends
- Supports 20+ platform/backend combinations including CUDA 13, ROCm 7.2, and KleidiAI
- Pre-built binaries available for macOS, Linux, Windows, Android, and openEuler
Why It Matters
Ensures local LLM inference works reliably across diverse hardware, without GPU backends silently failing to load.