llama.cpp b9219 drops HF cache migration, expands platform support
The popular LLM runtime cuts a dependency and ships prebuilt binaries for 30+ platforms...
The ggml-org/llama.cpp project has tagged version b9219, a maintenance release that removes the Hugging Face cache migration logic. The change, submitted by Hugging Face engineer Adrien Gallouët, deletes legacy code that migrated cached model files from an older HF layout, which simplifies the codebase and removes a potential source of bugs. The commit is GPG-verified and was merged on May 18.
The release also ships prebuilt binaries for 30+ platform/backend combinations: Apple Silicon and Intel Macs (with KleidiAI acceleration on ARM macOS); iOS as an XCFramework; Linux on x64, arm64, and s390x CPUs, plus Vulkan, ROCm 7.2, OpenVINO, and SYCL (FP32/FP16) backends; Windows with CPU, CUDA 12/13, Vulkan, SYCL, and HIP builds; Android arm64; and openEuler for Ascend NPUs. Users on most mainstream hardware can download a compiled binary instead of building from source.
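For readers scripting a download step, the platform list can be treated as a lookup keyed on the host OS and CPU architecture. The following is a minimal sketch, not the project's own tooling: the `FAMILIES` table and `binary_family` helper are hypothetical, the labels are taken from the list in this article rather than from real release asset file names, and the Windows entry assumes a 64-bit x86 host.

```python
import platform

# Hypothetical labels based on the platform list in this article.
# They are NOT the real release asset file names.
FAMILIES = {
    ("Darwin", "arm64"): "macOS Apple Silicon",
    ("Darwin", "x86_64"): "macOS Intel",
    ("Linux", "x86_64"): "Linux x64",
    ("Linux", "aarch64"): "Linux arm64",
    ("Linux", "s390x"): "Linux s390x",
    ("Windows", "AMD64"): "Windows",  # assumption: 64-bit x86 host
}


def binary_family(system=None, machine=None):
    """Return the platform family for a host, or None if it is not listed."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return FAMILIES.get((system, machine))


if __name__ == "__main__":
    # Fall back to a source build when the host is not in the table.
    print(binary_family() or "no match; build from source")
```

Choosing a backend (Vulkan, CUDA, ROCm, SYCL, and so on) depends on the installed GPU and drivers, so this sketch stops at the OS and CPU level.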
- Removes Hugging Face cache migration code (#23266), reducing maintenance overhead
- Prebuilt binaries for 30+ platform/backend combos including macOS, Linux, Windows, Android, iOS, and openEuler
- Includes KleidiAI acceleration for Apple Silicon macOS and separate CUDA 12/13 DLLs for Windows
Why It Matters
The release simplifies llama.cpp for developers and extends ready-to-run builds across diverse hardware, from laptops to cloud GPUs.