b9064
Critical stability fix for local LLM deployment on CPU and GPU.
The llama.cpp project, a leading open‑source framework for running LLaMA‑family models locally, has shipped version b9064. This maintenance release fixes a bug in the device state save and load functionality (issue #22805). With the fix, model state that users persist and later restore, for example after pausing inference or switching hardware contexts, transfers correctly across CPU, GPU, and accelerator backends.
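For context, state persistence in llama.cpp is exposed through its C API, notably llama_state_save_file and llama_state_load_file. The sketch below illustrates the save/restore round trip that this release concerns; "model.gguf" and "session.bin" are placeholder paths, error handling is abbreviated, and exact entry-point names can vary between llama.cpp versions.

```cpp
// Minimal sketch of a llama.cpp state save/restore round trip.
// Paths and parameters are illustrative placeholders.
#include "llama.h"
#include <cstdio>
#include <vector>

int main() {
    llama_backend_init();

    // Load any GGUF model; "model.gguf" is a placeholder path.
    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_model_load_from_file("model.gguf", mparams);
    if (!model) { fprintf(stderr, "failed to load model\n"); return 1; }

    llama_context_params cparams = llama_context_default_params();
    llama_context * ctx = llama_init_from_model(model, cparams);

    // ... run some inference, then persist the context state together
    // with the tokens evaluated so far. In a real program, `tokens`
    // holds the prompt/response tokens already processed.
    std::vector<llama_token> tokens;
    llama_state_save_file(ctx, "session.bin", tokens.data(), tokens.size());

    // Later, possibly after switching hardware contexts, restore the
    // saved state into a fresh context.
    std::vector<llama_token> restored(1024);
    size_t n_restored = 0;
    if (!llama_state_load_file(ctx, "session.bin",
                               restored.data(), restored.size(), &n_restored)) {
        fprintf(stderr, "failed to restore session state\n");
    }

    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```

Because the saved blob captures the evaluated tokens along with the context state, a successful restore spares the application from re-evaluating a long prompt from scratch, which is exactly the workflow the backend-transfer fix protects.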
Version b9064 is available across all major platforms and hardware configurations:
- macOS: Apple Silicon (with and without KleidiAI optimizations), Intel x64, iOS XCFramework
- Linux: Ubuntu x64/arm64/s390x with CPU, Vulkan, ROCm 7.2, OpenVINO, and SYCL (FP32/FP16) backends
- Android: arm64
- Windows: x64 and arm64 CPU builds, CUDA 12.4 and 13.1 DLLs, Vulkan, SYCL, HIP
- openEuler: x86 and aarch64 with ACL Graph
The release maintains full compatibility with the extensive ecosystem of LLM fine‑tunes and quantization methods supported by llama.cpp.
- Fixes device state save/load bug (#22805) affecting inference persistence
- Supports 30+ platform and backend combinations including CUDA, ROCm, Vulkan, and SYCL
- Maintains compatibility with all LLaMA model variants and quantization formats
Why It Matters
Reliable device state save and restore is essential for production local LLM workloads: it lets applications pause and resume inference without losing accumulated context, and move sessions across hardware without recomputing long prompts after an interruption.