llama.cpp b9163 fixes reasoning-budget deep-copy bug for local LLM inference
Deep-copy issue patched in llama.cpp release b9163 for accurate token budgeting.
The llama.cpp project, a widely used open-source C/C++ implementation for running large language models locally, published release b9163 on May 15. The headline fix resolves a bug in the "reasoning-budget" feature: the clone operation did not deep-copy the internal budget state, so an original and its clone could end up sharing mutable state. That can produce incorrect token accounting during inference, especially in multi-threaded or agentic workflows that rely on precise budget tracking.
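To illustrate the class of bug described above, here is a minimal C++ sketch of a shallow versus deep clone. It is not the llama.cpp code: `BudgetState`, `BudgetTracker`, `clone_shallow`, and `clone_deep` are hypothetical names invented for this example, and the assumption that the state is held behind a pointer is ours, not something the release notes confirm.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for internal budget state (not llama.cpp's actual type).
struct BudgetState {
    int remaining_tokens;
};

// Hypothetical tracker that owns its budget state through a pointer.
struct BudgetTracker {
    std::shared_ptr<BudgetState> state;

    explicit BudgetTracker(int budget)
        : state(std::make_shared<BudgetState>(BudgetState{budget})) {}

    // Buggy pattern: copies only the pointer, so both objects share one counter.
    BudgetTracker clone_shallow() const {
        BudgetTracker copy(0);
        copy.state = state;
        return copy;
    }

    // Fixed pattern: allocates a fresh state holding a copy of the current values.
    BudgetTracker clone_deep() const {
        BudgetTracker copy(0);
        copy.state = std::make_shared<BudgetState>(*state);
        return copy;
    }

    // Spend n tokens from the budget, clamping at zero.
    void consume(int n) {
        state->remaining_tokens =
            state->remaining_tokens > n ? state->remaining_tokens - n : 0;
    }

    int remaining() const { return state->remaining_tokens; }
};
```

In this sketch, consuming tokens through a shallow clone also drains the original's budget, while a deep clone leaves the original untouched. With several clones running concurrently, the shared counter in the shallow case would additionally be subject to data races.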
Release b9163 is available for major platforms: macOS (Apple Silicon and Intel), Linux (x64, arm64, s390x), Windows (x64, arm64), Android (arm64), and iOS, along with builds for specialized hardware backends (CUDA, Vulkan, ROCm, OpenVINO, SYCL, HIP). The fix was committed with a verified signature from github-actions, which attests that the commit came from the stated source. For developers and researchers who run models such as Llama, Mistral, or Phi locally with llama.cpp, the update makes reasoning budgets a more reliable way to control inference time and resource usage.
- Fixes deep-copy issue in reasoning-budget clone operations to prevent shared mutable state.
- Available for 10+ targets, including macOS, Linux, Windows, Android, iOS, and GPU backends.
- Released on May 15; the fix commit carries a verified GitHub signature.
Why It Matters
Ensures accurate token budget tracking in local LLM inference, critical for agentic and production AI workflows.