Developer Tools

llama.cpp b9395 released with improved CLI help output

The popular local LLM inference engine gets a UX polish update.

Deep Dive

llama.cpp, the widely used open-source C++ library for efficient inference of large language models on consumer hardware, has a new release: b9395. The update, tagged on GitHub by the maintainer team, centers on one user-facing change: cleaner, more informative help output for the command-line interface. The change, contributed by Hugging Face engineer Adrien Gallouët (PR #23805), makes it easier for developers and hobbyists to discover and configure llama.cpp's many options.
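For readers who want to explore the help text programmatically, here is a minimal sketch in Python. It assumes a llama.cpp binary such as llama-cli is on the PATH and prints its usage text when given --help. The helper names fetch_help and find_options are hypothetical and are not part of llama.cpp. The exact layout of the help text in b9395 is not described here, so the filter works on plain lines only.

```python
import subprocess


def fetch_help(binary: str = "llama-cli") -> str:
    """Run `<binary> --help` and return its combined stdout and stderr.

    Assumption: the binary is installed and on the PATH; otherwise
    subprocess.run raises FileNotFoundError.
    """
    result = subprocess.run(
        [binary, "--help"],
        capture_output=True,
        text=True,
        check=False,  # do not raise on a non-zero exit status
    )
    return result.stdout + result.stderr


def find_options(help_text: str, keyword: str) -> list[str]:
    """Return the help lines that mention `keyword`, case-insensitively."""
    needle = keyword.lower()
    return [
        line.rstrip()
        for line in help_text.splitlines()
        if needle in line.lower()
    ]
```

Calling find_options(fetch_help(), "gpu") would then list only the help lines that mention that keyword. The search is kept separate from the subprocess call so it can be tried on any saved help text without a llama.cpp build at hand.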

As with previous releases, b9395 keeps the project's broad cross-platform coverage. The release artifacts include builds for macOS (both Apple Silicon and Intel), Linux (x64/arm64 with CPU, Vulkan, ROCm 7.2, OpenVINO, SYCL), Windows (x64/arm64 with CPU, CUDA 12 & 13, Vulkan, HIP), and Android (arm64). The project also offers an iOS XCFramework bundle. The release advertises no performance or feature breakthroughs, but the steady refinement of the tooling reflects llama.cpp's maturity as a staple for running models such as Llama, Mistral, and Gemma locally on personal computers.

Key Points
  • llama.cpp version b9395 released, focused on improving CLI help output (PR #23805 by Hugging Face engineer Adrien Gallouët).
  • Builds available for macOS (Apple Silicon & Intel), Linux (x64/arm64 with CPU, Vulkan, ROCm 7.2, OpenVINO, SYCL), Windows (x64/arm64 with CPU, CUDA 12/13, Vulkan, HIP), and Android arm64, plus an iOS XCFramework bundle.
  • No major new features, but continued polish for one of the most popular open-source LLM inference engines (114k stars on GitHub).

Why It Matters

Cleaner help output makes local LLM experimentation more accessible for developers and power users.