llama.cpp b9391 adds API key file support and expands platform builds
New release of the popular LLM inference engine adds secure API key handling
ggml-org has released llama.cpp version b9391, the latest update to the widely used open-source C++ implementation for running large language models locally. The marquee feature in this release is the new `LLAMA_ARG_API_KEY_FILE` environment variable, which sets the API key file path through the environment as an alternative to passing the `--api-key-file` command-line option. This is particularly useful for production deployments where configuration is supplied through environment variables rather than hardcoded arguments.
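As a rough illustration of how a deployment script might use the new variable, the Python sketch below writes a key file readable only by its owner and builds an environment that points `LLAMA_ARG_API_KEY_FILE` at it. The helper names `write_key_file` and `server_env`, the one-key-per-line file layout, and the commented launch command (binary name and model path) are assumptions made for illustration, not details confirmed by the release notes.

```python
import os
import secrets
import tempfile
from pathlib import Path
from typing import Optional


def write_key_file(path: Path, keys: list[str]) -> Path:
    """Write keys to a new file that only the owner can read or write.

    Assumption: one key per line is the layout the server expects.
    """
    # Creating the file with mode 0o600 keeps it private from the start.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(keys) + "\n")
    return path


def server_env(key_file: Path, base_env: Optional[dict[str, str]] = None) -> dict[str, str]:
    """Return a copy of the environment with the key file path set."""
    env = dict(os.environ if base_env is None else base_env)
    env["LLAMA_ARG_API_KEY_FILE"] = str(key_file)
    return env


if __name__ == "__main__":
    key_path = write_key_file(
        Path(tempfile.mkdtemp()) / "api_keys.txt",
        [secrets.token_urlsafe(32)],
    )
    env = server_env(key_path)
    # Hypothetical launch; binary name and model path are placeholders:
    # subprocess.run(["llama-server", "-m", "model.gguf"], env=env, check=True)
    print(env["LLAMA_ARG_API_KEY_FILE"])
```

With this approach only the file path travels through the environment. The keys themselves stay in a file whose permissions can be locked down separately.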
The release also ships prebuilt binaries across a wide range of architectures and accelerators. For macOS, builds are available for both Apple Silicon (arm64) and Intel (x64), with a separate Apple Silicon build featuring KleidiAI acceleration. Linux users get CPU-only builds for x64, arm64, and even s390x, plus GPU-accelerated versions with Vulkan, ROCm 7.2, and OpenVINO. Windows users can choose from CPU builds (x64 and arm64), CUDA 12 and CUDA 13 versions, Vulkan, and HIP. Mobile developers are covered with Android arm64 and iOS XCFramework builds. The project has amassed over 114,000 GitHub stars and 18,900 forks, reflecting its status as a cornerstone of the local AI inference ecosystem.
- New `LLAMA_ARG_API_KEY_FILE` environment variable simplifies API key management for production deployments
- Includes a separate KleidiAI-accelerated build for macOS Apple Silicon
- Prebuilt binaries also cover Linux s390x, Android arm64, iOS XCFramework, and Windows CUDA 13
Why It Matters
Simplifies API key management for self-hosted LLM apps, enabling more secure and flexible production deployments.