llama.cpp b9750 adds Jinja call statement support for local LLMs
The popular open-source LLM runner now supports Jinja's call statement across all platforms.
The llama.cpp project, known for its efficient local LLM inference on consumer hardware, has shipped b9750, a minor but impactful release. The headline feature is the implementation of Jinja's call statement (via PR #24847), which allows more complex, reusable prompt templates. This is especially useful for developers building custom workflows or multi-turn interactions with local models. The update also includes cleanup like de-lambda simplifications and moving caller context inside function handler.
What sets this release apart is its broad platform support. Builds are available for macOS (Apple Silicon with optional KleidiAI acceleration, Intel x64, iOS XCFramework), Linux (x64/arm64/s390x CPUs, Vulkan, ROCm 7.2, OpenVINO, SYCL FP32/FP16), Windows (x64/arm64 CPUs, CUDA 12/13, Vulkan, OpenVINO, SYCL, HIP for AMD GPUs), and Android (arm64 CPU). The project's 118k GitHub stars and 19.8k forks attest to its dominance in the local AI ecosystem. For users, this means more sophisticated prompt engineering without sacrificing performance.
- Implements Jinja call statement for advanced prompt templating (PR #24847).
- Builds available for macOS, Linux, Windows, Android, and iOS with GPU backends (CUDA, Vulkan, ROCm, OpenVINO, SYCL, HIP).
- Project has 118k stars and 19.8k forks on GitHub, a top local LLM runtime.
Why It Matters
llama.cpp remains the go-to for local LLM inference; this release enables more flexible and reusable prompt workflows.