Llama.cpp b9310 improves checkpointing for long chat sessions
New release fixes checkpoint creation, adds --checkpoint-min-step for efficient memory management
Get AI news that actually matters
One email a day. Zero fluff. Join 10,000+ professionals.
Llama.cpp, the popular open-source C++ implementation for running LLMs locally (113k GitHub stars), has released version b9310 with significant improvements to checkpointing and context management. The headline fix addresses a long-standing issue where checkpoints were created periodically mid-prompt, causing unnecessary memory overhead. Now the system uses chat template analysis to extract message spans and identify the exact position of the latest user input, creating a single checkpoint right before that message. This drastically reduces redundant memory snapshots.
The release also introduces the --checkpoint-min-step flag, giving developers fine-grained control over minimum spacing between checkpoints for further optimization. Multimodal prompts are now properly handled when mapping text/template positions to server prompt tokens. Additionally, autoparser detection for message barriers and a bug fix for message span delimiters improve reliability. The update comes with prebuilt binaries for all major platforms including macOS (Apple Silicon and Intel), Linux (x64/arm64/s390x with Vulkan, ROCm, OpenVINO, SYCL support), Windows (CPU, CUDA 12/13, Vulkan, HIP), Android arm64, and iOS as an XCFramework.
- Fixes checkpoint creation to avoid periodic mid-prompt checkpoints by identifying the latest user message position
- Adds --checkpoint-min-step command-line flag to control minimum spacing between checkpoints
- Extracts message spans from chat templates and supports autoparser detection for message barriers
Why It Matters
Enables longer, more stable local LLM conversations by preventing memory bloat from excessive checkpoints.