
Llama.cpp b9310 improves checkpointing for long chat sessions

llama.cpp Releases May 25, 2026

⚡ New release fixes checkpoint creation and adds --checkpoint-min-step for more efficient memory management

Deep Dive

Llama.cpp, the popular open-source C/C++ implementation for running LLMs locally (113k GitHub stars), has published release b9310 with significant improvements to checkpointing and context management. The headline fix addresses a long-standing issue where checkpoints were created periodically in the middle of a prompt, adding unnecessary memory overhead. The server now analyzes the chat template to extract message spans, identifies the exact position where the latest user message begins, and creates a single checkpoint immediately before that message. This avoids storing redundant intermediate snapshots of the prompt.
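The placement step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not llama.cpp's own (C++) implementation: `MessageSpan`, `latest_user_span`, and `checkpoint_token_pos` are hypothetical names, and the sketch assumes message spans are given as character offsets into the rendered chat template and that the starting character offset of every prompt token is known. It returns the index of the token that contains the start of the latest user message, so a checkpoint covering all tokens before that index holds no part of that message.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MessageSpan:
    """Hypothetical message span in the rendered chat-template text."""
    role: str        # e.g. "system", "user", "assistant"
    char_start: int  # offset where the message begins
    char_end: int    # offset one past the message's last character


def latest_user_span(spans: List[MessageSpan]) -> Optional[MessageSpan]:
    """Scan from the end so the most recent user message is found first."""
    for span in reversed(spans):
        if span.role == "user":
            return span
    return None


def checkpoint_token_pos(spans: List[MessageSpan],
                         token_char_starts: List[int]) -> Optional[int]:
    """Return the prompt-token index at which to checkpoint, or None.

    token_char_starts[i] is the (sorted) character offset where token i begins.
    The result is the index of the token containing the user message's first
    character; a checkpoint over tokens [0, result) excludes the whole message,
    even if a token straddles the message boundary.
    """
    span = latest_user_span(spans)
    if span is None:
        return None  # no user message, so no checkpoint position
    return max(0, bisect_right(token_char_starts, span.char_start) - 1)


example_spans = [
    MessageSpan("system", 0, 12),
    MessageSpan("user", 12, 31),
    MessageSpan("assistant", 31, 50),
    MessageSpan("user", 50, 64),
]
example_starts = [0, 5, 12, 20, 31, 44, 50, 58]
print(checkpoint_token_pos(example_spans, example_starts))  # 6
```

Picking the token that *contains* the message start, rather than the first token after it, keeps a boundary-straddling token out of the checkpoint. Multimodal inputs, where some prompt positions correspond to no text at all, are not modeled in this sketch.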

The release also introduces the --checkpoint-min-step flag, which gives developers fine-grained control over the minimum spacing between checkpoints. Multimodal prompts are now handled correctly when text and template positions are mapped to the server's prompt tokens. Autoparser detection for message barriers and a bug fix for message span delimiters further improve reliability.

The update ships with prebuilt binaries for all major platforms: macOS (Apple Silicon and Intel), Linux (x64/arm64/s390x, with Vulkan, ROCm, OpenVINO, and SYCL support), Windows (CPU, CUDA 12/13, Vulkan, HIP), Android arm64, and iOS as an XCFramework.
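To make "minimum spacing between checkpoints" concrete for --checkpoint-min-step, here is a minimal Python sketch of one possible spacing rule. It rests on assumptions that the release notes do not state: that the value is measured in prompt tokens, and that a candidate checkpoint is skipped when it falls fewer than that many tokens after the previously kept one. `should_create_checkpoint` and `filter_checkpoints` are hypothetical helpers, not llama.cpp functions.

```python
from typing import Iterable, List, Optional


def should_create_checkpoint(candidate_pos: int,
                             last_checkpoint_pos: Optional[int],
                             min_step: int) -> bool:
    """Hypothetical spacing rule: accept a checkpoint only if it lies at
    least `min_step` tokens after the previously kept one."""
    if last_checkpoint_pos is None:
        return True  # nothing to space against yet
    return candidate_pos - last_checkpoint_pos >= min_step


def filter_checkpoints(candidates: Iterable[int], min_step: int) -> List[int]:
    """Walk candidate token positions in order and keep those that respect min_step."""
    kept: List[int] = []
    for pos in sorted(candidates):
        last = kept[-1] if kept else None
        if should_create_checkpoint(pos, last, min_step):
            kept.append(pos)
    return kept


print(filter_checkpoints([10, 40, 300, 320, 900], min_step=256))  # [10, 300, 900]
```

With a step of 256, the candidates at 40 and 320 are dropped because each sits too close to the previously kept checkpoint. In this model, a larger step keeps fewer snapshots in memory at the cost of coarser restore points.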

Key Points

Fixes checkpoint creation to avoid periodic mid-prompt checkpoints by identifying the latest user message position
Adds --checkpoint-min-step command-line flag to control minimum spacing between checkpoints
Extracts message spans from chat templates and supports autoparser detection for message barriers

Why It Matters

Enables longer, more stable local LLM conversations by preventing memory bloat from excessive checkpoints.
