llama.cpp b9478 adds SSE ping interval for stable server connections
Local LLM runner llama.cpp ships server update to prevent dropped streams.
llama.cpp, the high‑performance C++ library for running large language models locally, tagged release b9478 on June 2. The headline change is a new SSE (Server‑Sent Events) ping interval added to the embedded server (#24013). SSE is the protocol many LLM backends use to stream token‑by‑token responses to clients. Without a keep‑alive mechanism, a streaming connection that goes quiet for too long can be treated as idle and closed by a load balancer or firewall, cutting off the response mid‑generation. This update introduces a configurable ping interval that sends periodic heartbeats, keeping the connection alive through extended inference sessions.
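For client authors, the practical consequence is that a stream may now contain lines that are not token data. The sketch below is illustrative only: it assumes the heartbeats arrive as SSE comment lines (lines beginning with a colon, which the SSE specification tells clients to ignore), and the release notes summarized here do not confirm the exact wire format llama.cpp uses. The function name `iter_sse_data` and the sample payloads are hypothetical.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event, skipping comment lines.

    Assumption: keep-alive pings are sent as SSE comments (": ...").
    """
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            # Comment line: carries no event data, so a heartbeat is dropped here.
            continue
        if line == "":
            # A blank line terminates the current event; emit it if it has data.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            # The spec strips a single leading space after the colon.
            value = value[1:]
        if field == "data":
            buffer.append(value)


if __name__ == "__main__":
    # Hypothetical stream: two token chunks with a ping in between.
    sample = [
        'data: {"content": "Hel"}\n', "\n",
        ": ping\n", "\n",
        'data: {"content": "lo"}\n', "\n",
    ]
    for payload in iter_sse_data(sample):
        print(payload)
```

A client that already parses SSE according to the specification should need no changes under this assumption; hand-rolled parsers that expect every line to start with `data:` are the ones worth rechecking.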
The release also continues llama.cpp’s tradition of broad platform support. Pre‑built binaries are available for macOS (Apple Silicon arm64 and Intel x64, plus an iOS XCFramework), Linux (x86, arm64, s390x, with Vulkan, ROCm 7.2, OpenVINO, and SYCL backends), Windows (CPU, CUDA 12.4 and 13.3 DLLs, Vulkan, HIP), Android (arm64), and openEuler with ACL Graph optimizations. Notably, the KleidiAI‑enabled macOS variant remains disabled in this release, as do the SYCL FP32 and openEuler 310p builds. Developers can download the appropriate archive from the release assets to upgrade their local LLM servers.
- Server now sends periodic SSE pings to prevent connection timeouts during long inference streams.
- Release b9478 supports 20+ build targets across macOS, Linux, Windows, Android, iOS, and openEuler.
- KleidiAI macOS, SYCL FP32, and openEuler 310p builds remain disabled; CUDA 12.4 and 13.3 DLLs are included for Windows NVIDIA users.
Why It Matters
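Long‑running streams from local LLM servers are less likely to be cut off by idle timeouts at load balancers or firewalls, which matters for chatbots, copilots, and real‑time agents that depend on uninterrupted token streaming.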