llama.cpp b9478 adds SSE ping interval for stable server connections

llama.cpp Releases June 02, 2026

⚡Local LLM runner llama.cpp ships server update to prevent dropped streams.

Deep Dive

llama.cpp, the high‑performance C++ library for running large language models locally, tagged release b9478 on June 2. The headline change is a configurable SSE (Server‑Sent Events) ping interval in the embedded server (#24013). SSE is the protocol many LLM backends use to stream responses token by token. Without a keep‑alive mechanism, a long‑running stream that goes quiet can be closed by a load balancer or firewall, cutting off the response mid‑generation. The new ping interval makes the server send periodic heartbeats so the connection stays open through extended inference sessions.
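To make the keep‑alive idea concrete, here is a minimal Python sketch of the general technique, not llama.cpp's actual implementation. The release notes summarized here do not describe the wire format, so the sketch assumes the ping is an SSE comment line (a line starting with a colon), which the SSE specification tells clients to ignore. The names `stream_with_pings`, `parse_sse`, and `ping_interval` are hypothetical and chosen only for illustration.

```python
import queue
from typing import Iterator, Optional

# Assumption: the heartbeat is an SSE comment line. Lines that start with ":"
# are ignored by spec-compliant SSE clients, so they carry no payload.
PING_FRAME = ": ping\n\n"


def format_event(data: str) -> str:
    """Encode one payload as an SSE `data:` event."""
    return f"data: {data}\n\n"


def stream_with_pings(
    tokens: "queue.Queue[Optional[str]]", ping_interval: float
) -> Iterator[str]:
    """Yield SSE frames from a token queue (hypothetical server-side helper).

    If no token arrives within `ping_interval` seconds, a ping comment is
    emitted instead so the connection never looks idle to intermediaries.
    A `None` item on the queue marks the end of the stream.
    """
    while True:
        try:
            item = tokens.get(timeout=ping_interval)
        except queue.Empty:
            yield PING_FRAME  # nothing to send yet: emit a heartbeat
            continue
        if item is None:
            return
        yield format_event(item)


def parse_sse(frames: Iterator[str]) -> Iterator[str]:
    """Client side: yield data payloads and skip comment (ping) lines."""
    for frame in frames:
        for line in frame.splitlines():
            if line.startswith(":"):
                continue  # comment line, e.g. a keep-alive ping
            if line.startswith("data:"):
                value = line[len("data:"):]
                # The spec strips one optional leading space after the colon.
                yield value[1:] if value.startswith(" ") else value
```

Because the heartbeat is a comment rather than an event, a client written this way sees only the generated tokens, while any load balancer or firewall in between sees regular traffic on the connection.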

The release also continues llama.cpp’s tradition of broad platform support. Pre‑built binaries are available for macOS (Apple Silicon arm64 and Intel x64, plus an iOS XCFramework), Linux (x86, arm64, and s390x, with Vulkan, ROCm 7.2, OpenVINO, and SYCL backends), Windows (CPU, CUDA 12.4 and 13.3 DLLs, Vulkan, HIP), Android (arm64), and openEuler with ACL Graph optimizations. The KleidiAI‑enabled macOS variant remains disabled in this release, as do the SYCL FP32 and openEuler 310p builds. To upgrade a local LLM server, download the matching archive from the release assets.

Key Points

Server now sends periodic SSE pings to prevent connection timeouts during long inference streams.
Release b9478 supports 20+ build targets across macOS, Linux, Windows, Android, iOS, and openEuler.
KleidiAI and SYCL FP32 builds remain disabled; CUDA 13 DLLs are included for Windows NVIDIA users.

Why It Matters

Local LLM deployments gain more reliable streaming behind load balancers and firewalls, which matters for chatbots, copilots, and real‑time agents that hold a connection open during long generations.
