b8979
New CUDA kernel fusion speeds up SSM inference on NVIDIA GPUs
Deep Dive
ggml-org's llama.cpp release b8979 (April 29) fuses the SSM_CONV, ADD (bias), and SILU operations into a single CUDA kernel, so the intermediate results no longer round-trip through GPU memory between three separate launches. The release ships prebuilt binaries for macOS (Apple Silicon, Intel, iOS), Linux (CPU, Vulkan, ROCm, OpenVINO, SYCL), Android arm64, and Windows (CPU, CUDA 12/13, Vulkan, SYCL, HIP).
Key Points
- Fuses SSM_CONV, ADD (bias), and SILU into a single CUDA kernel
- Reduces memory bandwidth and kernel launch overhead for SSM models
- Supports macOS, Linux, Android, Windows, and openEuler with prebuilt binaries
Why It Matters
The fused kernel delivers faster inference for state-space models on NVIDIA GPUs, while the broad set of prebuilt binaries keeps local AI deployment straightforward across platforms.