Developer Tools

b8979

New CUDA kernel fusion speeds up SSM inference on NVIDIA GPUs.

Deep Dive

ggml-org's llama.cpp release b8979 (April 29) fuses the SSM_CONV, ADD (bias), and SILU operations into a single CUDA kernel, cutting intermediate memory traffic and kernel launch overhead. The release ships prebuilt binaries for macOS (Apple Silicon, Intel, iOS), Linux (CPU, Vulkan, ROCm, OpenVINO, SYCL), Android arm64, and Windows (CPU, CUDA 12/13, Vulkan, SYCL, HIP).

Key Points
  • Fuses SSM_CONV, ADD (bias), and SILU into a single CUDA kernel
  • Reduces memory bandwidth and kernel launch overhead for SSM models
  • Supports macOS, Linux, Android, Windows, and openEuler with prebuilt binaries
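To make the fusion concrete, here is a minimal Python sketch of what the combined operation computes per channel; it is illustrative only, not llama.cpp's actual CUDA kernel. The fused kernel evaluates a short causal depthwise convolution (SSM_CONV), a bias add, and SiLU in a single pass over the data rather than three separate passes, which is where the memory-bandwidth savings come from. The function name, left-padding convention, and conv width are assumptions made for this example.

```python
import math

def silu(v: float) -> float:
    # SiLU (a.k.a. swish): x * sigmoid(x)
    return v / (1.0 + math.exp(-v))

def fused_ssm_conv_bias_silu(x, w, b):
    """Illustrative fused op for one channel: silu(conv1d(x, w) + b).

    x : input sequence, assumed left-padded with len(w) - 1 zeros
        so the convolution is causal (typical for Mamba-style SSM blocks)
    w : short depthwise conv filter (e.g. width 4)
    b : per-channel bias

    Unfused, this would be three passes (conv, add, silu), each reading
    and writing a full intermediate tensor; fused, each output element
    is produced in one read-compute-write step.
    """
    K = len(w)
    n = len(x) - K + 1
    return [
        silu(sum(w[k] * x[t + k] for k in range(K)) + b)
        for t in range(n)
    ]

if __name__ == "__main__":
    # Width-4 averaging filter over a padded 3-step sequence.
    y = fused_ssm_conv_bias_silu([0, 0, 0, 1, 2, 3], [0.25] * 4, 0.5)
    print(y)
```

On a GPU the same idea applies per thread: each thread computes one output element end to end, so the conv result and biased sum never round-trip through global memory between kernels.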

Why It Matters

Faster inference for state-space models on NVIDIA GPUs, with broad platform support for local AI deployment.