Developer Tools

llama.cpp b9496 fixes Gemma 4 floating-point exception bug

New patch for local LLM runner resolves a critical FPE in Gemma 4.

Deep Dive

llama.cpp, the widely used open-source C++ library for running large language models locally, has shipped patch release b9496. The update is a targeted bugfix: it resolves a floating-point exception (FPE) in Gemma 4's unified mode (tracked as issue #24088). The changelog is a single line, but the release ships prebuilt binaries for a wide range of platforms:

  • macOS: Apple Silicon (with and without KleidiAI) and Intel x64.
  • Linux: x64, arm64, and s390x CPU builds, plus Vulkan, ROCm 7.2, and OpenVINO. A SYCL FP32 build is listed but disabled.
  • Android: arm64.
  • Windows: x64 and arm64 CPU builds, CUDA 12.4 and 13.3 DLLs, Vulkan, and HIP. SYCL is listed but disabled.
  • openEuler: x86 and aarch64, each with 310p and 910b ACL Graph targets.

This release is part of the project's continuous maintenance, backed by a large community (115k GitHub stars, 19.1k forks). The fix addresses a floating-point exception in Gemma 4's unified mode, which could crash inference or produce incorrect outputs. Users running Gemma 4 locally, whether on consumer GPUs, Apple Silicon Macs, or cloud VMs, should upgrade to b9496 for stability. The breadth of platform support, including niche targets such as s390x and openEuler, reflects llama.cpp's role as a go-to inference engine for enthusiasts and researchers who want to run models like Gemma, Llama, or Mistral without cloud dependencies.
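To make the upgrade recommendation concrete, here is a minimal Python sketch that checks whether an installed build predates b9496. It assumes release tags follow the `b<number>` pattern seen in this release and that higher numbers mean newer builds. The helper names `build_number` and `needs_upgrade` are hypothetical and not part of llama.cpp.

```python
import re

# Build number of the release that carries the Gemma 4 FPE fix (from the article).
FIXED_BUILD = 9496


def build_number(tag: str) -> int:
    """Extract the integer build number from a tag such as 'b9496'.

    Assumption: tags have the form 'b' followed by digits only.
    """
    match = re.fullmatch(r"b(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"unrecognized build tag: {tag!r}")
    return int(match.group(1))


def needs_upgrade(installed_tag: str, fixed_build: int = FIXED_BUILD) -> bool:
    """Return True if the installed build is older than the build with the fix.

    Assumption: build numbers increase monotonically from release to release.
    """
    return build_number(installed_tag) < fixed_build


if __name__ == "__main__":
    for tag in ("b9400", "b9496", "b9501"):
        status = "upgrade recommended" if needs_upgrade(tag) else "already includes the fix"
        print(f"{tag}: {status}")
```

The check only compares build numbers. It does not inspect how a given binary was built or whether a downstream package has backported the fix.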

Key Points
  • Fixes a floating-point exception (FPE) in Gemma 4's unified mode (issue #24088), preventing crashes during local inference.
  • Available for 20+ platform configurations including macOS (Apple Silicon/Intel), Linux (x64/arm64/Vulkan/ROCm), Windows (CPU/CUDA/Vulkan), and Android.
  • Part of the llama.cpp project (115k GitHub stars, 19.1k forks), a leading open-source LLM runtime for local deployment.

Why It Matters

The patch restores stable Gemma 4 inference across diverse hardware, which is critical for local AI workflows.