llama.cpp b9143 fixes half-precision ambiguity with float casting

llama.cpp Releases May 14, 2026

⚡Critical fix for half+half operator ambiguity in model inference

Deep Dive

The open-source llama.cpp project has published release b9143, a small but important update that fixes a long-standing precision issue in tensor operations. The fix addresses issue #22974 by casting intermediate results to float before performing the addition, then casting the sum back to the destination type. This removes the ambiguity that arises when the addition operator receives two half-precision (FP16) inputs, which could lead to incorrect results during model inference, especially on GPUs or accelerators that natively support half-precision arithmetic.
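The cast-add-cast pattern described above can be sketched in a few lines. This is only an illustration of the arithmetic, not the llama.cpp code: the helper names `to_f32`, `to_f16`, and `add_via_float` are hypothetical, and Python's standard `struct` module is used here merely to round values to the float32 and FP16 formats.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_f16(x: float) -> float:
    """Round a Python float to the nearest half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def add_via_float(a: float, b: float, dst=to_f16) -> float:
    """Hypothetical helper: widen both operands to float32, add in float32,
    then cast the sum to the destination type given by `dst`."""
    total = to_f32(to_f32(a) + to_f32(b))
    return dst(total)


# Two FP16 inputs: the addition itself happens in float32,
# and the result is rounded to FP16 only once, at the end.
a = to_f16(0.1)
b = to_f16(0.2)
print(a, b, add_via_float(a, b))

# The same helper with a float32 destination skips the final FP16 rounding.
print(add_via_float(a, b, dst=to_f32))
```

The point of the sketch is that the addition never operates on two FP16 values directly: both inputs are widened first, so only one well-defined float addition is involved, and the destination type decides the final rounding.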

The release ships with pre-compiled binaries for a wide range of platforms: macOS Apple Silicon (both with and without KleidiAI acceleration), iOS, Linux (x64, arm64, s390x with Vulkan or ROCm), Android (arm64), Windows (CPU, CUDA 12/13, Vulkan, SYCL, HIP), and openEuler (x86 and aarch64 with ACL Graph). With this coverage, developers running local LLMs on hardware ranging from MacBooks to enterprise servers can apply the fix quickly and keep their workflows numerically stable. Although b9143 is not a feature release, it is a reliability improvement that prevents silent errors in half-precision pipelines.

Key Points

Fixes issue #22974 by casting half-precision intermediate results to float before addition
Avoids ambiguity when the addition operator receives two half (FP16) inputs
Supports 20+ binary variants across macOS, iOS, Linux, Android, Windows, and openEuler

Why It Matters

Ensures stable LLM inference on half-precision hardware, preventing silent arithmetic errors in production.
