b8967
NVIDIA Blackwell GPUs get native NVFP4 support in llama.cpp for faster inference.
The llama.cpp team released version b8967, which includes PR #21896 adding native NVFP4 (4-bit floating point) support for NVIDIA's Blackwell architecture. This is a significant update for users running AI models on NVIDIA's latest Blackwell GPUs: by leveraging the hardware's native FP4 capabilities, it enables more efficient memory usage and faster inference. The release is signed with a verified GPG key for security.
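To give a feel for what NVFP4 means in practice, the sketch below illustrates the general idea of 4-bit floating-point (E2M1) block quantization: values are stored as one of a small fixed set of FP4 magnitudes plus a shared per-block scale. This is a simplified illustration of the concept only, not llama.cpp's or NVIDIA's actual implementation (the block size, scale encoding, and rounding rules here are assumptions for clarity).

```python
# Illustrative sketch of FP4 (E2M1) block quantization, NOT llama.cpp's code.
# E2M1 can represent only these positive magnitudes (plus their negatives):
E2M1_LEVELS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(values):
    """Quantize one block of floats: choose a scale so the largest
    magnitude maps to 6.0 (the FP4 max), then snap every value to the
    nearest signed E2M1 level. Returns (scale, quantized_levels)."""
    max_abs = max(abs(v) for v in values) or 1.0  # avoid divide-by-zero
    scale = max_abs / E2M1_LEVELS[-1]
    quantized = []
    for v in values:
        level = min(E2M1_LEVELS, key=lambda lvl: abs(abs(v) / scale - lvl))
        quantized.append(level if v >= 0 else -level)
    return scale, quantized

def dequantize_block(scale, quantized):
    """Recover approximate floats: multiply each stored level by the scale."""
    return [scale * q for q in quantized]
```

For example, `quantize_block([0.0, 1.5, -3.0, 6.0])` yields a scale of 1.0 and the values round-trip exactly, since each already sits on an E2M1 level. The hardware win on Blackwell is that its tensor cores operate on this 4-bit format natively, so weights stay compressed in memory without a costly dequantize-to-FP16 step.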
The b8967 build ships pre-built binaries for a wide range of platforms: macOS (Apple Silicon arm64 with optional KleidiAI, Intel x64), Linux (x64, arm64, s390x CPUs; Vulkan, ROCm 7.2, OpenVINO, SYCL FP32/FP16), Windows (x64/arm64 CPUs; CUDA 12/13, Vulkan, SYCL, HIP), Android arm64, iOS XCFramework, and openEuler (x86 310p, x86 910b with ACL Graph, aarch64 310p, aarch64 910b with ACL Graph). This broad coverage means users across very different environments can pick up the release, though the new NVFP4 path itself requires a Blackwell GPU and the CUDA backend.
- Adds Blackwell native NVFP4 support via PR #21896 for NVIDIA's latest GPU architecture.
- Includes 30+ pre-built binaries across macOS, Linux, Windows, Android, iOS, and openEuler.
- Supports multiple backends: CUDA 12/13, Vulkan, ROCm 7.2, OpenVINO, SYCL, and HIP.
Why It Matters
Enables faster AI inference on NVIDIA Blackwell GPUs with native FP4 support.