b8526
The latest release adds support for a new 2B-parameter coding model and broadens pre-built binary coverage across iOS, Windows, and a range of Linux variants.
The open-source project llama.cpp, maintained by ggml-org, has released a significant new version tagged b8526. The update centers on the integration of CodeFuse-AI's F2LLM-v2, a specialized 2-billion-parameter language model fine-tuned for code generation and understanding. By adding the model to its supported list, llama.cpp gives developers a new, efficient option for code completion, explanation, and generation that runs locally on consumer hardware.
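For a sense of what this looks like in practice, here is a minimal sketch of querying such a model through llama-server's OpenAI-compatible chat endpoint. It assumes the weights have been converted to GGUF and a server is already listening on localhost port 8080; the model name is illustrative, not taken from the release, and the same workflow applies to any model llama.cpp supports.

```python
import requests

# Minimal sketch: ask a locally served code model to explain a snippet.
# Assumes a llama-server instance is already running on localhost:8080 with
# a GGUF conversion of the model loaded; the "model" value below is purely
# illustrative, since llama-server serves whichever model it was started with.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "f2llm-v2-2b",  # hypothetical name
        "messages": [
            {
                "role": "user",
                "content": "Explain what this Python one-liner does:\n"
                           "print(sum(x * x for x in range(10)))",
            }
        ],
        "temperature": 0.2,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```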
The technical highlight of b8526 is its dramatically expanded cross-platform and hardware backend support. The release provides pre-built binaries for macOS on both Apple Silicon (arm64) and Intel (x64) architectures, and notably introduces an iOS XCFramework, enabling on-device AI in mobile applications. For Linux, it covers standard Ubuntu x64 builds for CPU, Vulkan, and ROCm 7.2, plus specialized builds for s390x and OpenVINO. Windows support is comprehensive, spanning x64 and arm64 CPU builds, CUDA 12 and 13 builds for NVIDIA GPUs, Vulkan and SYCL builds, and HIP builds for AMD GPUs. The update also includes builds for the openEuler OS, targeting Huawei's Ascend AI processors (310p, 910b) with ACL Graph, showcasing the project's commitment to diverse hardware ecosystems.
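Because the backend is baked into each pre-built binary rather than selected at runtime, the same launch command works unchanged across these platforms. A rough sketch of starting a local server, with a hypothetical model path (any local GGUF file would do):

```python
import subprocess

# Launch llama-server from a pre-built release binary. The same flags apply
# whether the binary was built for CUDA, Vulkan, ROCm, or Metal: the backend
# is determined by which release artifact you download, not by a flag.
subprocess.run([
    "./llama-server",
    "-m", "models/f2llm-v2-2b-Q4_K_M.gguf",  # hypothetical GGUF path
    "-ngl", "99",       # offload as many layers as possible to the GPU
    "--port", "8080",
])
```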
This release underscores the ongoing evolution of the local AI inference ecosystem. By bundling support for a capable, compact coding model with an extensive matrix of deployment targets, llama.cpp b8526 lowers the barrier for integrating specialized AI into applications across desktop, server, edge, and mobile environments. It represents a move towards a more modular and portable AI stack, where developers can choose a model for a specific task and deploy it almost anywhere without being locked into a single cloud provider or hardware vendor.
- Adds official support for CodeFuse-AI's F2LLM-v2, a 2B-parameter model specialized for code tasks.
- Expands platform support to include iOS (via XCFramework), Windows (CUDA 12/13, Vulkan, SYCL), and openEuler for Ascend chips.
- Provides pre-built binaries for macOS (Apple Silicon and Intel) and multiple Linux backends (CPU, Vulkan, ROCm), enabling wider local deployment.
Why It Matters
Enables developers to run a capable code-generation model locally on phones, PCs, and servers, reducing reliance on cloud APIs and cutting inference costs.