Developer Tools

b8118

The latest release merges parsers for two new specialized coding and reasoning models.

Deep Dive

The ggml-org team behind the popular llama.cpp project has released version b8118, an update focused on expanding model compatibility. The core technical change merges the previously separate parsers for Qwen3-Coder (a specialized coding model from Alibaba) and Nemotron Nano 3 (a reasoning model from NVIDIA) into a single parser variant built on PEG (Parsing Expression Grammar) techniques. Unifying the two parsers removes duplicated logic and simplifies maintenance of the code that interprets these models' structured output. The release also adds a new test for JSON parameters, improving confidence in configuration handling. As with all llama.cpp releases, it ships a wide array of pre-built binaries for easy deployment across macOS (Apple Silicon and Intel), Windows (with CPU, CUDA 12/13, Vulkan, SYCL, and HIP backends), Linux, iOS, and openEuler, lowering the barrier for developers who want to experiment with these models locally.
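To illustrate the idea behind a unified PEG-style parser, here is a minimal sketch in Python: instead of two separate hand-written parsers, one grammar with PEG ordered-choice alternatives covers both models' tool-call markup. The tag names and markup formats below are purely illustrative assumptions, not the actual tokens or code used by llama.cpp, Qwen3-Coder, or Nemotron Nano 3.

```python
import re

# Each rule tries to match at the start of the input and returns
# (matched_value, remaining_input), or None on failure.

def literal(text):
    def rule(s):
        return (text, s[len(text):]) if s.startswith(text) else None
    return rule

def regex(pattern):
    compiled = re.compile(pattern)
    def rule(s):
        m = compiled.match(s)
        return (m.group(0), s[m.end():]) if m else None
    return rule

def sequence(*rules):
    # All sub-rules must succeed in order; collects their values.
    def rule(s):
        values = []
        for r in rules:
            result = r(s)
            if result is None:
                return None
            value, s = result
            values.append(value)
        return (values, s)
    return rule

def choice(*rules):
    # PEG ordered choice: the first alternative that succeeds wins.
    def rule(s):
        for r in rules:
            result = r(s)
            if result is not None:
                return result
        return None
    return rule

# Hypothetical markup for two model families, handled by one grammar.
tool_call = choice(
    sequence(literal("<tool_call>"), regex(r"[^<]*"), literal("</tool_call>")),
    sequence(literal("[TOOL]"), regex(r"[^\[]*"), literal("[/TOOL]")),
)

print(tool_call('<tool_call>{"name": "search"}</tool_call>'))
print(tool_call('[TOOL]{"name": "search"}[/TOOL]'))
```

The payoff of the ordered-choice structure is that adding support for a third model's markup means appending one alternative to the grammar rather than writing and wiring up another standalone parser.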

Key Points
  • Merges Qwen3-Coder and Nemotron Nano 3 model parsers into a unified PEG-based system for cleaner code.
  • Adds a new JSON parameter test to improve configuration reliability and error handling.
  • Provides extensive pre-built binaries for Windows, macOS, Linux, and iOS, supporting multiple compute backends like CUDA and Vulkan.
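As a rough sketch of what a JSON parameter test checks, the snippet below parses the kind of JSON arguments string a model might emit in a tool call and verifies that values survive a round trip with their types intact. The field names and the test shape are assumptions for illustration, not taken from the llama.cpp test suite.

```python
import json

# Hypothetical arguments string as a model might emit it in a tool call.
raw_arguments = '{"query": "llama.cpp b8118", "max_results": 5, "safe": true}'
parsed = json.loads(raw_arguments)

assert parsed["max_results"] == 5                 # integer stays an integer
assert parsed["safe"] is True                     # boolean stays a boolean
assert json.loads(json.dumps(parsed)) == parsed   # round trip is lossless
print("JSON parameter round trip OK")
```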

Why It Matters

This update makes cutting-edge coding and reasoning models from Alibaba and NVIDIA easier to run locally, empowering developers.