b8855
The latest commit patches a critical tokenization crash for GLM-DSA models when using vocab_only mode.
The open-source project llama.cpp, maintained by ggml-org, has published a new release tagged b8855. It is primarily a bug fix for a crash in the `llama-tokenize` tool that occurred specifically when loading GLM-DSA architecture models with the `vocab_only` flag enabled. The issue, tracked as #22102 on GitHub, was promptly addressed following code review, with Georgi Gerganov (the project's founder) co-authoring a simplified version of the fix. This reflects the project's responsive maintenance cycle for its core inference engine, which is widely used to run models such as Meta's Llama family efficiently on consumer hardware.
The release ships with 28 pre-built binary assets covering the major operating systems and hardware accelerators: macOS on both Apple Silicon and Intel, multiple Windows configurations (CPU, CUDA 12.4, CUDA 13.1, Vulkan, SYCL), Linux builds with CPU, Vulkan, ROCm 7.2, and OpenVINO backends, Android arm64, and specialized builds for openEuler. Providing these binaries lowers the barrier to entry: developers and researchers can deploy the fix immediately without compiling from source, preventing crashes in applications that tokenize GLM-DSA model vocabularies in isolation.
- Fixes a crash in the `llama-tokenize` tool for GLM-DSA models when loading with the `vocab_only` flag (Issue #22102).
- Includes pre-built binaries for 28 platform/backend combinations, from Windows CUDA to macOS Apple Silicon and Linux ROCm.
- Co-authored by project founder Georgi Gerganov, ensuring the fix is aligned with the core library's architecture.
Why It Matters
Maintains stability for developers using llama.cpp to run and test GLM-DSA models locally, preventing application crashes during vocabulary processing.