b8108
Latest release fixes beta/gate shape handling for Qwen3.5 models and adds pre-built Windows binaries for more CUDA versions.
Deep Dive
The ggml-org team released llama.cpp build b8108. The main change (#19730) fixes the Qwen3.5 beta/gate shapes so that extra reshapes are no longer needed. The release also expands the pre-built binaries, adding new Windows builds for CUDA 12.4 and CUDA 13.1. In practice, Qwen3.5 models run without the redundant reshape operations, and Windows users with NVIDIA hardware can download GPU-accelerated binaries for either CUDA version for local inference.
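To illustrate why "avoiding extra reshapes" matters, here is a minimal sketch in Python. It is not llama.cpp or ggml code: the `Graph` class, the `beta` tensor dimensions, and both builder functions are hypothetical and exist only to show that a tensor created in the shape its consumer expects needs one fewer graph operation than a tensor created flat and reshaped afterwards.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class Graph:
    """Toy compute graph that only records op names (hypothetical, not ggml)."""
    ops: list = field(default_factory=list)

    def project(self, shape):
        # Pretend a projection produced a tensor with this shape.
        self.ops.append("project")
        return tuple(shape)

    def reshape(self, shape, new_shape):
        # A reshape must preserve the element count and adds one graph node.
        if prod(shape) != prod(new_shape):
            raise ValueError(f"cannot reshape {shape} to {new_shape}")
        self.ops.append("reshape")
        return tuple(new_shape)


def build_with_reshape(g, n_heads, n_tokens):
    # Tensor comes out flat, so the consumer needs an extra reshape.
    beta = g.project((n_heads * n_tokens,))
    return g.reshape(beta, (n_heads, n_tokens))


def build_direct(g, n_heads, n_tokens):
    # Tensor is produced in the expected shape, so no reshape is needed.
    return g.project((n_heads, n_tokens))


if __name__ == "__main__":
    g_old, g_new = Graph(), Graph()
    shape_old = build_with_reshape(g_old, n_heads=4, n_tokens=8)
    shape_new = build_direct(g_new, n_heads=4, n_tokens=8)
    print(shape_old == shape_new, len(g_old.ops), len(g_new.ops))
```

Both builders end with the same tensor shape, but the direct version records one operation instead of two. The dimensions here are invented for the example and do not reflect the actual Qwen3.5 tensor layout.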
Why It Matters
Improves Qwen3.5 support in llama.cpp and gives Windows developers pre-built CUDA 12.4 and CUDA 13.1 binaries, making it easier to run local LLMs on NVIDIA GPUs.