llama.cpp b9148 fixes Qwen3.5 tokenizer stack overflow
New non-backtracking regex handler prevents crashes on long inputs
The latest llama.cpp release (b9148) fixes a critical tokenizer bug in Qwen3.5 models. The issue, reported as #21919, caused stack overflows when tokenizing long text inputs made of Unicode letters and combining marks. Common `std::regex` implementations match by recursive backtracking, so stack depth can grow with the length of the matched text. The new `unicode_regex_split_custom_qwen35()` function in `src/unicode.cpp` implements a non-backtracking handler for the regex `[\p{L}\p{M}]+`, which avoids the stack exhaustion that `std::regex` hit on such inputs.
The commit also adds regression tests: a dedicated test vocab file (`ggml-vocab-qwen35.gguf`), test input cases, and their expected output. These tests should catch the bug if it resurfaces in future updates. The change builds on a similar fix for Qwen2 (commit 0d049d6), adapted to Qwen3.5's specific regex pattern. With this update, users running Qwen3.5 models through llama.cpp on any platform (macOS, Linux, Windows, Android) should no longer hit this crash when feeding in long prompts or documents with accented characters.
- Adds a non-backtracking handler for Qwen3.5's regex pattern `[\p{L}\p{M}]+` to prevent stack overflow
- Fixes issue #21919, where long Unicode inputs caused `std::regex` stack exhaustion
- Adds a new test vocabulary file and regression tests covering the fix
Why It Matters
Local LLM users can run Qwen3.5 on long or accent-heavy text without hitting this tokenizer stack overflow.