llama.cpp b9148 fixes Qwen3.5 tokenizer stack overflow

llama.cpp Releases May 14, 2026

⚡ New non-backtracking regex handler prevents crashes on long inputs

Deep Dive

The latest llama.cpp release (b9148) fixes a critical tokenizer bug affecting Qwen3.5 models. The bug, reported as issue #21919, caused a stack overflow when tokenizing long inputs that contain Unicode letters and combining marks. The new `unicode_regex_split_custom_qwen35()` function in `src/unicode.cpp` implements a non-backtracking handler for Qwen3.5's regex pattern `[\p{L}\p{M}]+`, which avoids the recursive stack exhaustion that occurred when the pattern was matched with `std::regex`.
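
To illustrate what "non-backtracking" means here, the following is a minimal Python sketch of a single-pass scanner for runs matching `[\p{L}\p{M}]+`. It is not the C++ code from `src/unicode.cpp`; the function names `is_letter_or_mark` and `letter_mark_runs` are hypothetical, and the sketch covers only this one character class rather than a full pre-tokenizer pattern. The point is that a plain loop uses constant stack depth regardless of input length.

```python
import unicodedata


def is_letter_or_mark(ch: str) -> bool:
    # Unicode general categories beginning with "L" are letters,
    # those beginning with "M" are combining marks.
    return unicodedata.category(ch)[0] in ("L", "M")


def letter_mark_runs(text: str) -> list[tuple[int, int]]:
    """Return (start, end) offsets of maximal letter/mark runs.

    Hypothetical helper: one forward pass, no recursion, no backtracking.
    """
    runs = []
    i, n = 0, len(text)
    while i < n:
        if not is_letter_or_mark(text[i]):
            i += 1
            continue
        start = i
        # Extend the run greedily; the index only ever moves forward.
        while i < n and is_letter_or_mark(text[i]):
            i += 1
        runs.append((start, i))
    return runs


# "e" followed by U+0301 (combining acute accent) stays in the same run.
text = "cafe\u0301 au lait"
print([text[a:b] for a, b in letter_mark_runs(text)])
```

Because the scanner never revisits a character, its running time is linear in the input length and its stack usage does not grow with it.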

The commit also adds regression tests: a dedicated test vocabulary file (`ggml-vocab-qwen35.gguf`), test inputs, and expected outputs, so that a future change that reintroduces the bug will be caught. The fix builds on a similar one for Qwen2 (commit 0d049d6), adapted to Qwen3.5's specific regex pattern. With this update, users running Qwen3.5 models through llama.cpp on any supported platform (macOS, Linux, Windows, Android) should no longer hit this crash when feeding in long prompts or documents that contain accented characters.
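
As a rough picture of what a regression test for this class of bug checks, here is a small Python sketch that pairs inputs with expected splits, including one very long run of letters and combining marks. The cases and the helper names (`split_runs`, `run_regression`) are made up for illustration and are not the contents of the test files in the commit.

```python
import unicodedata
from itertools import groupby


def is_letter_or_mark(ch: str) -> bool:
    # "L*" categories are letters, "M*" categories are combining marks.
    return unicodedata.category(ch)[0] in ("L", "M")


def split_runs(text: str) -> list[str]:
    # Hypothetical stand-in for the splitter under test: group adjacent
    # characters by class and keep only the letter/mark groups.
    return ["".join(g) for keep, g in groupby(text, key=is_letter_or_mark) if keep]


# (input, expected output) pairs; the last case is the stress input.
CASES = [
    ("", []),
    ("hello world", ["hello", "world"]),
    ("nai\u0308ve re\u0301sume\u0301", ["nai\u0308ve", "re\u0301sume\u0301"]),
    ("e\u0301" * 100_000, ["e\u0301" * 100_000]),
]


def run_regression() -> int:
    for text, expected in CASES:
        assert split_runs(text) == expected, f"mismatch for input of length {len(text)}"
    return len(CASES)


print(f"{run_regression()} cases passed")
```

The long case matters most: a splitter that recurses once per matched character would be at risk on an input of that size, while an iterative one handles it like any other.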

Key Points

Adds a non-backtracking handler for Qwen3.5's regex pattern `[\p{L}\p{M}]+` to prevent stack overflow
Fixes issue #21919 where long Unicode inputs caused std::regex stack exhaustion
Includes new test vocabulary file and regression tests to maintain stability

Why It Matters

Local LLM users can run Qwen3.5 on long inputs and accented text without the tokenizer crashing.
