Developer Tools

llama.cpp b9442 adds tokenizer support for Jina Chinese embeddings

llama.cpp now ships native tokenizer support for Jina AI's Chinese embedding model, jina-embeddings-v2-base-zh.

Deep Dive

llama.cpp, the widely used C/C++ implementation for running large language models locally, has released version b9442 with a key addition: tokenizer support for the jina-embeddings-v2-base-zh model. This model, developed by Jina AI, is designed to generate high-quality embeddings for Chinese text. The new tokenizer combines whitespace-based pre-tokenization with BERT-style WordPiece handling and enables lowercase normalization by default. The pull request, authored by Sigbjørn Skjæret and other contributors, was merged into the main branch on May 31.
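To make the pipeline described above concrete, here is a minimal Python sketch of the generic technique: lowercase normalization, then whitespace pre-tokenization, then greedy longest-match-first WordPiece. This is the textbook BERT-style algorithm, not llama.cpp's C++ code, and `TOY_VOCAB` is a hypothetical six-entry vocabulary used only for illustration; the real model ships its own vocabulary.

```python
from typing import Dict, List

# Hypothetical toy vocabulary; the real jina-embeddings-v2-base-zh vocab is far larger.
TOY_VOCAB: Dict[str, int] = {
    "[UNK]": 0, "hello": 1, "llama": 2, "##.": 3, "##cpp": 4,
    "你好": 5, "世界": 6, "##世界": 7,
}


def normalize(text: str) -> str:
    # Lowercase normalization (the article says this is on by default).
    return text.lower()


def pre_tokenize(text: str) -> List[str]:
    # Whitespace-based pre-tokenization: split on runs of whitespace.
    return text.split()


def wordpiece(word: str, vocab: Dict[str, int],
              unk: str = "[UNK]", prefix: str = "##") -> List[str]:
    # Greedy longest-match-first; continuation pieces carry the "##" prefix.
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # BERT-style behavior: the whole word becomes [UNK].
            return [unk]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text: str, vocab: Dict[str, int]) -> List[int]:
    ids: List[int] = []
    for word in pre_tokenize(normalize(text)):
        ids.extend(vocab[piece] for piece in wordpiece(word, vocab))
    return ids


if __name__ == "__main__":
    print(tokenize("Hello llama.cpp", TOY_VOCAB))  # hello | llama ##. ##cpp
    print(tokenize("你好世界", TOY_VOCAB))          # 你好 ##世界
```

Note that Chinese text usually has no spaces, so whitespace splitting yields one long run of characters; in this sketch it is the WordPiece step that breaks that run into vocabulary pieces.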

This update extends llama.cpp's multilingual coverage, letting developers and enterprises run a Chinese embedding model entirely on local hardware without relying on cloud APIs. The b9442 release includes pre-built binaries for a wide range of platforms: macOS (Apple Silicon and Intel), Linux (x86_64, ARM64, s390x), Windows (x86_64 and ARM64), and Android (ARM64). Backend support spans CPU, Vulkan, ROCm 7.2, OpenVINO, SYCL, and CUDA 12/13, covering both consumer GPUs and specialized accelerators. For privacy-sensitive applications such as retrieval-augmented generation (RAG) over Chinese documents, this enables fully offline embedding pipelines.
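The retrieval step of such an offline RAG pipeline typically ranks documents by cosine similarity between a query embedding and precomputed document embeddings. The Python sketch below shows only that ranking step; the three-dimensional vectors are made-up placeholders, and it assumes real embeddings would be produced separately (for example with llama.cpp's `llama-embedding` tool or a `llama-server` started with `--embedding`), which is not shown here.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    # Scale a vector to unit length; a zero vector stays zero.
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity is the dot product of the unit-length vectors.
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def top_k(query: Sequence[float], docs: Sequence[Sequence[float]],
          k: int) -> List[Tuple[int, float]]:
    # Score every document against the query and keep the k best (index, score) pairs.
    scored = [(i, cosine(query, d)) for i, d in enumerate(docs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Placeholder vectors standing in for model-produced embeddings.
    query_vec = [1.0, 0.0, 0.0]
    doc_vecs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.7]]
    for idx, score in top_k(query_vec, doc_vecs, k=2):
        print(idx, round(score, 3))
```

Because every step runs in-process on local vectors, nothing in this stage needs network access, which is what makes a fully offline pipeline possible once the embeddings are generated locally.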

Key Points
  • Added tokenizer for jina-embeddings-v2-base-zh, a Chinese text embedding model
  • Tokenizer uses a whitespace-based approach with BERT-style WordPiece and lowercase normalization
  • Release supports macOS, Linux, Windows, Android, and multiple backends (CPU, CUDA, Vulkan, ROCm, OpenVINO, SYCL)

Why It Matters

Enables local, private Chinese embedding generation in llama.cpp, expanding multilingual NLP pipelines without cloud dependencies.