llama.cpp b9354 adds MiniCPM5 tokenizer for local inference
Run MiniCPM5 on your own hardware with new tokenizer support in llama.cpp.
The open-source llama.cpp project, known for efficient local inference of large language models, released version b9354 on May 27. The headline feature is tokenizer support for MiniCPM5, a compact language model from OpenBMB. With this change, developers can convert MiniCPM5 models and run them locally on llama.cpp's optimized C++ backend. The tokenizer is a BPE (Byte-Pair Encoding) tokenizer whose pre-tokenization step uses a hardcoded regex, consistent with the other pre-tokenizers in the project. The update, co-authored by Zhang Tao from ModelBest, is meant to make llama.cpp tokenize text correctly against MiniCPM5's vocabulary.
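To make "BPE with a regex pre-tokenizer" concrete, here is a minimal Python sketch of the general technique. It is not the llama.cpp implementation and it does not use the MiniCPM5 regex: the pattern `PRETOKENIZE_RE`, the merge table `TOY_MERGES`, and all function names are hypothetical, chosen only for illustration. It also works on characters, whereas byte-level BPE tokenizers work on bytes.

```python
import re

# Illustrative pattern only (an assumption, NOT the MiniCPM5 regex): it splits
# text into letter runs, digit runs, and punctuation runs, each with an
# optional leading space, plus any remaining whitespace.
PRETOKENIZE_RE = re.compile(r" ?[A-Za-z]+| ?[0-9]+| ?[^\sA-Za-z0-9]+|\s+")

# Toy merge table (hypothetical). Earlier entries have higher priority.
TOY_MERGES = [("l", "l"), ("h", "e"), ("he", "ll"), ("hell", "o")]


def pretokenize(text):
    """Split text into pieces; BPE merges never cross piece boundaries."""
    return PRETOKENIZE_RE.findall(text)


def bpe_merge(piece, merges):
    """Repeatedly apply the highest-priority merge available inside one piece."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    symbols = list(piece)
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no adjacent pair is in the merge table
        merged, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


def tokenize(text, merges):
    """Pre-tokenize with the regex, then run BPE on each piece independently."""
    tokens = []
    for piece in pretokenize(text):
        tokens.extend(bpe_merge(piece, merges))
    return tokens


if __name__ == "__main__":
    print(pretokenize("run 42!"))
    print(tokenize("hello hello", TOY_MERGES))
```

The regex matters because it fixes where merges are allowed to happen. If the pre-tokenizer splits text differently from the tokenizer the model was trained with, the same input can produce different token IDs, which is why each supported model family needs its own matching pattern.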
Beyond the tokenizer addition, the release provides pre-built binaries for a wide range of platforms: macOS (Apple Silicon with optional KleidiAI, Intel x64), iOS (XCFramework), Linux (CPU, Vulkan, ROCm 7.2, OpenVINO, SYCL FP32), Android ARM64, and Windows (CPU, ARM64, CUDA 12/13, Vulkan, HIP). With this coverage, developers can deploy MiniCPM5 on everything from cloud servers to edge devices. The release also includes UI assets for the project's built-in web interface. Users can run MiniCPM5 for tasks like text generation, summarization, and code completion directly on their own hardware, without relying on cloud APIs.
- Adds MiniCPM5 BPE tokenizer support via hardcoded regex pre-tokenizer
- Integrated through convert_hf_to_gguf_update.py for model conversion
- Pre-built binaries available for macOS, Linux, Windows, Android, and iOS platforms
Why It Matters
The release lets developers run MiniCPM5 locally, adding another option for efficient, private on-device language model inference.