llama.cpp b9258 fixes DeepSeek-OCR, matches Pillow image processing
DeepSeek-OCR now reaches full parity with the Pillow reference, with SAM fixes and new test metrics.
llama.cpp b9258, released by ggml-org, brings several fixes and improvements to DeepSeek-OCR support. Image preprocessing now has full parity with the Pillow reference implementation. The padding options were refactored so that a single pad_style enum replaces the boolean add_padding flag and the pad_rounding enum, and image-text reordering was fixed. SAM (Segment Anything Model) mask casting now happens only when flash-attn is enabled, and the SAM build function has been extracted so that deepseek-ocr-2 can reuse it. In addition, the deepseek-ocr regression tests now score output against ground truth with CER (character error rate) and chrF instead of simple embedding-based metrics, which gives a more accurate comparison.
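Collapsing a flag plus an enum into one enum removes meaningless combinations, such as "no padding, but with a rounding mode". The release notes name pad_style but not its members, so the sketch below is hypothetical. It assumes the rounding choice concerns where a letterboxed image is pasted on a square canvas. As far as we understand Pillow's ImageOps.contain and ImageOps.pad, both use Python's round(), which rounds half to even. A port that floors instead can land one pixel off. PadStyle, contain_size, pad_offset and layout are illustrative names, not llama.cpp code.

```python
import math
from enum import Enum


class PadStyle(Enum):
    # Hypothetical members: one enum replaces the (add_padding, pad_rounding) pair.
    NONE = "none"                  # no padding: stretch straight to the target size
    CENTER_FLOOR = "center_floor"  # letterbox, floor the leading offset
    CENTER_PIL = "center_pil"      # letterbox, round the offset as Python's round() does


def contain_size(w: int, h: int, target: int) -> tuple[int, int]:
    """Largest size fitting a target x target box while keeping the aspect ratio."""
    if w > h:
        return target, round(h / w * target)
    if h > w:
        return round(w / h * target), target
    return target, target


def pad_offset(free: int, style: PadStyle) -> int:
    """Leading (left or top) offset when centering content within `free` spare pixels."""
    if style is PadStyle.NONE:
        return 0
    if style is PadStyle.CENTER_FLOOR:
        return math.floor(free * 0.5)
    # Python's round() is round-half-to-even: 1.5 -> 2 but 2.5 -> 2.
    # C's round() rounds half away from zero, so a naive port can drift by a pixel.
    return round(free * 0.5)


def layout(w: int, h: int, target: int, style: PadStyle):
    """Return ((resized_w, resized_h), (x, y)) for pasting onto a target x target canvas."""
    if style is PadStyle.NONE:
        return (target, target), (0, 0)
    rw, rh = contain_size(w, h, target)
    return (rw, rh), (pad_offset(target - rw, style), pad_offset(target - rh, style))


print(layout(1024, 1021, 1024, PadStyle.CENTER_PIL))    # ((1024, 1021), (0, 2))
print(layout(1024, 1021, 1024, PadStyle.CENTER_FLOOR))  # ((1024, 1021), (0, 1))
```

The two letterbox styles disagree only when the spare space is odd and its half rounds differently. That kind of one-pixel mismatch is enough to break exact parity with a reference implementation.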
The release also changes llama-chat to address server/WebUI issues involving the new media_markers_first() function, and it adapts test-chat-template with new deepseek-ocr test cases. Release builds cover 25+ platform configurations, including macOS (Apple Silicon, Intel, iOS XCFramework), Linux (x64/arm64/s390x CPU, Vulkan, ROCm 7.2, OpenVINO, SYCL), Android arm64, and Windows (CPU, CUDA 12/13, Vulkan, SYCL, HIP). The release is particularly relevant for developers who deploy local multimodal models and need robust image-to-text extraction with minimal dependencies.
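The release notes give media_markers_first() only by name. Combined with the mention of image-text reordering, one plausible reading is that media markers are moved ahead of the text in a message, assumed here to be the order DeepSeek-OCR expects. The following is a minimal sketch of that idea only. Part, IMAGE_MARKER, order_media_first and render are hypothetical names, not llama.cpp APIs, and the real marker string may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    kind: str   # "image" or "text" (hypothetical message schema)
    value: str  # file name for images, prompt text for text parts


IMAGE_MARKER = "<image>"  # hypothetical marker string


def order_media_first(parts: list[Part]) -> list[Part]:
    """Stable reorder: all image parts first, then the rest, each group in original order."""
    media = [p for p in parts if p.kind == "image"]
    others = [p for p in parts if p.kind != "image"]
    return media + others


def render(parts: list[Part]) -> str:
    """Flatten parts into a prompt string, replacing each image with a marker."""
    return "\n".join(IMAGE_MARKER if p.kind == "image" else p.value for p in parts)


# A client might send the instruction text before the attached page image.
msg = [Part("text", "Convert the document to markdown."), Part("image", "page1.png")]
print(render(order_media_first(msg)))
# <image>
# Convert the document to markdown.
```

Keeping the reorder stable preserves the relative order of multiple images. That matters if each image maps to a separate page.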
- DeepSeek-OCR now achieves full image-processing parity with the Pillow reference implementation, with fixes to image-text reordering and padding.
- SAM mask casting is now conditional on flash-attn being enabled, and the SAM build function has been extracted for reuse by deepseek-ocr-2.
- Regression tests for DeepSeek-OCR now compare output with ground truth using CER and chrF scores. These are a more precise check than the previous embedding-based metrics.
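The release notes do not say how the tests compute CER and chrF, so here is a from-scratch sketch of the standard definitions. CER is the Levenshtein edit distance divided by the reference length. chrF follows Popović's original formulation: character n-gram precision and recall are averaged over n = 1..6 and combined into an F-score with beta = 2. Whitespace is stripped, as sacrebleu does by default. Details may differ from whatever implementation the llama.cpp tests use.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution (or match)
        prev = cur
    return prev[-1]


def cer(hyp: str, ref: str) -> float:
    """Character error rate: edit distance per reference character."""
    return levenshtein(hyp, ref) / max(len(ref), 1)


def _ngrams(s: str, n: int) -> Counter:
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hyp: str, ref: str, max_n: int = 6, beta: float = 2.0) -> float:
    """chrF on a 0-100 scale, averaging precision and recall over n-gram orders."""
    hyp, ref = "".join(hyp.split()), "".join(ref.split())  # drop all whitespace
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        h, r = _ngrams(hyp, n), _ngrams(ref, n)
        if not h or not r:
            continue  # skip orders longer than either string
        overlap = sum((h & r).values())  # clipped n-gram matches
        precisions.append(overlap / sum(h.values()))
        recalls.append(overlap / sum(r.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    rc = sum(recalls) / len(recalls)
    if p + rc == 0:
        return 0.0
    b2 = beta ** 2
    return 100 * (1 + b2) * p * rc / (b2 * p + rc)


print(round(cer("kitten", "sitting"), 3))  # 0.429
print(chrf("same text", "same text"))      # 100.0
```

Unlike an embedding similarity, both scores tie a regression directly to character-level differences from the ground-truth transcription. Lower CER and higher chrF are better.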
Why It Matters
For local LLM users, b9258 makes DeepSeek-OCR a more reliable option for OCR and image understanding. It ships release builds across many platforms and validates OCR output with stricter tests.