llama.cpp b9258 fixes DeepSeek-OCR, matches Pillow image processing

llama.cpp Releases May 21, 2026

⚡ DeepSeek-OCR now reaches full parity with the Pillow reference, with SAM fixes and new test metrics.

Deep Dive

llama.cpp b9258, released by ggml-org, fixes and improves DeepSeek-OCR support. The update brings image processing to full parity with the Pillow reference implementation: the padding logic is refactored so that a single pad_style enum replaces the previous bool add_padding plus pad_rounding enum, and image-text reordering is fixed. SAM (Segment Anything Model) mask casting now happens only when flash-attn is enabled, and the SAM build function has been extracted for reuse by deepseek-ocr-2. Regression tests for deepseek-ocr now compare output against ground truth using CER (character error rate) and chrF (character n-gram F-score), which gives a more accurate comparison than the earlier embedding-based metrics.
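
To make the two test metrics concrete, here is a minimal Python sketch of how CER and a simplified chrF can be computed for an OCR output against a ground-truth string. It is not the code used in the llama.cpp regression tests; the function names are illustrative, scores are on a 0 to 1 scale, and this chrF variant keeps whitespace and averages n-gram precision and recall over n = 1..6 with beta = 2, which may differ from the settings the project uses.

```python
from collections import Counter


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference char
                            curr[j - 1] + 1,      # insert a hypothesis char
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


def char_ngrams(text: str, n: int) -> Counter:
    """Multiset of character n-grams in text (empty if text is shorter than n)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def chrf(ref: str, hyp: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF: F-beta over averaged character n-gram precision/recall."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        ref_counts, hyp_counts = char_ngrams(ref, n), char_ngrams(hyp, n)
        if not ref_counts or not hyp_counts:
            continue  # one of the strings is shorter than n
        overlap = sum((ref_counts & hyp_counts).values())
        precisions.append(overlap / sum(hyp_counts.values()))
        recalls.append(overlap / sum(ref_counts.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


if __name__ == "__main__":
    truth = "Invoice total: 42.00 EUR"
    ocr_output = "Invoice tota1: 42.00 EUR"
    print(f"CER  = {cer(truth, ocr_output):.4f}")
    print(f"chrF = {chrf(truth, ocr_output):.4f}")
```

Lower CER and higher chrF both indicate output closer to the ground truth. A regression test built on these would typically fail when CER rises above, or chrF falls below, a chosen threshold; any such threshold is a project-specific choice not stated in the release notes.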

The release also changes llama-chat to fix server/WebUI issues with the new media_markers_first() function, and adapts test-chat-template with new test cases for deepseek-ocr. Builds cover 25+ platform configurations, including macOS (Apple Silicon, Intel, iOS XCFramework), Linux (x64/arm64/s390x CPU, Vulkan, ROCm 7.2, OpenVINO, SYCL), Android arm64, and Windows (CPU, CUDA 12/13, Vulkan, SYCL, HIP). The release is most relevant to developers who deploy local multimodal models and need robust image-to-text extraction with minimal dependencies.

Key Points

DeepSeek-OCR now achieves full image processing parity with the Pillow reference implementation, fixing image-text reordering and padding issues.
SAM mask casting is now conditional on flash-attn being enabled, and the SAM build function has been extracted for reuse by deepseek-ocr-2.
Regression tests for DeepSeek-OCR now use CER and chrF scores for ground-truth comparison, improving accuracy over the previous embedding-based methods.

Why It Matters

For local LLM users, b9258 brings more reliable DeepSeek-OCR output, builds for a wide range of platforms, and stronger regression tests.
