Developer Tools

llama.cpp b9414 adds DeepSeekOCR 2 with multi-tile dynamic resolution

Local AI now sees and reads: DeepSeekOCR 2 lands in llama.cpp for on-device document OCR.

Deep Dive

The open‑source llama.cpp project just shipped release b9414, bringing official support for DeepSeekOCR 2. This vision model excels at optical character recognition, especially on complex documents. The integration includes multi‑tile dynamic resolution, which automatically splits large images into smaller tiles and processes them at optimal resolutions — a critical feature for accurate OCR on scans, forms, or multi‑page PDFs.

Built on GitHub with a verified signature, the release compiles for virtually every platform: macOS (Apple Silicon and Intel), Linux (x86, ARM, s390x, with GPU backends like Vulkan, ROCm 7.2, OpenVINO, SYCL), Windows (CPU, ARM64, CUDA 12/13, Vulkan, HIP), and Android ARM64. Developers can now embed deep learning OCR directly into local applications without sending documents to external APIs, keeping data private and cutting latency. The update also drops redundant operations and adds a view separator token for better integration with LLM pipelines.

Key Points
  • Adds DeepSeekOCR 2 model support enabling high‑accuracy document OCR within llama.cpp
  • Multi‑tile dynamic resolution automatically optimizes image tiling for varied document sizes
  • Supports macOS, Linux, Windows, and Android across CPU, CUDA, Vulkan, ROCm, and more

Why It Matters

Local OCR with DeepSeekOCR 2 lets developers build private, low‑latency document reading into edge and desktop AI apps.