DharmaOCR: Open-Source Specialized SLM (3B) + Cost–Performance Benchmark against LLMs and Other Open-Source Models [R]
A 3B-parameter model outperforms the giants with a 0.911 benchmark score and 87.6% fewer failures.
Dharma-AI has open-sourced DharmaOCR, a specialized small language model (SLM) for optical character recognition (OCR), now available on Hugging Face. Fine-tuned from open-source SLMs at 3B and 7B parameters using supervised fine-tuning (SFT) followed by direct preference optimization (DPO), it outperforms major LLMs like GPT-5.4, Gemini 3.1 Pro, and Claude Opus 4.6, as well as Google Document AI and open-source alternatives like OlmOCR, Deepseek-OCR, GLMOCR, and Qwen3. The 7B model scored 0.925 and the 3B model 0.911, both surpassing the competition.

A key innovation: using the model's own degenerate outputs as rejected examples during DPO training cut the failure rate by 87.6%. Additionally, AWQ quantization reduces per-page inference cost by ~22% with negligible performance impact, making the model cost-effective for high-volume deployment. All models, datasets, and a full methodology paper are publicly available, offering a transparent, high-performance alternative to proprietary OCR services.
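The "degenerate self-outputs as rejected examples" idea can be sketched in a few lines. This is a minimal illustration, not the released pipeline: the field names, the repetition-based degeneracy check, and the pairing scheme are all assumptions on my part; the post does not specify how DharmaOCR detects degenerate outputs.

```python
# Sketch: building DPO preference pairs where the "rejected" side is the
# model's own degenerate output for a page and the "chosen" side is the
# ground-truth transcription. All names here are hypothetical.

def is_degenerate(text: str, max_repeat: int = 20) -> bool:
    """Crude degeneracy heuristic: flag long runs of a short repeating
    unit (e.g. the same token looping), a common OCR failure mode."""
    for n in (1, 2, 3):  # repeating units of 1-3 characters
        for i in range(len(text) - n * max_repeat):
            unit = text[i : i + n]
            if text[i : i + n * max_repeat] == unit * max_repeat:
                return True
    return False

def build_preference_pairs(samples):
    """samples: dicts with 'image_id', 'model_output', 'gold_text'.
    Emits DPO-style records only for pages where the model degenerated."""
    pairs = []
    for s in samples:
        if is_degenerate(s["model_output"]):
            pairs.append({
                "prompt": s["image_id"],       # stand-in for the page/image input
                "chosen": s["gold_text"],      # preferred completion
                "rejected": s["model_output"], # the model's own failure
            })
    return pairs
```

The appeal of this scheme is that the rejected examples are free: they come from the model's own failure cases rather than requiring a second annotator or a weaker reference model.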
- DharmaOCR (3B) scores 0.911, beating GPT-5.4, Gemini 3.1 Pro, and Claude Opus 4.6 in OCR benchmarks.
- DPO training with degenerate self-outputs reduced failure rate by 87.6%.
- AWQ quantization cuts per-page inference cost ~22% with minimal accuracy loss.
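To see where the ~22% per-page saving comes from, note that quantization mainly buys throughput on the same hardware. The numbers below are illustrative placeholders (the post gives only the ~22% figure; the GPU price and throughputs are made up):

```python
# Sketch: per-page cost as GPU-hour price divided by pages processed per
# hour. AWQ quantization shrinks weights/activations traffic, so the same
# GPU serves more pages per hour; cost per page falls proportionally.

def cost_per_page(gpu_hourly_usd: float, pages_per_hour: float) -> float:
    return gpu_hourly_usd / pages_per_hour

# Hypothetical numbers: $2.00/hr GPU, throughput rising 1000 -> 1282 pages/hr.
fp16_cost = cost_per_page(gpu_hourly_usd=2.00, pages_per_hour=1000)
awq_cost = cost_per_page(gpu_hourly_usd=2.00, pages_per_hour=1282)
savings = 1 - awq_cost / fp16_cost  # roughly 0.22, i.e. ~22% cheaper per page
```

The same arithmetic applies whatever the actual throughput gain is: a 22% per-page saving corresponds to serving about 1.28x as many pages per GPU-hour.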
Why It Matters
An open-source 3B model outperforms top-tier LLMs, cutting costs and democratizing high-quality OCR for developers.