HalalBench: New OCR benchmark exposes food label reading failures

arXiv cs.CV April 28, 2026

⚡ All evaluated OCR engines score F1=0.000 on Japanese ingredient labels.

Deep Dive

Hasan Arief's new paper introduces HalalBench, the first open multilingual benchmark designed specifically for OCR on food packaging ingredient extraction. The benchmark includes 1,043 images—50 real and 993 synthetic—with 36,438 annotations in COCO format, spanning 14 languages. It addresses unique challenges like curved surfaces, dense multilingual text, and sub-8pt fonts that existing document or scene-text benchmarks miss.
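To make the annotation format concrete, here is a minimal Python sketch of how a COCO-style annotation file is typically laid out and traversed. The top-level `images`, `annotations`, and `categories` keys and the `[x, y, width, height]` bounding-box convention come from the COCO format itself. The file names, the sample values, and the `text` and `language` fields are hypothetical stand-ins, since this summary does not describe HalalBench's exact schema.

```python
from collections import Counter

# Tiny in-memory stand-in for a COCO-format annotation file.
# "text" and "language" are hypothetical extension fields; standard COCO
# only defines fields such as id, image_id, category_id, and bbox.
coco = {
    "images": [
        {"id": 1, "file_name": "label_001.jpg", "width": 1200, "height": 800},
        {"id": 2, "file_name": "label_002.jpg", "width": 1200, "height": 800},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1,
         "bbox": [40, 60, 120, 18], "text": "sugar", "language": "en"},
        {"id": 11, "image_id": 1, "category_id": 1,
         "bbox": [170, 60, 90, 18], "text": "salt", "language": "en"},
        {"id": 12, "image_id": 2, "category_id": 1,
         "bbox": [30, 200, 60, 20], "text": "砂糖", "language": "ja"},
    ],
    "categories": [{"id": 1, "name": "text"}],
}


def annotations_per_image(dataset):
    """Count annotations for each image, keyed by file name."""
    counts = Counter(ann["image_id"] for ann in dataset["annotations"])
    return {img["file_name"]: counts.get(img["id"], 0)
            for img in dataset["images"]}


def bbox_to_corners(bbox):
    """Convert a COCO [x, y, width, height] box to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


print(annotations_per_image(coco))
print(bbox_to_corners(coco["annotations"][0]["bbox"]))
```

With a real annotation file, the `coco` dictionary would instead come from `json.load` on the released JSON.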

The paper evaluates four OCR engines. The three scores reported in this summary, docTR (F1=0.193), ML Kit (0.180), and EasyOCR (0.167), show poor performance overall, and all engines fail completely on Japanese (F1=0.000). A clustering-based post-processing algorithm improved F1 by 36%. The results are validated through HalalLens, a production halal scanner used in over 20 countries. The dataset and code are released under open licenses, providing a resource for improving automated halal food verification and multilingual OCR systems.
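For readers unfamiliar with the metric, the sketch below shows how an F1 score is computed from match counts and what a 36% gain would amount to numerically. It is an illustration under stated assumptions, not the paper's evaluation code: the rule for deciding that a predicted word matches a ground-truth word is not described in this summary, so the function takes the counts as given. The arithmetic at the end assumes the 36% is a relative gain applied to docTR's 0.193, which the summary does not confirm.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall, computed from match counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if true_positives == 0 or predicted == 0 or actual == 0:
        # No correct matches at all, as in the reported Japanese results.
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    return 2 * precision * recall / (precision + recall)


def apply_relative_gain(score, gain):
    """Return the score after a relative improvement (0.36 means +36%)."""
    return score * (1 + gain)


# Hypothetical counts: 2 correct words, 3 spurious, 5 missed.
print(round(f1_score(2, 3, 5), 3))

# Assumption: the 36% gain is relative and applies to docTR's baseline.
print(round(apply_relative_gain(0.193, 0.36), 3))
```

Under those assumptions the post-processed score would be roughly 0.262, which is still low in absolute terms.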

Key Points

HalalBench includes 1,043 images and 36,438 annotations across 14 languages
Top OCR engine docTR achieves only F1=0.193, with all engines scoring 0.000 on Japanese
Custom post-processing algorithm boosts F1 by 36%, validated via production scanner serving 20+ countries

Why It Matters

First standardized benchmark for halal food verification OCR, revealing critical gaps in multilingual ingredient reading.
