Orli: New AI Model Detects Text Lines and Reading Order in One Pass
Trained on 196,691 pages across 10 writing systems, Orli reaches near-perfect zero-shot reading order on multiple benchmarks.
Traditional OCR pipelines for historical documents split layout analysis into separate line-detection and reading-order steps, often relying on brittle hand-coded heuristics that fail on marginalia, multi-column layouts, or tables. Benjamin Kiessling (ALMAnaCH) introduces Orli (Ordered Regression of Lines), which reframes both sub-tasks as a single autoregressive image-to-sequence problem. Given a page image, Orli directly generates text-line baselines in reading order using a novel chord-frame parameterization that encodes position, orientation, and local geometry via perpendicular offsets. An iterative refinement head and a local visual refiner produce the final curves.
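To make the chord-frame idea concrete, here is a minimal sketch of one way a baseline could be encoded relative to its chord, the straight segment joining its endpoints. It is an illustration under stated assumptions, not Orli's actual implementation: the function names, the dictionary layout, the choice of the chord start as the position, and the number of evenly spaced offsets are all hypothetical.

```python
import math

def _offset_at(local, t):
    """Linearly interpolate the perpendicular offset at chord position t."""
    for (ta, da), (tb, db) in zip(local, local[1:]):
        if tb > ta and ta <= t <= tb:
            w = (t - ta) / (tb - ta)
            return da + w * (db - da)
    return 0.0  # t not covered by any segment (assumption: treat as on-chord)

def encode_chord_frame(points, n_offsets=3):
    """Encode a baseline polyline as chord start, angle, length, and offsets.

    Hypothetical layout: position and orientation come from the chord,
    local geometry from perpendicular offsets sampled along it.
    Assumes the polyline advances monotonically along its chord.
    """
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("baseline endpoints coincide; chord is undefined")
    ux, uy = dx / length, dy / length  # unit vector along the chord
    nx, ny = -uy, ux                   # unit normal (sign is a convention)
    # Express every point as (distance along chord, perpendicular distance).
    local = [((px - x0) * ux + (py - y0) * uy,
              (px - x0) * nx + (py - y0) * ny) for px, py in points]
    offsets = [_offset_at(local, length * k / (n_offsets + 1))
               for k in range(1, n_offsets + 1)]
    return {"start": (x0, y0), "angle": math.atan2(dy, dx),
            "length": length, "offsets": offsets}

def decode_chord_frame(code):
    """Rebuild a polyline from the chord-frame encoding above."""
    x0, y0 = code["start"]
    ux, uy = math.cos(code["angle"]), math.sin(code["angle"])
    nx, ny = -uy, ux
    n = len(code["offsets"])
    ts = [code["length"] * k / (n + 1) for k in range(n + 2)]
    ds = [0.0] + list(code["offsets"]) + [0.0]  # endpoints lie on the chord
    return [(x0 + t * ux + d * nx, y0 + t * uy + d * ny)
            for t, d in zip(ts, ds)]

# A gently bowed baseline: endpoints at the same height, middle displaced by 5.
code = encode_chord_frame([(0, 0), (50, 5), (100, 0)])
curve = decode_chord_frame(code)
```

In this sketch a perfectly straight line has all offsets equal to zero, so curvature and local wobble are carried entirely by the offsets, while the chord alone fixes where the line sits and which way it runs.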
Orli was trained on a heterogeneous corpus of 196,691 pages covering ten writing systems. It marginally exceeds the previous state of the art on cBAD line detection without dataset-specific training, and reaches near-perfect coverage and ordering on multiple reading-order benchmarks in a zero-shot setting. With limited fine-tuning, it adapts to specialized out-of-domain layouts. The source code and trained model weights are released under an open license, making the model readily accessible for archival digitization and document analysis workflows.
- Unifies line detection and reading order into one autoregressive model, eliminating separate heuristic post-processing.
- Trained on 196,691 pages across 10 writing systems; marginally exceeds prior SOTA on cBAD line detection without dataset-specific training.
- Zero-shot near-perfect coverage and ordering on multiple reading-order benchmarks; adapts to specialized out-of-domain layouts with limited fine-tuning.
Why It Matters
Orli streamlines historical document OCR by handling line detection and reading order end-to-end, reducing errors and manual tuning.