[P] I've trained my own OMR model (Optical Music Recognition)
A developer's custom DaViT-Transformer model trails Audiveris slightly on average (42.8 vs. 44.0) but clearly beats it on cleaner, rhythmic scores.
Developer Clqu Wu has open-sourced Clarity-OMR, an Optical Music Recognition (OMR) model that converts sheet music PDFs into structured MusicXML files. The system uses a four-stage pipeline. First, YOLO detects the staves. Next, each staff is processed at 192px resolution by a DaViT-Base vision encoder. A custom Transformer decoder with Rotary Position Embeddings (RoPE) then generates the output. Finally, a grammar-based Finite State Automaton (FSA) constrains the beam search. This last stage is the key innovation: it enforces musical rules such as beat consistency and chord structure, so the output is always valid. The model was trained with DoRA (Weight-Decomposed Low-Rank Adaptation), a parameter-efficient fine-tuning method, at rank 64 on all linear layers.
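The FSA-constrained decoding step can be illustrated with a toy sketch. Everything below is hypothetical and not taken from Clarity-OMR:

- the token set (eighth, quarter, and half-note durations, a barline, and an end token);
- the `BeatFSA` class;
- `toy_decoder_logprobs`, which stands in for the DaViT encoder and Transformer decoder with a fixed probability table.

Only the beat-consistency rule is modeled; chord structure is omitted. The real grammar and vocabulary are not described in the post and will differ. What the sketch shows is the mechanism: at each beam step, tokens the automaton rejects are never expanded.

```python
from __future__ import annotations

import math

# Hypothetical toy vocabulary; durations are in eighth-note units (4/4 = 8).
DURATIONS = {"e": 1, "q": 2, "h": 4}
BAR, EOS = "|", "<eos>"
VOCAB = ["e", "q", "h", BAR, EOS]  # fixed order keeps expansion deterministic
MEASURE = 8


class BeatFSA:
    """Grammar automaton; state = (eighths filled in current measure, bars done)."""

    start = (0, 0)

    def allowed(self, state: tuple[int, int]) -> set[str]:
        filled, bars = state
        ok = {t for t, d in DURATIONS.items() if filled + d <= MEASURE}
        if filled == MEASURE:
            ok.add(BAR)  # a barline is legal only when the measure is exactly full
        if filled == 0 and bars > 0:
            ok.add(EOS)  # end only on a measure boundary, after at least one bar
        return ok

    def step(self, state: tuple[int, int], token: str) -> tuple[int, int]:
        filled, bars = state
        if token == BAR:
            return (0, bars + 1)
        if token == EOS:
            return state
        return (filled + DURATIONS[token], bars)


def toy_decoder_logprobs(prefix: list[str]) -> dict[str, float]:
    """Hypothetical stand-in for the encoder/decoder. It ignores the prefix and
    always prefers half notes, so unconstrained decoding would overfill measures."""
    probs = {"h": 0.45, "q": 0.25, "e": 0.10, BAR: 0.12, EOS: 0.08}
    return {tok: math.log(p) for tok, p in probs.items()}


def constrained_beam_search(fsa, logprob_fn, beam_width=3, max_len=16):
    beams = [([], 0.0, fsa.start)]  # (tokens, cumulative log-prob, FSA state)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score, state in beams:
            logp = logprob_fn(tokens)
            legal = fsa.allowed(state)
            for tok in VOCAB:
                if tok not in legal:
                    continue  # masked: same effect as setting its logit to -inf
                cand = (tokens + [tok], score + logp[tok], fsa.step(state, tok))
                (finished if tok == EOS else candidates).append(cand)
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
        best_done = max((f[1] for f in finished), default=-math.inf)
        # Log-probs are <= 0, so live beams can only get worse: stop once a
        # finished hypothesis scores at least as well as the best live one.
        if not beams or best_done >= beams[0][1]:
            break
    if not finished:
        return None
    return max(finished, key=lambda f: f[1])[0]


tokens = constrained_beam_search(BeatFSA(), toy_decoder_logprobs)
print(tokens)  # ['h', 'h', '|', '<eos>']
```

The toy model always prefers half notes, so unconstrained greedy decoding would emit h, h, h and so on, overfilling the first 4/4 measure. With the automaton in place, the only legal move after two half notes is a barline. The search then stops, because every further note would lower the cumulative log-probability.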
The developer benchmarked Clarity-OMR against the established open-source tool Audiveris on 10 classical piano pieces, scoring both with the mir_eval library. Clarity-OMR proved competitive. Its overall average quality score was slightly lower (42.8 vs. 44.0 for Audiveris), but it performed markedly better on cleaner, more rhythmic scores. For instance, it scored 69.5 vs. 25.9 on a Bartók piece and 66.2 vs. 33.9 on 'The Entertainer.' The developer acknowledges weaknesses, such as notes placed incorrectly on the staves, but says the model outperforms Audiveris on cherry-picked scores. All inference and training code, along with the model weights, is available on GitHub and Hugging Face. This gives the community a foundation for improving polyphonic training data, grammar constraints, and synthetic rendering.
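The summary does not say which mir_eval metric produces the "quality score." For intuition about note-level evaluation, the following dependency-free sketch computes precision, recall, and F1. It matches notes on pitch and onset, using a 50 ms onset tolerance, which is the default in mir_eval's transcription metrics. It is a simplification and not mir_eval's implementation:

- it matches greedily rather than with an optimal bipartite matching;
- it ignores note offsets;
- it compares integer MIDI pitches instead of frequencies.

The function name `note_f1` and the note lists are made-up illustrations.

```python
def note_f1(reference, estimate, onset_tol=0.05):
    """Greedy pitch + onset note matching (hypothetical helper, simplified).

    reference, estimate: lists of (onset_seconds, midi_pitch) tuples.
    Returns (precision, recall, f1).
    """
    unmatched = sorted(reference)
    hits = 0
    for onset, pitch in sorted(estimate):
        # Find the closest still-unmatched reference note with the same pitch
        # whose onset lies within the tolerance window.
        best = None
        for i, (r_onset, r_pitch) in enumerate(unmatched):
            gap = abs(r_onset - onset)
            if r_pitch == pitch and gap <= onset_tol and (best is None or gap < best[1]):
                best = (i, gap)
        if best is not None:
            unmatched.pop(best[0])  # each reference note can be matched only once
            hits += 1
    precision = hits / len(estimate) if estimate else 0.0
    recall = hits / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


reference = [(0.0, 60), (0.5, 64), (1.0, 67), (1.5, 72)]
estimate = [(0.01, 60), (0.52, 64), (1.0, 65), (1.5, 72), (2.0, 74)]
p, r, f = note_f1(reference, estimate)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")  # precision=0.60 recall=0.75 f1=0.67
```

In this example, three of the five estimated notes match a reference note. The wrong pitch at 1.0 s and the extra note at 2.0 s both count against precision, and the missed G at 1.0 s counts against recall.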
- Uses a 4-stage pipeline with YOLO, DaViT-Base encoder, custom Transformer decoder, and grammar FSA for valid MusicXML output.
- Averaged a 42.8 quality score, slightly below Audiveris (44.0), but beat it on rhythmic scores such as Bartók (69.5 vs. 25.9).
- Fully open-sourced on GitHub and Hugging Face, inviting community work on better polyphonic training data and grammar constraints.
Why It Matters
Democratizes high-quality sheet music digitization, enabling musicians, archivists, and developers to build better music analysis and playback tools.