Transformer-Based Rhythm Quantization of Performance MIDI Using Beat Annotations
A new T5-based model achieves a 97.3% onset F1-score on the ASAP dataset and generalizes to unseen time signatures.
A team from Karlsruhe Institute of Technology—Maximilian Wachter, Sebastian Murgul, and Michael Heizmann—has developed a novel deep learning approach for rhythm quantization in automatic music transcription (AMT). Their method, detailed in a paper accepted to the 2025 International Conference on Smart Multimedia (ICSM), uses a transformer model based on the T5 architecture to quantize MIDI performances into readable musical scores. The key innovation is using a priori beat information to align performance and score data within a unified framework, addressing a gap in beat-based quantization research.
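To make the beat-alignment idea concrete, here is a minimal sketch (not the authors' code) of how performance onsets could be expressed relative to given beat annotations before tokenization; the function names, the 12-tick grid, and the token scheme are illustrative assumptions rather than the paper's actual vocabulary.

```python
# Illustrative sketch: express performance note onsets relative to a priori
# beat annotations so performance and score share a beat-aligned time axis.
from bisect import bisect_right

def onset_to_beat_position(onset_sec, beat_times):
    """Map an onset in seconds to a fractional beat index using beat annotations."""
    i = bisect_right(beat_times, onset_sec) - 1
    i = max(0, min(i, len(beat_times) - 2))        # clamp to a valid beat interval
    span = beat_times[i + 1] - beat_times[i]       # duration of that beat in seconds
    return i + (onset_sec - beat_times[i]) / span  # beat index + fraction within the beat

def tokenize_note(pitch, onset_sec, beat_times, grid=12):
    """Turn one performance note into (beat, subdivision, pitch) tokens on an assumed 12-tick grid."""
    pos = onset_to_beat_position(onset_sec, beat_times)
    beat = int(pos)
    tick = round((pos - beat) * grid) % grid
    return [f"Beat_{beat}", f"Tick_{tick}", f"Pitch_{pitch}"]

# Example: beats annotated at 0.0, 0.52, 1.01, 1.55 s; a note played slightly late.
beats = [0.0, 0.52, 1.01, 1.55]
print(tokenize_note(60, 0.79, beats))  # ['Beat_1', 'Tick_7', 'Pitch_60']
```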
On the ASAP dataset, the model reached a 97.3% onset F1-score and 83.3% note value accuracy. It generalizes well across time signatures, including those not seen during training, and produces clean, readable output. The researchers employed data augmentations such as transposition, note deletion, and performance-side time jitter to improve robustness. Fine-tuning on instrument-specific datasets further improved performance by capturing characteristic rhythmic and melodic patterns. This work provides a robust framework for converting expressive MIDI performances into accurate notation, with potential applications in music education, transcription software, and AI-assisted composition.
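A rough sketch of the augmentations described above, applied to a performance given as (onset, pitch) pairs; the parameter ranges are assumptions, not the paper's settings, and only the performance side is jittered so the score targets stay untouched.

```python
# Illustrative sketch of transposition, note deletion, and performance-side
# time jitter on a simple (onset_sec, pitch) note list. Ranges are assumed.
import random

def augment(notes, max_transpose=5, delete_prob=0.05, jitter_std=0.02):
    """Return an augmented copy of a performance given as (onset_sec, pitch) pairs."""
    shift = random.randint(-max_transpose, max_transpose)        # global transposition in semitones
    out = []
    for onset, pitch in notes:
        if random.random() < delete_prob:                        # randomly drop a note
            continue
        onset = max(0.0, onset + random.gauss(0.0, jitter_std))  # performance-side time jitter
        out.append((onset, pitch + shift))
    return out

performance = [(0.00, 60), (0.51, 64), (1.02, 67), (1.49, 72)]
print(augment(performance))
```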
- 97.3% onset F1-score and 83.3% note value accuracy on the ASAP dataset
- T5-based transformer with beat annotations and a MIDI tokenizer (see the sketch after this list)
- Generalizes to unseen time signatures; fine-tunable for specific instruments
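As referenced in the list above, here is a minimal sketch of a T5-style sequence-to-sequence setup over a custom MIDI token vocabulary, assuming the Hugging Face transformers library; the model size, vocabulary, and token IDs are placeholders, not the authors' configuration.

```python
# Minimal T5 seq2seq sketch: performance tokens in, quantized score tokens out.
import torch
from transformers import T5Config, T5ForConditionalGeneration

config = T5Config(
    vocab_size=512,            # size of the custom MIDI/score token vocabulary (assumed)
    d_model=256,
    num_layers=4,
    num_heads=4,
    d_ff=1024,
    decoder_start_token_id=0,
)
model = T5ForConditionalGeneration(config)

# One toy batch: performance tokens (with beat/tick positions) as input,
# quantized score tokens as the target sequence.
perf_tokens = torch.randint(3, 512, (2, 64))   # placeholder token IDs
score_tokens = torch.randint(3, 512, (2, 48))

loss = model(input_ids=perf_tokens, labels=score_tokens).loss
loss.backward()                                # standard teacher-forced training step
print(float(loss))
```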
Why It Matters
Enables accurate, automatic conversion of expressive MIDI performances into readable sheet music for musicians and educators.