Transformer-Based Rhythm Quantization of Performance MIDI Using Beat Annotations
A new T5-based model achieves a 97.3% onset F1-score on the ASAP dataset and generalizes to unseen time signatures.
A team from Karlsruhe Institute of Technology—Maximilian Wachter, Sebastian Murgul, and Michael Heizmann—has developed a novel deep learning approach for rhythm quantization in automatic music transcription (AMT). Their method, detailed in a paper accepted to the 2025 International Conference on Smart Multimedia (ICSM), uses a transformer model based on the T5 architecture to quantize MIDI performances into readable musical scores. The key innovation is using a priori beat information to align performance and score data within a unified framework, addressing a gap in beat-based quantization research.
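To make the beat-alignment idea concrete, here is a minimal sketch (not the authors' code) of how performance onsets could be expressed relative to given beat annotations before tokenization; the function names, the 12-tick grid, and the token scheme are illustrative assumptions rather than the paper's actual vocabulary.

```python
# Illustrative sketch: express performance note onsets relative to a priori
# beat annotations so performance and score share a beat-aligned time axis.
from bisect import bisect_right

def onset_to_beat_position(onset_sec, beat_times):
    """Map an onset in seconds to a fractional beat index using beat annotations."""
    i = bisect_right(beat_times, onset_sec) - 1
    i = max(0, min(i, len(beat_times) - 2))        # clamp to a valid beat interval
    span = beat_times[i + 1] - beat_times[i]       # duration of that beat in seconds
    return i + (onset_sec - beat_times[i]) / span  # beat index + fraction within the beat

def tokenize_note(pitch, onset_sec, beat_times, grid=12):
    """Turn one performance note into (beat, subdivision, pitch) tokens on an assumed 12-tick grid."""
    pos = onset_to_beat_position(onset_sec, beat_times)
    beat = int(pos)
    tick = round((pos - beat) * grid) % grid
    return [f"Beat_{beat}", f"Tick_{tick}", f"Pitch_{pitch}"]

# Example: beats annotated at 0.0, 0.52, 1.01, 1.55 s; a note played slightly late.
beats = [0.0, 0.52, 1.01, 1.55]
print(tokenize_note(60, 0.79, beats))  # ['Beat_1', 'Tick_7', 'Pitch_60']
```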
On the ASAP dataset, the model reached a 97.3% onset F1-score and 83.3% note value accuracy. It generalizes well across time signatures, including those not seen during training, and produces clean, readable output. The researchers employed data augmentations such as transposition, note deletion, and performance-side time jitter to improve robustness. Fine-tuning on instrument-specific datasets further improved performance by capturing characteristic rhythmic and melodic patterns. This work provides a robust framework for converting expressive MIDI performances into accurate notation, with potential applications in music education, transcription software, and AI-assisted composition.
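A rough sketch of the augmentations described above, applied to a performance given as (onset, pitch) pairs; the parameter ranges are assumptions, not the paper's settings, and only the performance side is jittered so the score targets stay untouched.

```python
# Illustrative sketch of transposition, note deletion, and performance-side
# time jitter on a simple (onset_sec, pitch) note list. Ranges are assumed.
import random

def augment(notes, max_transpose=5, delete_prob=0.05, jitter_std=0.02):
    """Return an augmented copy of a performance given as (onset_sec, pitch) pairs."""
    shift = random.randint(-max_transpose, max_transpose)        # global transposition in semitones
    out = []
    for onset, pitch in notes:
        if random.random() < delete_prob:                        # randomly drop a note
            continue
        onset = max(0.0, onset + random.gauss(0.0, jitter_std))  # performance-side time jitter
        out.append((onset, pitch + shift))
    return out

performance = [(0.00, 60), (0.51, 64), (1.02, 67), (1.49, 72)]
print(augment(performance))
```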
- 97.3% onset F1-score and 83.3% note value accuracy on the ASAP dataset
- T5-based transformer with beat annotations and a MIDI tokenizer (see the sketch after this list)
- Generalizes to unseen time signatures; fine-tunable for specific instruments
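As referenced in the list above, here is a minimal sketch of a T5-style sequence-to-sequence setup over a custom MIDI token vocabulary, assuming the Hugging Face transformers library; the model size, vocabulary, and token IDs are placeholders, not the authors' configuration.

```python
# Minimal T5 seq2seq sketch: performance tokens in, quantized score tokens out.
import torch
from transformers import T5Config, T5ForConditionalGeneration

config = T5Config(
    vocab_size=512,            # size of the custom MIDI/score token vocabulary (assumed)
    d_model=256,
    num_layers=4,
    num_heads=4,
    d_ff=1024,
    decoder_start_token_id=0,
)
model = T5ForConditionalGeneration(config)

# One toy batch: performance tokens (with beat/tick positions) as input,
# quantized score tokens as the target sequence.
perf_tokens = torch.randint(3, 512, (2, 64))   # placeholder token IDs
score_tokens = torch.randint(3, 512, (2, 48))

loss = model(input_ids=perf_tokens, labels=score_tokens).loss
loss.backward()                                # standard teacher-forced training step
print(float(loss))
```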
Why It Matters
Enables accurate, automatic conversion of expressive MIDI performances into readable sheet music for musicians and educators.