Audio & Speech

Not that Groove: Zero-Shot Symbolic Music Editing

AI music editing goes symbolic: LLMs manipulate MIDI drum patterns using plain English.

Deep Dive

Li Zhang’s new paper, ‘Not that Groove: Zero-Shot Symbolic Music Editing,’ tackles the bottleneck of instruction-driven MIDI editing by reframing it as a structured reasoning problem. Rather than relying on scarce paired datasets, the author introduces a text-based ‘drumroll’ notation: a spatial, syntax-driven grid that represents drum mechanics. With this representation, off-the-shelf LLMs can apply complex edits to drum grooves through zero-shot prompting alone, with no supervised training. The work also presents a benchmark, also called ‘Not that Groove,’ comprising thousands of drum grooves paired with descriptive natural language instructions.
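To make the idea of a text grid concrete, here is a minimal sketch of how a drum pattern might be rendered as text and parsed back. The format is an assumption for illustration only: one row per instrument, one character per 16th-note step, 'x' for a hit and '-' for a rest. The summary does not specify the paper's actual ‘drumroll’ syntax, and the names to_drumroll and from_drumroll are hypothetical.

```python
# Hypothetical grid format (not the paper's actual notation):
# one row per instrument, one character per 16th-note step,
# 'x' marks a hit and '-' marks a rest.
STEPS_PER_BAR = 16


def to_drumroll(hits, instruments, steps=STEPS_PER_BAR):
    """Render {instrument: set_of_step_indices} as a text grid."""
    width = max(len(name) for name in instruments)
    rows = []
    for name in instruments:
        on = hits.get(name, set())
        cells = "".join("x" if i in on else "-" for i in range(steps))
        rows.append(f"{name.ljust(width)} |{cells}|")
    return "\n".join(rows)


def from_drumroll(text):
    """Parse the text grid back into {instrument: set_of_step_indices}."""
    hits = {}
    for line in text.splitlines():
        parts = line.split("|")
        name, cells = parts[0].strip(), parts[1]
        hits[name] = {i for i, c in enumerate(cells) if c == "x"}
    return hits


groove = {
    "HH": set(range(0, 16, 2)),  # hi-hat on every 8th note
    "SD": {4, 12},               # snare on beats 2 and 4
    "BD": {0, 8},                # kick on beats 1 and 3
}
grid = to_drumroll(groove, ["HH", "SD", "BD"])
print(grid)
```

A grid like this can be pasted into a prompt next to the edit instruction, and the model's reply can be parsed back with from_drumroll before conversion to MIDI. The round trip through plain text is what lets a general-purpose LLM work on the pattern without any music-specific training.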

To assess the approach without costly human evaluation, Zhang built a scalable, domain-informed automated unit-testing framework that symbolically verifies whether an edited groove satisfies the user’s request. In experiments across eight state-of-the-art LLMs, the top model passed 68% of these unit tests. Listening tests also confirmed that the programmatic unit tests agree closely with subjective judgments from professional musicians. Together, these results lay a data-efficient foundation for controllable AI music production, giving producers granular control without expensive datasets or proprietary audio models.
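A symbolic unit test of this kind can be sketched as a function that compares a groove before and after an edit. Everything below is an illustrative assumption rather than the paper's framework: the check_edit helper, the 16-step grid, and the rule that untouched instruments must stay unchanged are all hypothetical.

```python
def check_edit(before, after, instrument, must_add=(), must_remove=()):
    """Return True if `after` reflects the requested change on `instrument`
    and every other instrument is left exactly as it was in `before`.

    Grooves are dicts of {instrument: set_of_step_indices} (hypothetical format).
    """
    target = after.get(instrument, set())
    added_ok = all(step in target for step in must_add)
    removed_ok = all(step not in target for step in must_remove)
    others_ok = all(
        before.get(name, set()) == after.get(name, set())
        for name in set(before) | set(after)
        if name != instrument
    )
    return added_ok and removed_ok and others_ok


# Instruction (hypothetical): "Add a kick on the 'and' of beat 2."
# On a 16-step grid, beat 2 starts at step 4, so its 'and' is step 6.
before = {"SD": {4, 12}, "BD": {0, 8}}
good_edit = {"SD": {4, 12}, "BD": {0, 6, 8}}
bad_edit = {"SD": {4}, "BD": {0, 6, 8}}  # kick added, but a snare hit was dropped

print(check_edit(before, good_edit, "BD", must_add={6}))
print(check_edit(before, bad_edit, "BD", must_add={6}))
```

Because the check operates on symbolic step indices rather than audio, it runs instantly and deterministically, which is the property that makes this style of evaluation scale to thousands of grooves.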

Key Points
  • Zero-shot approach converts MIDI drum patterns into a text-based ‘drumroll’ grid for LLM reasoning, with no training data needed.
  • The top-performing LLM passed 68% of automated unit tests, and those tests closely matched professional musicians’ evaluations.
  • New benchmark includes thousands of drum grooves with paired natural language instructions for reproducible evaluation.

Why It Matters

Lets professional music producers edit MIDI grooves in plain English without requiring costly paired datasets.