A Comprehensive Corpus of Biomechanically Constrained Piano Chords: Generation, Analysis, and Implications for Voicing and Psychoacoustics
A new 19.3-million-chord corpus challenges a core tenet of music theory, showing that skewness is roughly 5.8x more predictive of perceived roughness than spread.
Researcher Mahesh Ramani has generated and analyzed a massive, open-source corpus of approximately 19.3 million piano chords, representing the largest known dataset of its kind. The chords are uniquely constrained by realistic biomechanics, modeling a two-handed player with a 1.5-octave reach per hand, making every entry theoretically playable. This dataset provides an unprecedented map of the practical harmonic space available to pianists and serves as a foundational tool for computational musicology, generative AI modeling, and psychoacoustic research.
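The biomechanical constraint can be illustrated with a simple check: treat 1.5 octaves as 18 semitones per hand and ask whether a chord's notes can be split between two hands, each covering no more than that span. This is a hypothetical sketch of the constraint, not the author's actual generation algorithm, and it assumes the hands take contiguous low/high groups of notes.

```python
from itertools import count  # stdlib only; no external dependencies

MAX_HAND_SPAN = 18  # semitones: the corpus's 1.5-octave-per-hand limit

def playable_two_hands(midi_notes, max_span=MAX_HAND_SPAN):
    """Return True if the chord can be split between two hands, each
    spanning at most `max_span` semitones. Illustrative check only:
    assumes each hand takes a contiguous run of the sorted notes."""
    notes = sorted(set(midi_notes))
    # Try every split point: lowest k notes to the left hand, rest to the right.
    for k in range(len(notes) + 1):
        left, right = notes[:k], notes[k:]
        left_ok = not left or left[-1] - left[0] <= max_span
        right_ok = not right or right[-1] - right[0] <= max_span
        if left_ok and right_ok:
            return True
    return False

# C major doubled across two octaves: one hand per octave works.
print(playable_two_hands([48, 52, 55, 60, 64, 67]))  # True
# Five notes over almost four octaves: no two-hand split fits.
print(playable_two_hands([36, 50, 60, 74, 84]))      # False
```

Under these assumptions, the check runs in O(n) split attempts per chord, which is cheap enough to filter tens of millions of candidate chords.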
Using this corpus, Ramani conducted a rigorous analysis that challenges a core tenet of traditional music pedagogy. The study modeled how a chord's 'voicing'—the vertical arrangement of its notes—relates to psychoacoustic targets like dissonance (perceived roughness). Contrary to the common emphasis on 'spread' (the width between notes), the data revealed that 'skewness' (the asymmetry of the note distribution) is a far stronger predictor. Specifically, skewness was approximately 5.8 times more effective than spread at predicting roughness, with statistical significance (ΔR² ≈ 6.75%, p ≈ 0.0008).
This finding suggests that the clarity of 'open voicings' is driven less by overall width and more by placing wider gaps in the lower register while allowing tighter clustering in the treble (negative skewness). The research demonstrates the corpus's utility for moving beyond anecdotal theory into data-driven discovery, with implications for AI music generation, voice-leading algorithms, and a more nuanced understanding of how humans perceive harmony.
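The two competing voicing descriptors can be made concrete with standard definitions: spread as the range of the chord's MIDI pitches, and skewness as the Fisher moment coefficient of the pitch distribution. These are textbook formulas chosen for illustration; the study's exact feature definitions may differ.

```python
import statistics

def voicing_stats(midi_notes):
    """Return (spread, skewness) of a chord's pitch distribution.
    Spread is the range in semitones; skewness is the Fisher moment
    coefficient (mean cubed z-score). Illustrative definitions only."""
    notes = sorted(midi_notes)
    spread = notes[-1] - notes[0]
    mean = statistics.fmean(notes)
    sd = statistics.pstdev(notes)
    if sd == 0:
        return spread, 0.0
    skew = sum(((n - mean) / sd) ** 3 for n in notes) / len(notes)
    return spread, skew

# Close-position C major triad vs. an open voicing with a wide low gap
# and a clustered treble (the pattern the article associates with clarity):
close_triad = [60, 64, 67]        # C4 E4 G4
open_voicing = [36, 55, 60, 64]   # C2, then G3 C4 E4 clustered above
print(voicing_stats(close_triad))
print(voicing_stats(open_voicing))
```

The open voicing comes out with a markedly negative skewness (its long tail points downward into the bass), matching the article's description of low-register gaps plus high-register clustering.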
- Created a corpus of ~19.3 million piano chords, all constrained by realistic human hand spans (1.5 octaves per hand).
- Data analysis showed chord 'skewness' is ~5.8x more predictive of dissonance than traditional 'spread' (β ≈ +0.145 vs. -0.025).
- Challenges music pedagogy, suggesting open voicing clarity comes from low-register gaps and high-register clustering, not just width.
Why It Matters
Provides a massive, realistic dataset to train better AI music models and challenges foundational music theory with data-driven evidence.