Voxmap-Studio: Open-source tool measures cost of speaker labeling
Tracks every edit and second spent labeling who spoke when.
Speaker diarization, the task of labeling who speaks when in an audio recording, is a costly and tedious step in building AI training datasets, yet existing annotation tools rarely quantify that cost. Enter voxmap-studio, an open-source, React-based tool built by Fumiaki Yamaguchi and integrated with the pyannote diarization ecosystem. It uses a stride-accelerated engine to auto-generate an initial diarization hypothesis, so the human annotator only needs to correct errors rather than draw every speaker turn from scratch. What sets it apart is its built-in cost instrumentation: the tool logs every edit operation, typed as an insertion, deletion, or substitution, along with the time spent, and treats these metrics as first-class outputs for benchmarking different assistance strategies.
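To make the idea of "edit operations as first-class outputs" concrete, here is a minimal sketch of how a typed edit log could be summarized into per-session cost metrics. It assumes each committed edit is stored as an event with a kind and a timestamp; the type and function names (`EditEvent`, `summarizeSession`) are hypothetical and are not voxmap-studio's actual API.

```typescript
// Hypothetical edit-operation kinds, mirroring the insertion / deletion /
// substitution taxonomy described in the article (names are illustrative).
type EditKind = "insertion" | "deletion" | "substitution";

interface EditEvent {
  kind: EditKind;
  segmentId: string;   // which speaker segment the edit touched
  timestampMs: number; // wall-clock time when the edit was committed
}

interface SessionCost {
  counts: Record<EditKind, number>;
  totalEdits: number;
  elapsedMs: number;
}

// Summarize one annotation session: count edits by kind and measure the
// time from session start to the latest recorded edit.
function summarizeSession(events: EditEvent[], sessionStartMs: number): SessionCost {
  const counts: Record<EditKind, number> = { insertion: 0, deletion: 0, substitution: 0 };
  let lastMs = sessionStartMs;
  for (const e of events) {
    counts[e.kind] += 1;
    lastMs = Math.max(lastMs, e.timestampMs); // events may arrive out of order
  }
  return { counts, totalEdits: events.length, elapsedMs: lastMs - sessionStartMs };
}
```

Keeping raw events rather than only running totals (an assumption about the design, not a documented detail) lets different assistance strategies be compared after the fact, for example by splitting cost into boundary edits versus speaker-label substitutions.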
Export is gated on per-segment human confirmation, and voxmap-studio injects "phantom" attention checks (fake segments that must be verified) to prevent automated outputs from being mistakenly released as ground truth. In a preliminary study on nine AMI meeting audio files, unassisted manual annotation was both the costliest and the least accurate condition. Auto-initialization shifted the annotator's work from creating turn boundaries to correcting them, and a version that highlighted uncertain segments achieved the lowest overall cost in this small sample. The tool is fully open-source, enabling researchers to quantitatively compare how different forms of AI assistance affect annotation efficiency and quality.
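The export gate described above can be sketched as a simple check over the segment list. This is an illustration under stated assumptions, not voxmap-studio's implementation: it assumes each segment carries an `isPhantom` flag and a `confirmed` flag, and that "verifying" a phantom means the annotator explicitly flags it as fake (`flaggedAsFake`). All names here are hypothetical.

```typescript
// Hypothetical segment model: real segments come from the auto-generated
// hypothesis; phantom segments are injected attention checks.
interface Segment {
  id: string;
  start: number;           // seconds
  end: number;             // seconds
  speaker: string;
  isPhantom: boolean;
  confirmed: boolean;      // annotator explicitly confirmed this segment
  flaggedAsFake?: boolean; // assumed: annotator marked a phantom as not real
}

interface GateResult {
  ok: boolean;
  reasons: string[];
}

// Export is allowed only if every real segment is human-confirmed and
// every phantom segment was caught by the annotator.
function checkExportGate(segments: Segment[]): GateResult {
  const reasons: string[] = [];
  for (const s of segments) {
    if (s.isPhantom) {
      if (!s.flaggedAsFake) reasons.push(`phantom ${s.id} not caught`);
    } else if (!s.confirmed) {
      reasons.push(`segment ${s.id} not confirmed`);
    }
  }
  return { ok: reasons.length === 0, reasons };
}

// Refuse to export when the gate fails; otherwise strip phantoms so they
// never leak into the released ground truth.
function exportSegments(segments: Segment[]): Segment[] {
  const gate = checkExportGate(segments);
  if (!gate.ok) throw new Error(`export blocked: ${gate.reasons.join("; ")}`);
  return segments.filter((s) => !s.isPhantom);
}
```

Failing closed (throwing rather than silently exporting partial data) matches the article's stated goal of keeping unverified automated output from being released as ground truth.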
- Built with React and pyannote; initializes diarization with a stride-accelerated engine to speed up annotation.
- Records fine-grained cost metrics: edit-operation counts and time spent per annotation session.
- Export protected by per-segment human confirmation and injected phantom attention checks to prevent unverified data release.
Why It Matters
Brings transparency to the hidden labor cost of building speaker diarization datasets, enabling better AI-assisted workflows.