[D] How should I fine-tune an ASR model for multilingual IPA transcription?
A developer is building a system to transcribe noisy, multilingual audio directly into the International Phonetic Alphabet.
A developer on Reddit is seeking advice on building a specialized Automatic Speech Recognition (ASR) model. The goal is to fine-tune a model that transcribes multilingual audio directly into the International Phonetic Alphabet (IPA), using a small dataset of 136 annotated audio files. The main challenges are speaker variation and background noise: the system should output a precise phonetic representation of speech regardless of the language being spoken.
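One early step in a setup like this is turning the IPA annotations into an output vocabulary for the model. The sketch below is only an illustration and is not taken from the post: it assumes a character-level vocabulary of the kind commonly used with CTC-style fine-tuning, and the names `tokenize_ipa`, `build_vocab`, `<pad>`, and `<unk>` are hypothetical. It groups combining diacritics and tie-barred affricates with their base symbol so that each token roughly corresponds to one phone.

```python
import unicodedata

# Combining tie bars that link two symbols into one phone (e.g. affricates).
TIE_BARS = {"\u0361", "\u035C"}


def tokenize_ipa(text):
    """Split an IPA string into phone-like tokens (hypothetical helper).

    Combining marks are attached to the preceding symbol, and the symbol
    that follows a tie bar is attached to the token carrying the tie bar.
    Whitespace is dropped.
    """
    tokens = []
    # NFD separates base characters from their combining diacritics.
    for ch in unicodedata.normalize("NFD", text):
        if ch.isspace():
            continue
        joins_previous = bool(tokens) and (
            unicodedata.combining(ch) != 0 or tokens[-1][-1] in TIE_BARS
        )
        if joins_previous:
            tokens[-1] += ch
        else:
            tokens.append(ch)
    return tokens


def build_vocab(transcripts):
    """Map every token seen in the transcripts to an integer id.

    Ids 0 and 1 are reserved for padding and unknown symbols; this
    layout is an assumption, not a requirement of any specific toolkit.
    """
    symbols = sorted({tok for t in transcripts for tok in tokenize_ipa(t)})
    vocab = {"<pad>": 0, "<unk>": 1}
    for symbol in symbols:
        vocab[symbol] = len(vocab)
    return vocab
```

With only 136 files, a vocabulary built this way is worth inspecting by hand: rare symbols that occur once or twice are unlikely to be learned reliably, and inconsistent annotation (for example, the same sound written with and without a diacritic) shows up as near-duplicate entries.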
Why It Matters
Success could enable precise phonetic analysis for linguistics and language-learning technology, and could help improve speech models for low-resource languages.