Research & Papers

When Less Is More? Diagnosing ASR Predictions in Sardinian via Layer-Wise Decoding

Removing the top layers of a multilingual speech model can actually improve its phoneme recognition accuracy.

Deep Dive

A new study reveals that intermediate layers in multilingual speech models like Wav2Vec2 encode more phonetically accurate information than the final output layer. Analyzing Campidanese Sardinian, a low-resource language, researchers found that truncating the top transformer layers lowered the Phoneme Error Rate (PER). The best performance came two layers before the final one, as intermediate predictions better preserved segmental identity and reduced key phonological errors.
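The layer-sweep procedure behind this finding can be sketched in a few lines: decode a hypothesis from each layer, score it against the reference transcription with PER, and keep the layer with the lowest rate. The sketch below is a minimal, self-contained illustration, not the paper's pipeline — the `layer_predictions` data is made up, and the real study decodes hypotheses from Wav2Vec2 hidden states rather than taking them as given.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance over phoneme tokens (standard DP)."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def per(references, hypotheses):
    """Phoneme Error Rate: total edits / total reference phonemes."""
    edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return edits / total

# Hypothetical per-layer decodes for two reference phoneme strings
# (layer indices and phoneme sequences are invented for illustration).
references = [list("sardu"), list("limba")]
layer_predictions = {
    10: [list("sardo"), list("lingua")],
    11: [list("sardu"), list("limba")],
    12: [list("zardu"), list("linba")],   # final layer, worse here
}

scores = {layer: per(references, hyps)
          for layer, hyps in layer_predictions.items()}
best_layer = min(scores, key=scores.get)
print(best_layer, scores[best_layer])  # an intermediate layer wins
```

In the actual study the hypotheses would come from CTC decoding applied to each transformer layer's hidden states; the selection logic, however, is exactly this argmin over per-layer PER.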

Why It Matters

This challenges the standard practice of decoding only from a model's final layer, and offers a practical diagnostic for improving speech AI, especially for low-resource languages.