Developer Tools

Show HN: Moonshine Open-Weights STT models – higher accuracy than Whisper Large v3

Open-weights speech recognition that runs 100x faster than Whisper on-device, with a 6.65% word error rate.

Deep Dive

Moonshine AI has launched Moonshine Voice, an open-weights, on-device speech recognition toolkit that challenges the dominance of OpenAI's Whisper with higher accuracy and dramatically lower latency. The framework, optimized for real-time voice applications, offers a complete solution for transcription, speaker diarization, and command recognition across 8 languages, including English, Mandarin, and Arabic. Unlike cloud-dependent APIs, Moonshine runs entirely locally, eliminating privacy concerns and API costs while supporting platforms from wearables to Raspberry Pis.

Technical benchmarks reveal Moonshine's medium streaming model achieves 6.65% word error rate with just 245 million parameters, beating Whisper Large v3's 7.44% WER despite using 84% fewer parameters. The performance gap widens in latency: Moonshine processes audio in 107ms on a MacBook Pro versus Whisper's 11,286ms, making it 100x faster for live applications. The toolkit includes models ranging from a 34MB tiny version to the flagship medium model, all trained from scratch using proprietary research. This positions Moonshine as the premier choice for developers building responsive voice interfaces where privacy and speed are critical.
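The headline figures above follow directly from the reported numbers; a quick sketch verifies the arithmetic (the parameter counts and latencies are taken from the article, not measured here):

```python
# Sanity check of the claims: 245M vs 1.5B parameters, 107ms vs 11,286ms latency.
moonshine_params = 245e6   # Moonshine medium streaming (from the article)
whisper_params = 1.5e9     # Whisper Large v3 (from the article)
param_reduction = 1 - moonshine_params / whisper_params  # fraction of parameters saved

moonshine_ms = 107      # MacBook Pro latency reported for Moonshine
whisper_ms = 11_286     # MacBook Pro latency reported for Whisper
speedup = whisper_ms / moonshine_ms  # how many times faster Moonshine is

print(f"{param_reduction:.0%} fewer parameters, {speedup:.0f}x faster")
```

The 83.7% reduction rounds to the "84% fewer parameters" claim, and the 105x latency ratio is where the "100x faster" figure comes from.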

Key Points
  • Moonshine Medium Streaming model achieves 6.65% WER vs Whisper Large v3's 7.44% using 84% fewer parameters (245M vs 1.5B)
  • Processes audio 100x faster than Whisper (107ms vs 11,286ms on MacBook Pro) for real-time applications
  • Runs entirely on-device across 8 platforms including Raspberry Pi and wearables with no API keys or cloud dependency
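Word error rate, the metric behind the WER comparison above, is the word-level edit distance between a reference transcript and the model's hypothesis, divided by the reference length. A minimal, self-contained sketch (not Moonshine's own evaluation code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words (classic Levenshtein DP).
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dp[i - 1][j] + 1
            insertion = dp[i][j - 1] + 1
            dp[i][j] = min(substitution, deletion, insertion)
    return dp[len(ref)][len(hyp)] / len(ref)

# One dropped word out of a six-word reference gives WER = 1/6 ≈ 16.7%.
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

A 6.65% WER therefore means roughly one word in fifteen is substituted, inserted, or deleted relative to the reference transcript.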

Why It Matters

Enables private, low-latency voice interfaces for IoT, wearables, and real-time applications where Whisper's cloud dependency and slow processing are prohibitive.