Cohere launches an open-source voice model specifically for transcription
Cohere's new 2B-parameter voice model processes 525 audio minutes per minute and supports 14 languages.
Enterprise AI company Cohere has entered the speech recognition arena with the launch of Transcribe, its first open-source automatic speech recognition (ASR) model. Designed for accessibility, the model is relatively lightweight at 2 billion parameters, enabling it to run on consumer-grade GPUs for developers and businesses who prefer to self-host. It currently supports transcription across 14 major languages, including English, French, German, Spanish, Chinese, and Japanese. Cohere claims Transcribe outperforms competitors like Zoom Scribe v1 and IBM Granite 4.0, posting the lowest average word error rate (WER), 5.42, on the Hugging Face Open ASR leaderboard (lower is better). In human evaluations, it reportedly won 61% of head-to-head comparisons against other models on accuracy and coherence.
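For readers unfamiliar with the metric, WER counts the word-level substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of reference words. The sketch below is a minimal illustration, not the leaderboard's scoring code: the function name `word_error_rate` and the lowercase-and-split normalization are assumptions made for this example, and real evaluations typically apply more careful text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Hypothetical helper for illustration; normalization here is only
    lowercasing and whitespace splitting (an assumption of this sketch).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"WER: {wer:.2%}")
```

Assuming the leaderboard reports WER as a percentage, a score of 5.42 would correspond to roughly five or six word errors per 100 reference words.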
Beyond raw accuracy, Transcribe boasts impressive throughput: it can process 525 minutes of audio in one minute, a high mark for its model class. While it leads in average performance, the model showed some weakness in specific languages like Portuguese, German, and Spanish. Cohere is making the model available for free through its API and will also host it on its managed inference platform, Model Vault. Strategically, the company plans to integrate Transcribe into its enterprise agent orchestration platform, North, signaling a move to enhance its AI agent offerings with multimodal capabilities. This launch comes as Cohere reports strong enterprise traction, with $240 million in annual recurring revenue and ambitions to go public.
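To put the throughput figure in practical terms, the arithmetic can be sketched as follows. This is a back-of-the-envelope estimate only: the helper name `transcription_seconds` is hypothetical, and it assumes the reported 525 audio minutes per minute holds on your hardware and batch settings, which the announcement does not specify.

```python
REPORTED_THROUGHPUT = 525.0  # audio minutes processed per wall-clock minute (reported figure)


def transcription_seconds(audio_minutes: float,
                          throughput: float = REPORTED_THROUGHPUT) -> float:
    """Estimate wall-clock seconds needed to transcribe `audio_minutes` of audio.

    Hypothetical helper: assumes throughput is constant regardless of
    file length, language, or hardware, which may not hold in practice.
    """
    if audio_minutes < 0 or throughput <= 0:
        raise ValueError("audio_minutes must be >= 0 and throughput must be > 0")
    wall_clock_minutes = audio_minutes / throughput
    return wall_clock_minutes * 60.0


# A 10-hour archive is 600 minutes of audio.
seconds = transcription_seconds(600)
print(f"Estimated time for 10 hours of audio: {seconds:.1f} s")
```

Under these assumptions, a 10-hour recording would take a little over a minute to transcribe; actual times on consumer GPUs may differ.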
- Achieves a leading average word error rate (WER) of 5.42, beating models like Zoom Scribe and IBM Granite on the Hugging Face Open ASR leaderboard.
- Processes 525 minutes of audio per minute and is a lightweight 2B-parameter model designed for self-hosting on consumer GPUs.
- Will be integrated into Cohere's North agent platform and is available for free via API, supporting 14 languages including English, Chinese, and Spanish.
Why It Matters
Transcribe gives developers building transcription features a high-performance, open-source alternative, reducing reliance on closed APIs and lowering costs.