Open Source

ibm-granite/granite-4.0-1b-speech · Hugging Face

A 1B-parameter speech model that supports 6 languages, runs faster than its 2B predecessor, and adds keyword biasing for names.

Deep Dive

IBM has released Granite-4.0-1B-Speech, a compact, efficient speech-language model for multilingual automatic speech recognition (ASR) and bidirectional automatic speech translation (AST). The 1-billion-parameter model, hosted on Hugging Face, has half the parameters of its predecessor, Granite-Speech-3.3-2B, while extending language support to six languages: English, French, German, Spanish, Portuguese, and Japanese. It was built by modality-aligning the Granite-4.0-1B-Base model to speech, training on a collection of public, open-source corpora with diverse audio inputs and text targets, including synthetic datasets tailored to Japanese ASR and keyword-biased tasks. The release aims to make advanced speech AI more accessible and easier to deploy on edge devices.
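
To make the effect of halving the parameter count concrete, here is a rough, weights-only memory estimate. It assumes nominal counts of exactly 1 and 2 billion parameters and ignores activations, caches, and runtime overhead; the precisions listed are common options for illustration, not formats the release is confirmed to ship.

```python
# Rough weights-only memory estimate: parameters x bytes per parameter.
# Assumption: nominal parameter counts (1B and 2B); real checkpoints differ
# somewhat, and activations, caches, and runtime overhead are excluded.

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, params in [("1B model", 1e9), ("2B model", 2e9)]:
    sizes = ", ".join(
        f"{d}: {weight_memory_gb(params, d):.1f} GB" for d in BYTES_PER_PARAM
    )
    print(f"{name} -> {sizes}")
```

The first line printed is `1B model -> fp32: 4.0 GB, bf16: 2.0 GB, int8: 1.0 GB, int4: 0.5 GB`, exactly half of each figure for the 2B model.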

Technically, the model brings several key improvements over its predecessors. It achieves higher transcription accuracy for English ASR and runs inference faster, thanks to improved encoder training and speculative decoding, a technique in which several upcoming tokens are proposed cheaply and then verified together by the model, so more than one token can be accepted per step. A standout addition is keyword list biasing: users supply a list of specific terms, such as names or acronyms, to significantly improve how accurately those terms are transcribed. With half the parameters of the 2B model, Granite-4.0-1B-Speech directly targets resource-constrained hardware, from smartphones to embedded systems, while retaining its core multilingual capabilities. The release fits the industry trend toward smaller, more efficient models that handle specialized tasks well at the edge.
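
The draft-and-verify idea behind speculative decoding can be sketched in a few lines of Python. This is the generic greedy variant with toy integer "models" (`target` and `draft` are hypothetical stand-ins, as are all helper names); the source does not detail how Granite-4.0-1B-Speech produces or verifies its draft tokens. The property the sketch demonstrates is that the output matches plain greedy decoding with the target model, while the target runs fewer sequential rounds.

```python
from typing import Callable, List, Tuple

Token = int
NextFn = Callable[[List[Token]], Token]


def greedy_decode(next_fn: NextFn, prompt: List[Token], max_new: int) -> List[Token]:
    """Plain autoregressive greedy decoding: one model call per new token."""
    seq = list(prompt)
    for _ in range(max_new):
        seq.append(next_fn(seq))
    return seq


def speculative_decode(target: NextFn, draft: NextFn, prompt: List[Token],
                       max_new: int, k: int = 4) -> Tuple[List[Token], int]:
    """Greedy speculative decoding; returns (sequence, verification rounds)."""
    seq = list(prompt)
    rounds = 0
    while len(seq) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. The target checks every position. A real system scores all k+1
        #    positions in ONE parallel forward pass; this loop simulates that.
        rounds += 1
        for i in range(k + 1):
            expected = target(seq + proposal[:i])
            if i < k and proposal[i] == expected:
                continue  # draft agreed with the target; keep checking
            # First mismatch, or all k accepted: keep the accepted prefix
            # plus the target's own token at this position.
            seq.extend(proposal[:i])
            seq.append(expected)
            break
    # A round can add up to k+1 tokens, so trim any overshoot.
    return seq[: len(prompt) + max_new], rounds


def target(seq: List[Token]) -> Token:
    """Toy 'large model': next token is a deterministic function of the prefix."""
    return (seq[-1] * 3 + len(seq)) % 10


def draft(seq: List[Token]) -> Token:
    """Toy 'small model': agrees with target except when len(seq) % 5 == 0."""
    guess = target(seq)
    return guess if len(seq) % 5 else (guess + 1) % 10


prompt = [1, 2]
out, rounds = speculative_decode(target, draft, prompt, max_new=20, k=4)
assert out == greedy_decode(target, prompt, max_new=20)  # identical to greedy
print(f"{len(out) - len(prompt)} tokens in {rounds} verification rounds")
```

With these toy models it prints `20 tokens in 5 verification rounds`, versus 20 sequential target calls for plain greedy decoding. Real speedups depend on how often the draft agrees with the target and on batching the verification into a single forward pass.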

Key Points
  • At 1B parameters, it is half the size of the previous 2B model, making it better suited to edge deployment
  • Supports 6 languages: English, French, German, Spanish, Portuguese, Japanese
  • Introduces keyword list biasing to significantly improve recognition of specific names and acronyms
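
Keyword list biasing can be pictured with a generic decoding-time trick: give any candidate that matches a user-supplied keyword a fixed score bonus before picking the best one. This is a simplified illustration of contextual biasing in general, not IBM's implementation; the source does not describe how Granite applies the keyword list internally. The candidate scores, the `bonus` value, and the helper names are hypothetical, and real systems typically bias subword tokens (often via a prefix trie) rather than whole words.

```python
from typing import Dict, List, Sequence


def bias_scores(scores: Dict[str, float], keywords: Sequence[str],
                bonus: float = 2.0) -> Dict[str, float]:
    """Add a fixed log-score bonus to candidates found in the keyword list."""
    kw = {k.lower() for k in keywords}  # case-insensitive matching
    return {w: s + (bonus if w.lower() in kw else 0.0) for w, s in scores.items()}


def decode(steps: List[Dict[str, float]], keywords: Sequence[str] = (),
           bonus: float = 2.0) -> List[str]:
    """Greedy word-level decoding over per-step candidate log-scores."""
    return [
        max(bias_scores(step, keywords, bonus).items(), key=lambda kv: kv[1])[0]
        for step in steps
    ]


# Hypothetical per-step scores: the acoustics slightly prefer "core" over the name "Kaur".
steps = [
    {"call": -0.1, "fall": -2.0},
    {"core": -0.7, "Kaur": -1.2},
    {"now": -0.2},
]

print(decode(steps))                     # no biasing
print(decode(steps, keywords=["Kaur"]))  # biased toward the keyword list
```

The first call prints `['call', 'core', 'now']`; with the keyword list, "Kaur" scores -1.2 + 2.0 = 0.8 and overtakes "core" (-0.7), giving `['call', 'Kaur', 'now']`. An overly large bonus would insert keywords that were never spoken, which is why the strength of biasing is usually tuned.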

Why It Matters

Enables accurate, multilingual speech recognition on smartphones and embedded devices, powering next-gen voice interfaces.