llama.cpp b8762: A*STAR's MERaLiON-2 audio-language models go local
The popular local AI framework now runs a 10B-parameter model for speech transcription and translation.
The llama.cpp project, a leading C++ framework for running large language models locally, has integrated support for a powerful new multimodal audio-language model in its latest update. Commit b8762 adds a dedicated projector type (PROJECTOR_TYPE_MERALION) for A*STAR's MERaLiON-2 model, which combines a Whisper large-v2 encoder for audio feature extraction with a Gemma2 decoder for language understanding. This allows the 3B- and 10B-parameter versions of MERaLiON-2 to be converted into the efficient GGUF format and run directly on consumer hardware, from Apple Silicon Macs to Windows PCs with CUDA.
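For those converting the checkpoint themselves rather than downloading pre-converted files, the workflow presumably follows llama.cpp's usual two-file multimodal pattern: one GGUF for the decoder and a separate mmproj GGUF for the audio stack. Below is a minimal sketch, not a verified recipe; the local checkout name MERaLiON-2-10B, the output file names, and the applicability of convert_hf_to_gguf.py's --mmproj flag to this model are assumptions based on how other mtmd models are handled.

```python
# Hypothetical conversion sketch. Paths, repo/checkout names, and the
# --mmproj behavior for MERaLiON-2 are assumptions; check your llama.cpp
# version's convert_hf_to_gguf.py --help before relying on these flags.
import subprocess
from pathlib import Path

MODEL_DIR = Path("MERaLiON-2-10B")   # local Hugging Face checkout (assumed name)
LLAMA_CPP = Path("llama.cpp")        # llama.cpp source checkout

# 1. Convert the Gemma2 decoder weights to GGUF, quantized to 8-bit
#    so the 10B model fits comfortably in consumer RAM.
subprocess.run(
    [
        "python", str(LLAMA_CPP / "convert_hf_to_gguf.py"), str(MODEL_DIR),
        "--outfile", "meralion-2-10b-q8_0.gguf",
        "--outtype", "q8_0",
    ],
    check=True,
)

# 2. Export the audio side (Whisper encoder plus the new MERaLiON projector)
#    as a separate mmproj GGUF, as llama.cpp's multimodal framework expects.
subprocess.run(
    [
        "python", str(LLAMA_CPP / "convert_hf_to_gguf.py"), str(MODEL_DIR),
        "--outfile", "mmproj-meralion-2-10b-f16.gguf",
        "--mmproj",  # assumed to select projector export, as for other mtmd models
    ],
    check=True,
)
```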
This integration significantly expands the local AI toolkit. Developers and researchers can now perform complex audio-language tasks entirely offline, including multilingual speech transcription (supporting English, Chinese, Malay, and Tamil), translation, and spoken question-answering. The update builds on llama.cpp's multimodal (mtmd) framework, in which encoded audio is projected directly into the language model's embedding space, so listening and reasoning happen in one model rather than a chained pipeline. Pre-converted model files are available on Hugging Face, and the support is baked into the standard builds for all major platforms, lowering the barrier to experimenting with advanced audio AI (a usage sketch follows the summary list below).
- Adds support for A*STAR's MERaLiON-2 audio-language models (3B and 10B parameters) to the local inference framework.
- Enables offline tasks like multilingual speech transcription (EN/ZH/MS/TA), translation, and spoken QA via the new PROJECTOR_TYPE_MERALION projector type.
- Broad platform support includes macOS (Apple Silicon/Intel), Windows (CPU/CUDA/Vulkan), Linux, and iOS via standard llama.cpp builds.
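To make the workflow concrete, here is a hedged usage sketch that drives the command-line tool from Python. The llama-mtmd-cli binary and its -m, --mmproj, and --audio flags follow llama.cpp's multimodal CLI conventions for other audio models; the file names, the meeting.wav clip, and the prompt are illustrative assumptions carried over from the conversion sketch above.

```python
# Hypothetical usage sketch, not a verified invocation: confirm the binary
# name and flags against `llama-mtmd-cli --help` on your build.
import subprocess

result = subprocess.run(
    [
        "./llama-mtmd-cli",
        "-m", "meralion-2-10b-q8_0.gguf",               # decoder weights
        "--mmproj", "mmproj-meralion-2-10b-f16.gguf",   # audio encoder + projector
        "--audio", "meeting.wav",                       # input speech clip (assumed file)
        "-p", "Please transcribe this audio.",          # instruction prompt
    ],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```

The same pattern would cover translation or spoken question-answering: only the prompt changes, since the model handles all of these tasks through instructions.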
Why It Matters
Puts state-of-the-art multilingual audio understanding directly on local devices, letting developers transcribe, translate, and query speech without sending data to the cloud, which enhances both privacy and accessibility.