New voice recognition endpoints for speaker verification, identification, and embedding (SpeechBrain + ONNX)?

New voice recognition endpoints for speaker verification, identification, and embedding (SpeechBrain + ONNX)

11 new backends, Ollama API drop-in, video generation, and vLLM tensor-parallel distributed workers?

11 new backends, Ollama API drop-in, video generation, and vLLM tensor-parallel distributed workers

Developer Tools

LocalAI 4.2.0 adds voice, face recognition, and Ollama API support

LocalAI May 11, 2026

⚡Now with 11 new backends, face antispoofing, and video generation from stable-diffusion

Deep Dive

LocalAI 4.2.0 is a major release that transforms the open-source local AI platform into a multimodal powerhouse. The headline additions are voice and face recognition pipelines: new /v1/voice/* endpoints enable speaker verification, identification, embedding, and attribute analysis (age, gender, emotion) via SpeechBrain + ONNX. The face recognition system (InsightFace + ONNX) adds 1:1 and 1:N matching, detection, and antispoofing (liveness) to reject photo or video spoofs. Audio gets a diarization endpoint (/v1/audio/diarization) using sherpa-onnx + vibevoice.cpp to determine “who spoke when,” plus word-level timestamps for faster-whisper and client cancellation for Whisper via the ggml abort_callback.

The release also introduces full Ollama API compatibility—users can point any Ollama client at LocalAI by setting OLLAMA_HOST. Video generation is now supported via stable-diffusion.ggml with curated gallery entries for Wan 2.1 FLF2V 14B and Wan i2v 720p. The UI has been redesigned with a Nord palette, i18n across 5 languages, and admin-configurable branding. An interactive model editor with autocomplete and a universal model importer (across most backends) simplify configuration. Under the hood, 11 new backends debut (sglang, ik-llama-cpp, TurboQuant, sam.cpp, Kokoros, qwen3tts.cpp, tinygrad-multimodal, LocalVQE, vibevoice-cpp, insightface for liveness, voice-rec), vLLM reaches feature parity with llama.cpp and adds tensor-parallel distributed workers, and Distributed v2 is hardened with round-robin replicas and scoped upgrades.

Key Points

New voice recognition endpoints for speaker verification, identification, and embedding (SpeechBrain + ONNX)
Face recognition pipeline with liveness antispoofing and 1:1 / 1:N matching
11 new backends, Ollama API drop-in, video generation, and vLLM tensor-parallel distributed workers

Why It Matters

LocalAI now matches cloud AI capabilities entirely on-device, with privacy, no API costs, and multimodal support.

Read Original Article

LocalAI 4.2.0 adds voice, face recognition, and Ollama API support

Why It Matters

Related Articles

🚀 Stay Ahead in AI