Real-time AI (audio/video in, voice out) on an M3 Pro with Gemma E2B
Run live AI that sees, hears, and speaks in multiple languages directly on a MacBook Pro.
A new open-source project called Parlor demonstrates that powerful, real-time multimodal AI is no longer confined to data centers. Built by developer fikrikarim on Google's lightweight Gemma 3n E2B model, the system runs live on consumer hardware such as a MacBook Pro with an M3 Pro chip. It processes simultaneous audio and video from the device's microphone and camera and generates spoken responses in real time, all without sending data to the cloud. Keeping processing local is a major win for both privacy and latency, enabling genuinely interactive applications.
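At a high level, such a pipeline is a capture, infer, speak loop. The sketch below is illustrative rather than Parlor's actual code: `generate_reply` is a placeholder for on-device Gemma 3n E2B inference, and OpenCV, sounddevice, and the macOS `say` command are assumed stand-ins for camera capture, microphone capture, and speech output.

```python
# Illustrative capture -> infer -> speak loop (not Parlor's actual implementation).
import subprocess

import cv2                 # pip install opencv-python
import numpy as np
import sounddevice as sd   # pip install sounddevice

SAMPLE_RATE = 16_000       # 16 kHz mono is a common input rate for speech models
CLIP_SECONDS = 2           # length of each microphone chunk fed to the model


def generate_reply(frame: np.ndarray, audio: np.ndarray) -> str:
    """Placeholder for local multimodal inference (e.g. Gemma 3n E2B in an
    on-device runtime). Takes one camera frame and a short audio clip and
    returns the assistant's text reply."""
    raise NotImplementedError("plug a local Gemma runtime in here")


def main() -> None:
    cam = cv2.VideoCapture(0)  # built-in MacBook camera
    try:
        while True:
            ok, frame = cam.read()  # most recent video frame
            if not ok:
                break
            # Record a short chunk of microphone audio (blocking, mono float32).
            audio = sd.rec(int(CLIP_SECONDS * SAMPLE_RATE),
                           samplerate=SAMPLE_RATE, channels=1, dtype="float32")
            sd.wait()
            reply = generate_reply(frame, audio.squeeze())
            # Speak the reply without leaving the machine; macOS ships `say`.
            subprocess.run(["say", reply], check=True)
    finally:
        cam.release()


if __name__ == "__main__":
    main()
```

A real streaming system would overlap capture and inference instead of blocking on fixed two-second chunks, but the loop above captures the shape of the data flow.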
The core achievement is running this full pipeline of visual recognition, language understanding, and speech synthesis on a single efficient model that fits on a laptop. While not suited to complex 'agentic' coding tasks, Gemma 3n E2B excels at the conversational and recognition work an AI companion needs. The model is multilingual, so users can point the camera at objects and discuss them, switching languages seamlessly. The functionality mirrors the futuristic demos from giants like OpenAI, but makes the capability accessible today and hints at a near future where such assistants run locally on phones, untethered from the internet.
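Language switching can be as simple as changing the instruction that wraps each turn. The snippet below is a hypothetical illustration of that idea, not Parlor's actual prompting; the prompt strings and the `build_prompt` helper are assumptions.

```python
# Hypothetical prompt wrapper showing how the reply language could be switched;
# the prompt text and structure here are assumptions, not Parlor's actual prompts.
SYSTEM_PROMPTS = {
    "en": "Describe what the camera sees and answer the spoken question in English.",
    "id": "Jelaskan apa yang dilihat kamera dan jawab pertanyaan lisan dalam bahasa Indonesia.",
    "ja": "カメラに映っているものを説明し、話された質問に日本語で答えてください。",
}


def build_prompt(language: str) -> str:
    """Pick a system prompt in the requested reply language, defaulting to English."""
    return SYSTEM_PROMPTS.get(language, SYSTEM_PROMPTS["en"])
```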
- Runs Google's Gemma 3n E2B model locally on an M3 Pro MacBook for real-time audio/video processing.
- Enables multilingual conversational AI for applications like language learning through object recognition.
- Open-source project (Parlor) demonstrates a shift toward powerful, private, on-device AI assistants.
Why It Matters
It shows that advanced, private AI companions can run on personal devices today, reducing reliance on the cloud.