Fulloch V2 runs fully local AI voice assistant on 16GB VRAM
Qwen3.5-9B powers real-time voice control with barge-in and semantic note search.
Fulloch V2, the latest iteration of a local voice assistant from developer liampetti, runs entirely on a single GPU with 16GB of VRAM (tested on an RTX 5060 Ti). The software stack combines three Qwen models: Qwen3.5-9B in GGUF Q5_K_M quantization for reasoning and response generation, a Qwen3-1.7B model for automatic speech recognition (ASR), and a second Qwen3-1.7B model for text-to-speech (TTS). Key technical features include real-time acoustic barge-in, which lets users interrupt the assistant while it is speaking, and follow-up handling that supports natural multi-turn conversations. The system is tuned for low-latency responses while keeping everything private: no data leaves the local machine.
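To see why this stack can plausibly fit on a 16GB card, a rough weight-size estimate helps. The sketch below is back-of-the-envelope arithmetic, not a measurement of Fulloch itself. It assumes Q5_K_M averages roughly 5.5 bits per weight and that the ASR and TTS models are held at 16-bit precision; neither figure comes from the project.

```python
# Back-of-the-envelope VRAM estimate for the three-model stack.
# ASSUMPTIONS (not from the project): ~5.5 bits per weight for Q5_K_M,
# and 16-bit weights for the ASR and TTS models.

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8

# (parameters in billions, assumed bits per weight)
MODELS = {
    "llm (Qwen3.5-9B, Q5_K_M)": (9.0, 5.5),
    "asr (1.7B, assumed 16-bit)": (1.7, 16.0),
    "tts (1.7B, assumed 16-bit)": (1.7, 16.0),
}
VRAM_GB = 16.0

sizes = {name: weights_gb(p, b) for name, (p, b) in MODELS.items()}
total_gb = sum(sizes.values())
headroom_gb = VRAM_GB - total_gb

for name, gb in sizes.items():
    print(f"{name}: {gb:.2f} GB")
print(f"total weights: {total_gb:.2f} GB, headroom: {headroom_gb:.2f} GB")
```

Under these assumptions the weights alone take about 13 GB. The remainder has to cover the KV cache, activations, the embedding model, and runtime overhead, so the real margin depends on context length and on how the project actually loads the smaller models.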
Beyond basic voice control, Fulloch V2 adds agentic long-term memory and deep integration with Obsidian (and other markdown note systems). Users can tell the assistant to read, write, or append to notes by voice, with a safety constraint: it never deletes or modifies existing content. A semantic voice search function, powered by a bge embedding model, lets users find notes by meaning rather than by exact keywords. The assistant also connects to Home Assistant for smart home control. For customization, the project includes a bash/bat script for creating custom TTS voices and a config file that accepts any word as a wakeword, with no separate wakeword model required. The project is open source on GitHub under the MIT license.
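Searching by meaning usually comes down to comparing embedding vectors. The sketch below illustrates the general technique with cosine similarity; it is not Fulloch's actual code. The tiny three-dimensional vectors stand in for real bge embeddings, and the note names, `cosine`, and `search` are all hypothetical.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec: list[float], note_vecs: dict[str, list[float]], top_k: int = 2) -> list[str]:
    """Return the names of the top_k notes most similar to the query."""
    scored = [(cosine(query_vec, vec), name) for name, vec in note_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]

# Toy vectors standing in for real embeddings (hypothetical notes).
NOTE_VECS = {
    "groceries.md": [0.9, 0.1, 0.0],
    "meeting-notes.md": [0.1, 0.9, 0.2],
    "recipes.md": [0.8, 0.2, 0.1],
}
# Stand-in for the embedding of a spoken query about food shopping.
QUERY_VEC = [1.0, 0.0, 0.0]

results = search(QUERY_VEC, NOTE_VECS)
print(results)
```

In a real pipeline the vectors would come from the embedding model and have hundreds of dimensions, but the ranking step stays the same: embed the transcribed query, score every note, and return the closest matches even when they share no keywords with the query.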
- Runs entirely on a 16GB VRAM GPU using Qwen3.5-9B GGUF Q5_K_M, Qwen3-1.7B ASR, and Qwen3-1.7B TTS.
- Supports acoustic barge-in, multi-turn follow-up, and agentic long-term memory for natural conversations.
- Integrates with Obsidian to read, write, and append notes via voice, with semantic search using bge embeddings.
Why It Matters
Enables a fully private, low-latency voice assistant for smart home control and personal knowledge management.