NasoVoce: A Nose-Mounted Low-Audibility Speech Interface for Always-Available Speech Interaction
A new device mounted on smart glasses captures whispered speech by fusing audio and vibration signals.
A team of researchers led by Jun Rekimoto has unveiled NasoVoce, a hardware interface designed to make near-silent, always-available conversations with AI assistants practical. The device mounts on the nose pads of standard smart glasses, which places it unobtrusively close to the mouth. Its core idea is the fusion of two complementary sensors: a conventional microphone, which captures high-quality audio but is sensitive to ambient noise, and a vibration sensor, which is robust to environmental interference but yields lower signal quality. By combining the two, NasoVoce can reliably capture low-audibility speech, including whispers, which a microphone alone often loses in noisy settings.
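This summary does not describe how NasoVoce actually combines the two signals. Purely as a hypothetical illustration of why complementary sensors help, the sketch below blends them frame by frame: it trusts the microphone when its estimated signal-to-noise ratio (SNR) is high and leans on the vibration sensor when it is low. The function names (`frame_power`, `fuse_signals`), the fixed noise-power estimate, the 160-sample frame length and the Wiener-style weighting are all assumptions for illustration, not NasoVoce's method. The sketch also assumes the two signals are already time-aligned and level-matched, which a real device would have to handle.

```python
from typing import List, Sequence


def frame_power(frame: Sequence[float]) -> float:
    """Mean squared amplitude (average power) of one frame."""
    return sum(x * x for x in frame) / len(frame) if frame else 0.0


def fuse_signals(
    mic: Sequence[float],
    vib: Sequence[float],
    mic_noise_power: float,
    frame_len: int = 160,
) -> List[float]:
    """Hypothetical SNR-weighted fusion of a microphone and a vibration sensor.

    Assumes both signals are time-aligned, share a sample rate and have
    comparable amplitude scales. This is NOT the NasoVoce algorithm.
    """
    if len(mic) != len(vib):
        raise ValueError("mic and vib must be time-aligned and equal length")
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")

    fused: List[float] = []
    for start in range(0, len(mic), frame_len):
        m = mic[start:start + frame_len]
        v = vib[start:start + frame_len]
        if mic_noise_power <= 0.0:
            weight = 1.0  # no noise assumed: trust the microphone fully
        else:
            # Power above the assumed noise floor approximates speech power.
            speech_power = max(frame_power(m) - mic_noise_power, 0.0)
            snr = speech_power / mic_noise_power
            weight = snr / (snr + 1.0)  # Wiener-style gain in [0, 1)
        # High SNR -> mostly microphone; low SNR -> mostly vibration sensor.
        fused.extend(weight * a + (1.0 - weight) * b for a, b in zip(m, v))
    return fused


if __name__ == "__main__":
    # A microphone frame with SNR 3 gets weight 0.75: 0.75 * 2.0 + 0.25 * 0.0
    example = fuse_signals([2.0] * 160, [0.0] * 160, mic_noise_power=1.0)
    print(example[0])  # 1.5
```

A Wiener-style weight is used here only because it moves smoothly between the two sources as conditions change. A practical system would more likely estimate noise adaptively or learn the fusion from data, and would also need to compensate for the vibration sensor's narrower bandwidth.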
The system was evaluated with OpenAI's Whisper Large-v2 model for speech recognition, alongside established speech quality and intelligibility measures: objective PESQ and STOI scores and subjective MUSHRA listening ratings. Compared with single-sensor approaches, the fused signal yielded significant improvements in both recognition accuracy and speech quality. The work addresses a key challenge in human-computer interaction: building a wearable speech interface that balances vocabulary size, wearability, silence, and noise robustness. The research, accepted to ACM CHI 2026, indicates that the nasal bridge is an effective location for capturing bone- and skin-conducted speech vibrations, paving the way for continuous, discreet interaction with AI agents without disturbing others.
- Fuses a microphone and a vibration sensor on the nose bridge of smart glasses for robust speech capture.
- Enables reliable recognition of whispered and low-volume speech using models like Whisper Large-v2.
- Evaluated with PESQ/STOI/MUSHRA metrics, showing improved quality and noise robustness for AI conversations.
Why It Matters
Enables private, continuous AI assistant interactions in public or noisy environments without speaking aloud.