VueBuds: Visual Intelligence with Wireless Earbuds
Researchers built camera-equipped earbuds that use VLMs for real-time scene understanding, drawing under 5mW of power.
A research team from the University of Washington, led by Maruchi Kim, has developed VueBuds, a groundbreaking prototype that embeds cameras into standard wireless earbuds. The system modifies Sony WF-1000XM3 earbuds, fitting each with a low-resolution, monochrome camera that draws less than 5mW of power. Although each camera's view is partially occluded by the wearer's face, the combined binocular perspective from both earbuds provides comprehensive forward visual coverage. The cameras stream visual data over Bluetooth to a paired host device, such as a smartphone, where on-device vision language models (VLMs) process the feed for real-time visual intelligence.
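The pipeline described above can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the resolution, the `Frame`/`combine_views`/`query_vlm` names, and the max-based fusion rule are all assumptions made for clarity.

```python
# Hypothetical sketch of the VueBuds pipeline: two partially occluded
# monochrome frames are streamed to a host, fused, and passed to a VLM.
# All names and parameters here are illustrative assumptions.
from dataclasses import dataclass
from typing import List

FRAME_W, FRAME_H = 160, 120  # assumed low resolution; not from the paper


@dataclass
class Frame:
    side: str          # "left" or "right" earbud
    pixels: List[int]  # monochrome intensities, 0-255


def capture(side: str) -> Frame:
    """Stand-in for the sub-5mW monochrome camera; in practice each
    view is partially occluded by the wearer's face."""
    return Frame(side=side, pixels=[0] * (FRAME_W * FRAME_H))


def combine_views(left: Frame, right: Frame) -> List[int]:
    """Merge the binocular views so pixels occluded in one earbud's
    frame can be filled from the other (max is a toy fusion rule)."""
    return [max(a, b) for a, b in zip(left.pixels, right.pixels)]


def query_vlm(image: List[int], question: str) -> str:
    """Placeholder for the on-device VLM on the paired smartphone."""
    return f"answer to {question!r} from a {FRAME_W}x{FRAME_H} frame"


# On-demand visual QA: frames arrive over Bluetooth, the host fuses
# them, and the fused image plus the user's question go to the VLM.
merged = combine_views(capture("left"), capture("right"))
print(query_vlm(merged, "What is in front of me?"))
```

The key architectural point the sketch captures is the split: the earbuds only capture and transmit, while all VLM inference happens on the paired host device.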
This integration enables an end-to-end system for visual assistance tasks, including scene understanding, translation, visual reasoning, and text reading, all activated on-demand. In user studies with 90 participants across 17 visual question-answering tasks, VueBuds achieved a response quality equivalent to the commercially available Ray-Ban Meta smart glasses. The research, accepted for CHI 2026, demonstrates that the ubiquitous earbud form factor—not just smart glasses—can be a viable, low-power platform for deploying advanced visual AI. This work opens the door to bringing rapidly advancing VLM capabilities to one of the world's most common wearables without significant changes to user behavior or device size.
- Embeds cameras in Sony WF-1000XM3 earbuds, streaming data via Bluetooth for on-device VLM processing.
- Each monochrome camera draws under 5mW, enabling operation within strict earbud power limits.
- In studies with 90 users, it matched Ray-Ban Meta's performance on 17 visual QA tasks.
Why It Matters
It brings powerful visual AI assistance to a ubiquitous, discreet wearable, potentially putting features now limited to smart glasses within everyone's reach.