Audo-Sight: AI-driven Ambient Perception Across Edge-Cloud for Blind and Low Vision Users
The edge-cloud hybrid system was preferred over GPT-5 by 62% of blind and low-vision participants in human evaluations.
A team of researchers including Jacob Bradshaw and Mohsen Amini Salehi has published a paper on Audo-Sight, a breakthrough AI system designed to help blind and low-vision (BLV) individuals understand their environment through conversational voice interaction. The system tackles the critical challenge of providing timely and accurate ambient perception by employing a distributed architecture of expert and generic AI agents across edge devices (like smartphones or wearables) and cloud servers. Its core innovation is a dynamic routing system that analyzes query urgency and context to send scene data to the most suitable processing pipeline.
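The paper does not publish the router's implementation, but the urgency- and context-based dispatch it describes can be sketched as a simple decision function. The `Route` targets and the `urgent`/`needs_detail` signals below are illustrative assumptions, not Audo-Sight's actual API:

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    EDGE = "edge"      # fast on-device agent (smartphone/wearable)
    CLOUD = "cloud"    # slower, more capable cloud agent
    HYBRID = "hybrid"  # both in parallel, outputs fused later


@dataclass
class Query:
    text: str
    urgent: bool        # e.g., inferred from wording or prosody
    needs_detail: bool  # e.g., open-ended scene description


def route_query(q: Query) -> Route:
    """Pick a processing pipeline from query urgency and context."""
    if q.urgent and q.needs_detail:
        return Route.HYBRID  # quick edge answer plus detailed cloud follow-up
    if q.urgent:
        return Route.EDGE    # latency dominates; answer on-device
    return Route.CLOUD       # accuracy dominates; use the larger model
```

In practice the urgency and detail signals would themselves come from a classifier over the spoken query; the point here is only the shape of the dispatch logic.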
For urgent queries requiring speed, Audo-Sight simultaneously leverages both edge and cloud. The edge generates a quick initial response, while the cloud works on a more detailed analysis. The novel Response Fusion Engine then seamlessly merges these outputs, ensuring users get both speed and accuracy. Systematic evaluations show dramatic performance gains: speech output is delivered around 80% faster for urgent tasks, and complete responses are generated approximately 50% faster across all tasks compared to a commercial cloud-only baseline.
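The concurrent edge-cloud pattern described above can be sketched with `asyncio`: launch both pipelines at once, speak the edge result as soon as it arrives, and merge in the cloud result when it completes. The model stubs and fixed delays below are placeholders, not the paper's actual fusion logic:

```python
import asyncio


async def edge_answer(query: str) -> str:
    """Stand-in for a small on-device model: fast but coarse."""
    await asyncio.sleep(0.01)
    return "A doorway is ahead on your left."


async def cloud_answer(query: str) -> str:
    """Stand-in for a large cloud model: slower but detailed."""
    await asyncio.sleep(0.05)
    return "A glass double door, propped open, about three meters ahead on your left."


async def fused_response(query: str) -> list[str]:
    """Start both pipelines together; deliver edge output first, then the refinement."""
    edge_task = asyncio.create_task(edge_answer(query))
    cloud_task = asyncio.create_task(cloud_answer(query))
    spoken = [await edge_task]       # speech can begin as soon as the edge finishes
    spoken.append(await cloud_task)  # fusion step: append the detailed cloud answer
    return spoken
```

A real fusion engine would deduplicate and reconcile the two answers rather than simply concatenating them, but the latency benefit comes from exactly this overlap: the user hears something useful while the cloud is still working.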
Perhaps most compelling are the human evaluation results. In tests with BLV participants, 62% preferred Audo-Sight over GPT-5, and another 23% found the two systems comparable. This demonstrates that a purpose-built hybrid architecture can outperform even the most advanced general-purpose LLMs in specific, high-stakes accessibility applications where latency and reliability are paramount.
- Uses a hybrid edge-cloud architecture with specialized AI agents to process environmental queries, dynamically routing based on urgency and context.
- Introduces a Response Fusion Engine that merges fast edge responses with accurate cloud outputs, achieving 80% faster speech for urgent tasks.
- Human evaluations show 62% of BLV participants preferred Audo-Sight over GPT-5, highlighting its effectiveness for real-world accessibility.
Why It Matters
Audo-Sight demonstrates that specialized AI architectures can significantly outperform general-purpose models like GPT-5 in critical real-world applications, particularly accessibility.