Say It My Way: Exploring Control in Conversational Visual Question Answering with Blind Users
Research with 11 blind users shows interactions with current VQA systems average 3 conversational turns, and sometimes stretch to 21, because the systems lack verbosity controls and accessible guidance.
Researchers from the University of Maryland and the University of Washington published "Say It My Way," an analysis of 418 interactions between 11 blind users and conversational visual question answering (VQA) systems. They found that interactions averaged 3 conversational turns (sometimes up to 21) and identified critical system flaws: no verbosity controls, poor spatial and temporal estimation, and inaccessible camera guidance. The study also shows how users rely on prompt engineering to work around these limitations, and it releases a public dataset to support more accessible AI design.
Why It Matters
Highlights a major accessibility gap in AI-powered vision tools used by millions of blind and low-vision people daily.