Beyond Descriptions: A Generative Scene2Audio Framework for Blind and Low-Vision Users to Experience Vista Landscapes
A new generative AI framework moves beyond simple descriptions to create rich, nonverbal audio experiences of scenic vistas.
A research team from institutions including the University of Toronto and the University of Sydney has published a paper, "Beyond Descriptions: A Generative Scene2Audio Framework for Blind and Low-Vision Users to Experience Vista Landscapes," accepted at CHI 2026. The work introduces Scene2Audio, a novel AI framework designed to generate immersive, nonverbal audio experiences from visual landscapes, specifically targeting distant scenic views (Vista spaces). Unlike current tools that rely solely on spoken descriptions, this system uses generative models guided by psychoacoustics to create soundscapes that convey the aesthetic and spatial qualities of a scene, such as the rustle of distant trees or the echo of a mountain range.
The team conducted two studies to validate their approach. First, a controlled user study with 11 blind and low-vision (BLV) participants found that combining Scene2Audio's generated sounds with traditional speech descriptions created a significantly better experience than speech alone. Participants reported that the sound effects complemented the speech, making scenes easier to imagine and more enjoyable. Second, a week-long "in-the-wild" study, deployed via a mobile app with 7 BLV users, demonstrated the framework's practical potential for enhancing daily outdoor experiences. The research represents a shift from purely descriptive assistive technology towards addressing the aesthetic and emotional needs of BLV users.
- The Scene2Audio framework uses generative AI guided by psychoacoustics to create immersive, nonverbal soundscapes from visual landscapes, moving beyond spoken descriptions alone.
- A controlled user study with 11 BLV participants found that Scene2Audio sounds combined with speech gave a significantly better experience than speech alone, making scenes easier to imagine and more enjoyable.
- A week-long mobile app trial with 7 BLV users showed the framework's real-world potential for enhancing daily outdoor experiences.
Why It Matters
Scene2Audio shifts assistive technology from purely descriptive output toward a richer sensory experience, addressing the aesthetic and emotional needs of BLV users that spoken descriptions alone leave unmet.