PanoAffordanceNet: Towards Holistic Affordance Grounding in 360° Indoor Environments
New framework tackles distorted panoramic views to let robots understand what objects are for in a full room.
A research team from Zhejiang University and other institutions has published PanoAffordanceNet, a novel AI framework designed to solve 'holistic affordance grounding' in 360° indoor spaces. Unlike current computer vision models that identify objects in narrow perspective views, this system aims to give embodied AI agents, such as future home robots, a global understanding of a room. It identifies not just what objects are, but what actions they afford (e.g., a chair is 'sit-able,' a handle is 'grasp-able') across the entire panoramic scene. This task is uniquely challenging due to severe distortions in 360° equirectangular images and the difficulty of aligning sparse, scattered object data into a coherent spatial map.
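The distortion the team targets is inherent to the equirectangular projection: each image row spans a spherical band whose true area shrinks with the cosine of its latitude, so content near the top and bottom of the panorama is strongly stretched. Below is a minimal sketch (ours, not from the paper) of the standard per-row weighting that quantifies this over-representation:

```python
import numpy as np

def equirect_latitude_weights(height: int) -> np.ndarray:
    """Per-row solid-angle weights for an equirectangular image.

    A row of pixels at latitude phi covers a spherical band whose area is
    proportional to cos(phi), so rows near the poles are stretched far
    beyond their true extent. This is the latitude-dependent distortion
    that panoramic models must compensate for.
    """
    # Map row centers to latitudes in (-pi/2, pi/2); row 0 is the top (north pole).
    phi = (0.5 - (np.arange(height) + 0.5) / height) * np.pi
    return np.cos(phi)

weights = equirect_latitude_weights(512)
print(weights[0], weights[256], weights[-1])  # ~0 at the poles, ~1 at the equator
```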
To overcome these hurdles, PanoAffordanceNet introduces two key technical innovations: a Distortion-Aware Spectral Modulator (DASM) that performs latitude-dependent calibration to correct warped object shapes, and an Omni-Spherical Densification Head (OSDH) that reconstructs a continuous, topologically correct scene from initially sparse activations. The model is trained with a multi-level constraint system combining pixel-wise, distributional, and region-text contrastive objectives, which helps maintain semantic accuracy even with limited training data. Crucially, the team also constructed and will release '360-AGD,' the first high-quality panoramic dataset specifically for affordance grounding, providing an essential benchmark for future research. Extensive experiments show the framework significantly outperforms existing methods adapted to this new task.
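The summary says only that DASM performs latitude-dependent calibration, not how. One plausible reading, sketched below purely as an illustration, applies a learnable gain to each feature row's horizontal frequency spectrum so the correction varies with latitude; the class name and all design details here are our assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class LatitudeSpectralModulator(nn.Module):
    """Hypothetical stand-in for DASM's latitude-dependent calibration.

    Assumed design: a learnable complex gain is applied to each feature
    row's horizontal frequency spectrum, so the correction applied can
    differ from row to row (i.e., with latitude).
    """

    def __init__(self, height: int, width: int):
        super().__init__()
        n_freq = width // 2 + 1  # rfft bins along the horizontal axis
        # One complex gain per (row, frequency), initialized to the identity.
        self.gain = nn.Parameter(torch.ones(height, n_freq, dtype=torch.cfloat))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) equirectangular feature map.
        spec = torch.fft.rfft(x, dim=-1)     # (B, C, H, W//2 + 1), complex
        spec = spec * self.gain              # broadcast over batch and channels
        return torch.fft.irfft(spec, n=x.shape[-1], dim=-1)

dasm = LatitudeSpectralModulator(height=64, width=128)
out = dasm(torch.randn(2, 32, 64, 128))      # shape preserved: (2, 32, 64, 128)
```

The three training constraints are likewise named but not formalized in the summary. A common instantiation, again an assumption rather than the authors' exact objective, pairs a pixel-wise BCE term with a KL divergence between normalized heatmaps and an InfoNCE-style region-text contrastive term:

```python
import torch
import torch.nn.functional as F

def multi_level_loss(pred, target, region_emb, text_emb, tau=0.07):
    """Illustrative pixel-wise + distributional + region-text contrastive loss.

    Assumed inputs: pred/target are (B, H, W) predicted logits and ground-truth
    affordance maps in [0, 1]; region_emb/text_emb are (N, D) L2-normalized
    embeddings where row i of each forms a matched region-text pair.
    """
    # Pixel-wise: per-location agreement with the ground-truth map.
    pixel = F.binary_cross_entropy_with_logits(pred, target)

    # Distributional: match where the activation mass sits across the scene.
    p = F.log_softmax(pred.flatten(1), dim=1)
    q = target.flatten(1)
    q = q / q.sum(dim=1, keepdim=True).clamp_min(1e-8)
    dist = F.kl_div(p, q, reduction="batchmean")

    # Region-text contrastive: InfoNCE over matched region/text pairs.
    logits = region_emb @ text_emb.t() / tau
    labels = torch.arange(logits.shape[0], device=logits.device)
    contrast = F.cross_entropy(logits, labels)

    return pixel + dist + contrast  # per-term weights omitted for brevity
```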
- Introduces the novel task of Holistic Affordance Grounding in 360° spaces, moving beyond object-centric, perspective-view analysis.
- Features a Distortion-Aware Spectral Modulator (DASM) to correct panoramic image warping and an Omni-Spherical Densification Head (OSDH) for scene continuity.
- Includes the release of the first panoramic affordance grounding dataset, 360-AGD, establishing a new benchmark for embodied AI research.
Why It Matters
This is foundational tech for next-gen robots that need to understand and interact with complex human environments, not just recognize isolated objects.