Contextual Graph Representations for Task-Driven 3D Perception and Planning
New research tackles the 'state explosion' problem in robot task planning by learning contextual scene graphs.
A 2021 undergraduate thesis from the University of Toronto, authored by Christopher Agia, tackles a fundamental problem at the intersection of robotics and AI: making complex 3D scene understanding practical for real-world robot task planning. While modern computer vision can automatically generate detailed 3D scene graphs—hierarchical maps of objects and their relationships—these representations are often too dense and complex. They contain every detected object and relation, even though a specific task, like 'pick up the mug on the table,' only requires a small subset. This 'state explosion' overwhelms traditional task planners, making them slow and unsuitable for robots with limited computational power.
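To make the state-explosion point concrete, here is a minimal sketch (not taken from the thesis; all object names and the one-hop relevance rule are hypothetical) of how a dense scene graph dwarfs the subset a single task actually needs:

```python
# Toy illustration: a dense 3D scene graph stores every detected object and
# pairwise spatial relation, but a task like "pick up the mug on the table"
# only touches a handful of them. Names and relations here are made up.

# Nodes: detected objects. Edges: (subject, relation, object) triples.
scene_graph = {
    "nodes": ["mug", "table", "chair", "lamp", "sofa", "book", "plant"],
    "edges": [
        ("mug", "on", "table"),
        ("book", "on", "table"),
        ("lamp", "next_to", "sofa"),
        ("plant", "next_to", "chair"),
        ("chair", "near", "table"),
        ("sofa", "near", "lamp"),
    ],
}

def task_relevant_subgraph(graph, goal_objects):
    """Keep only the goal objects and anything one relation hop away."""
    keep = set(goal_objects)
    for s, _, o in graph["edges"]:
        if s in goal_objects or o in goal_objects:
            keep.update((s, o))
    return {
        "nodes": [n for n in graph["nodes"] if n in keep],
        "edges": [(s, r, o) for s, r, o in graph["edges"]
                  if s in keep and o in keep],
    }

sub = task_relevant_subgraph(scene_graph, {"mug", "table"})
print(len(scene_graph["nodes"]), "->", len(sub["nodes"]))  # 7 -> 4
```

Even in this seven-object toy scene, the task-relevant subgraph is roughly half the size; in a full room-scale scene graph with hundreds of objects, the gap, and hence the planner's wasted search effort, grows far larger.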
Agia's work addresses this with two contributions. First, it constructs a benchmark for empirically comparing state-of-the-art classical planners, providing a needed tool for the research community. Second, and more innovatively, it explores using Graph Neural Networks (GNNs) to learn contextual, task-driven representations. Instead of handing the planner the entire scene graph, a GNN can be trained to identify and encode only the objects and spatial relationships relevant to a given goal. This learned representation drastically shrinks the state space a planner must navigate, promising significantly faster and more efficient planning. By making high-level reasoning computationally tractable, the thesis lays groundwork for moving robots beyond simple navigation towards executing multi-step tasks in cluttered, dynamic 3D environments.
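The goal-conditioned filtering idea can be sketched as follows. This is not Agia's actual architecture; it is a one-layer message-passing network with random stand-in weights, shown only to illustrate how a GNN can score each node's relevance to a task embedding and prune the rest before planning:

```python
# Hedged sketch: score scene-graph nodes against a goal embedding with one
# round of message passing, then keep only high-scoring nodes for the planner.
# All weights are random placeholders for what would be learned parameters.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, d = 6, 8
features = rng.normal(size=(n_nodes, d))        # per-object feature vectors
adjacency = np.eye(n_nodes)                     # self-loops
adjacency[0, 1] = adjacency[1, 0] = 1.0         # e.g. a mug <-> table relation
goal = rng.normal(size=d)                       # embedding of the task goal

W = rng.normal(size=(d, d)) * 0.1               # message-passing weight matrix

# One round of neighborhood aggregation, then goal-conditioned scoring.
hidden = np.tanh(adjacency @ features @ W)
scores = 1.0 / (1.0 + np.exp(-(hidden @ goal)))  # sigmoid relevance in (0, 1)

relevant = np.flatnonzero(scores > 0.5)          # node indices kept for planning
print("kept", len(relevant), "of", n_nodes, "nodes")
```

In a trained system, `W` and the goal encoder would be learned end-to-end so that pruning discards exactly the objects the downstream planner never needs to reason about.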
- Identifies a critical flaw in standard 3D scene graphs: they create massive, inefficient state spaces that hinder real-time robot planning.
- Proposes using Graph Neural Networks (GNNs) to learn compact, task-specific representations, filtering out irrelevant scene data to accelerate planning.
- Constructs a new benchmark for empirically comparing classical planners, providing a standardized testbed for future research in embodied AI.
Why It Matters
This research is a crucial step towards enabling robots to perform complex, multi-step tasks in unstructured environments like homes and warehouses.