SG-CoT: An Ambiguity-Aware Robotic Planning Framework using Scene Graph Representations
New AI system helps robots ask clarifying questions when instructions are ambiguous, improving task success rates by at least 4% in single-agent settings and 15% in multi-agent settings.
A team of researchers including Akshat Rana and Peeyush Agarwal has introduced SG-CoT (Scene Graph-Chain-of-Thought), a novel framework designed to solve a critical problem in robotics: ambiguous instructions. When a large language model (LLM) is asked to plan a task like "pick up the red cup," it might fail if there are multiple red cups in the scene. SG-CoT addresses this by first constructing a structured scene graph from the robot's observations, which captures objects, their attributes (like color), and their relationships to each other.
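To make the idea of a scene graph concrete, here is a minimal Python sketch of one possible representation. It is an illustration only, not the authors' implementation: the `SceneGraph` class, its method names, and the object ids (`cup_1`, `table_1`, and so on) are all hypothetical. Objects are stored as nodes with attribute dictionaries, and relationships are stored as (subject, relation, object) triples.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Toy scene graph: nodes are objects, edges are directed relations."""

    # object id -> attribute dict, e.g. {"category": "cup", "color": "red"}
    nodes: dict[str, dict[str, str]] = field(default_factory=dict)
    # (subject id, relation, object id) triples, e.g. ("cup_1", "on", "table_1")
    edges: list[tuple[str, str, str]] = field(default_factory=list)

    def add_object(self, obj_id: str, **attributes: str) -> None:
        """Register an object together with its attributes."""
        self.nodes[obj_id] = dict(attributes)

    def add_relation(self, subject: str, relation: str, target: str) -> None:
        """Add a directed relation between two objects already in the graph."""
        if subject not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before relating them")
        self.edges.append((subject, relation, target))


def build_example_scene() -> SceneGraph:
    """Build the hypothetical two-red-cups scene used as the running example."""
    graph = SceneGraph()
    graph.add_object("cup_1", category="cup", color="red")
    graph.add_object("cup_2", category="cup", color="red")
    graph.add_object("table_1", category="table")
    graph.add_object("shelf_1", category="shelf")
    graph.add_relation("cup_1", "on", "table_1")
    graph.add_relation("cup_2", "on", "shelf_1")
    return graph
```

In this toy scene the instruction "pick up the red cup" is ambiguous because two nodes share the same category and color, and only their relations to the table and the shelf tell them apart.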
In the second stage, the LLM planner is equipped with retrieval functions that query only the relevant portions of this scene graph. This grounds the planner's reasoning in the actual environment, reducing hallucinations and incorrect assumptions. Crucially, the framework lets the robot identify the source of an ambiguity, such as multiple matching objects, and proactively ask a human or another robot a disambiguation question, like "Which red cup, the one on the table or the one on the shelf?"
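The retrieval-and-disambiguation step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the paper's method: the function names (`retrieve_objects`, `retrieve_relations`, `resolve_reference`), the dictionary-based scene, and the rule of describing each candidate by its first relation are all hypothetical, and the LLM itself is left out so that only the grounding logic is shown.

```python
# Hypothetical scene: object id -> attributes, plus (subject, relation, object) triples.
NODES = {
    "cup_1": {"category": "cup", "color": "red"},
    "cup_2": {"category": "cup", "color": "red"},
    "cup_3": {"category": "cup", "color": "blue"},
    "table_1": {"category": "table"},
    "shelf_1": {"category": "shelf"},
}
EDGES = [
    ("cup_1", "on", "table_1"),
    ("cup_2", "on", "shelf_1"),
    ("cup_3", "on", "table_1"),
]


def retrieve_objects(nodes, **constraints):
    """Return the sorted ids of objects whose attributes match every constraint."""
    return sorted(
        obj_id
        for obj_id, attrs in nodes.items()
        if all(attrs.get(key) == value for key, value in constraints.items())
    )


def retrieve_relations(edges, obj_id):
    """Return (relation, target) pairs for edges that start at obj_id."""
    return [(rel, target) for subject, rel, target in edges if subject == obj_id]


def resolve_reference(nodes, edges, **constraints):
    """Return ("act", obj_id), ("ask", question), or ("fail", message)."""
    matches = retrieve_objects(nodes, **constraints)
    if not matches:
        return ("fail", "No object in the scene matches the instruction.")
    if len(matches) == 1:
        return ("act", matches[0])
    # Several candidates: describe each by its first relation so a human can choose.
    options = []
    for obj_id in matches:
        relations = retrieve_relations(edges, obj_id)
        if relations:
            rel, target = relations[0]
            options.append(f"the one {rel} the {nodes[target]['category']}")
        else:
            options.append(obj_id)
    description = " ".join(constraints.values())
    return ("ask", f"Which {description}, " + " or ".join(options) + "?")


print(resolve_reference(NODES, EDGES, color="red", category="cup"))
print(resolve_reference(NODES, EDGES, color="blue", category="cup"))
```

With two red cups in the scene the first call returns a clarifying question, while the second call resolves directly to `cup_3` because only one blue cup exists. A real planner would expose such retrieval functions to the LLM as tools rather than hard-coding the decision rule.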
Extensive experiments support the approach, showing that it consistently outperforms previous methods. The results include a minimum 10% improvement in scene-related question-answering accuracy and higher task success rates: at least a 4% increase in single-agent environments and a more substantial 15% improvement in multi-agent settings, where coordination is key. These results indicate that SG-CoT can make robotic planners more reliable and generalizable in complex, real-world situations.
- SG-CoT uses scene graphs to ground LLM reasoning in visual observations, reducing planning errors.
- The system improves question-answering accuracy by at least 10% and task success rates by at least 4% in single-agent settings and 15% in multi-agent settings.
- It enables robots to identify ambiguous instructions and ask targeted clarifying questions to resolve them.
Why It Matters
SG-CoT brings us closer to reliable robots that can understand and operate in messy, ambiguous real-world environments without constant human intervention.