Robotics

Dual-LLM system achieves 86% success rate in robotic task planning

Two large language models work together to interpret complex spatial commands.

Deep Dive

A team of researchers led by Karolina Źróbek has developed a hierarchical, language-driven framework for robotic task and motion planning that uses two distinct LLM modules. The high-level planning agent uses ReAct-style prompting to interpret natural language commands and generate action sequences (e.g., pick, place, release), calling tools for object perception and manipulation as it goes. For precise spatial placement, such as 'place the mug next to the plate', a separate sub-prompting module handles the 3D reasoning based on object geometry and scene layout.
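To make the ReAct-style loop concrete, here is a minimal sketch of how a planner might alternate between a thought, a tool call, and an observation. It is an illustration only: the summary does not give the paper's prompts or tool interfaces, so `Step`, `scripted_planner`, `run_react`, and the tool names are all hypothetical, and a fixed script stands in for the LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    thought: str    # the planner's reasoning for this step
    action: str     # name of the tool to call, or "finish"
    argument: str   # argument passed to the tool


History = List[Tuple[Step, str]]


def scripted_planner(command: str, history: History) -> Step:
    """Hypothetical stand-in for the high-level LLM.

    A real system would prompt a model with the command and the history of
    thoughts, actions and observations; here a fixed script plays that role.
    """
    script = [
        Step("I need to locate the mug.", "detect", "mug"),
        Step("Grasp the mug.", "pick", "mug"),
        Step("Delegate the target pose to the spatial module.", "place", "mug next to plate"),
        Step("Let go of the mug.", "release", "mug"),
        Step("Task complete.", "finish", ""),
    ]
    return script[len(history)]


def run_react(
    command: str,
    planner: Callable[[str, History], Step],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> List[str]:
    """Alternate planner steps and tool calls until the planner finishes."""
    history: History = []
    for _ in range(max_steps):
        step = planner(command, history)
        if step.action == "finish":
            break
        observation = tools[step.action](step.argument)  # act, then observe
        history.append((step, observation))
    return [step.action for step, _ in history]


# Hypothetical tools; in the described system, "place" is where the separate
# spatial-reasoning module would be consulted.
tools = {
    "detect": lambda name: f"{name} found",
    "pick": lambda name: f"holding {name}",
    "place": lambda request: f"placed: {request}",
    "release": lambda name: f"released {name}",
}

actions = run_react("place the mug next to the plate", scripted_planner, tools)
print(actions)  # prints ['detect', 'pick', 'place', 'release']
```

Keeping the loop separate from the tools mirrors the split described above: the planner decides what to do next, while the placement details stay behind the `place` tool.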

The system integrates YOLOX-GDRNet for object detection and 6-DoF pose estimation, feeding data to a motion execution stub. Evaluated across 24 diverse test scenarios, ranging from simple spatial commands to high-level instructions and even infeasible requests, the framework achieved an 86% overall task success rate. The work is still at the research stage, with motion execution handled by a stub, but the results suggest that the dual-LLM approach could make human-robot interaction more natural by helping service and assistance robots interpret nuanced commands. The paper is available on arXiv (arXiv:2605.08330).
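A 6-DoF pose combines three translational and three rotational degrees of freedom. The sketch below shows the kind of data a perception stage might hand to a placement step, and one simple way to turn "next to" into a target position. This is not the paper's method: the framework delegates spatial reasoning to an LLM sub-prompt, whereas this is a hand-written geometric rule used purely for illustration. `ObjectPose`, `place_next_to`, the axis convention, and the 5 cm gap are all assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ObjectPose:
    """Hypothetical 6-DoF pose record: position plus orientation."""
    name: str
    position: Tuple[float, float, float]            # object centre (x, y, z), metres
    quaternion: Tuple[float, float, float, float]   # orientation as (x, y, z, w)
    size: Tuple[float, float, float]                # bounding-box extents, metres


def place_next_to(moving: ObjectPose, reference: ObjectPose, gap: float = 0.05) -> Tuple[float, float, float]:
    """Return a target centre for `moving` beside `reference` along +x.

    Assumes both objects rest upright on the same flat surface and that
    positions refer to bounding-box centres.
    """
    rx, ry, rz = reference.position
    # Shift by half of each object's width plus a clearance gap.
    offset = reference.size[0] / 2 + moving.size[0] / 2 + gap
    # Keep the moving object's base on the surface the reference sits on.
    surface_z = rz - reference.size[2] / 2
    return (rx + offset, ry, surface_z + moving.size[2] / 2)


plate = ObjectPose("plate", (0.50, 0.00, 0.01), (0.0, 0.0, 0.0, 1.0), (0.20, 0.20, 0.02))
mug = ObjectPose("mug", (0.10, 0.30, 0.05), (0.0, 0.0, 0.0, 1.0), (0.08, 0.08, 0.10))

target = place_next_to(mug, plate)
print(target)
```

A fixed rule like this covers only one relation and one direction, which helps show why a language-driven module is attractive for open-ended spatial phrases.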

Key Points
  • Uses separate LLM modules for high-level planning (ReAct-style) and low-level 3D spatial reasoning.
  • Integrates YOLOX-GDRNet for object detection and 6-DoF pose estimation to inform robot actions.
  • Achieved 86% overall task success rate across 24 scenarios, including complex spatial and infeasible commands.

Why It Matters

Moves natural language control of service robots closer to practice, which could enable more intuitive human-robot collaboration in homes and workplaces.