Hierarchical LLM-Based Multi-Agent Framework with Prompt Optimization for Multi-Robot Task Planning
A new hierarchical multi-agent framework uses TextGrad-inspired updates to optimize prompts and plan complex tasks.
Researchers Tomoya Kawabe and Rin Takano from NEC Corporation have introduced a hierarchical multi-agent framework that improves how teams of heterogeneous robots plan and execute tasks from natural language instructions. The system, detailed in a paper accepted to ICRA 2026, addresses a critical gap: traditional PDDL planners offer rigor but struggle with ambiguity, while LLMs can interpret instructions but often hallucinate infeasible actions. Their solution is a two-tiered structure in which an upper-layer agent decomposes a high-level mission into subtasks and assigns them to specialized lower-layer agents. Each lower-layer agent then formulates a precise Planning Domain Definition Language (PDDL) problem, which a classical planner solves to produce an executable plan.
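To make the two-tier flow concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names, the `household` domain, the robot names, and the predicates are all hypothetical, and the upper-layer LLM call is replaced by a hard-coded stub. It only shows the shape of the pipeline, with a mission going in, per-robot subtasks coming out, and each subtask rendered as a PDDL problem string for a classical planner.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    robot: str  # hypothetical robot identifier
    goal: str   # goal written as a PDDL expression


def upper_layer_decompose(mission: str) -> list[Subtask]:
    """Stand-in for the upper-layer agent.

    A real system would prompt an LLM with the mission text; this stub
    returns a fixed decomposition so the example runs offline.
    """
    return [
        Subtask(robot="arm1", goal="(on apple table)"),
        Subtask(robot="mobile1", goal="(at mobile1 kitchen)"),
    ]


def lower_layer_formulate(subtask: Subtask, objects: list[str], init: list[str]) -> str:
    """Stand-in for a lower-layer agent: render one subtask as a PDDL problem."""
    lines = [
        f"(define (problem {subtask.robot}-task)",
        "  (:domain household)",  # hypothetical domain name
        f"  (:objects {' '.join(objects)})",
        f"  (:init {' '.join(init)})",
        f"  (:goal {subtask.goal})",
        ")",
    ]
    return "\n".join(lines)


mission = "Put the apple on the table and send the mobile robot to the kitchen"
subtasks = upper_layer_decompose(mission)
problem = lower_layer_formulate(
    subtasks[0],
    objects=["apple", "table", "arm1"],
    init=["(holding arm1 apple)"],
)
print(problem)
```

In a full system, each problem string would be handed, together with a domain file, to an off-the-shelf PDDL planner, so the feasibility of the final plan rests on the solver rather than on the LLM.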
The framework's central addition is prompt optimization inspired by TextGrad: when planning fails, the system analyzes the failure and applies textual-gradient updates to refine each agent's instructions, so the agents learn from their mistakes. It also shares meta-prompts across agents within the same layer to accelerate learning in multi-agent settings. On the MAT-THOR benchmark, the planner achieved success rates of 0.95 on compound tasks, 0.84 on complex tasks, and 0.60 on vague tasks, outperforming the prior SOTA, LaMMA-P, by 2, 7, and 15 percentage points, respectively. An ablation study quantified each component's contribution to the overall success rate: +59 percentage points from the hierarchical structure, +37 from prompt optimization, and +4 from meta-prompt sharing. The work is a substantial step toward reliable, instruction-driven automation for logistics, manufacturing, and domestic robotics.
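The failure-driven update described above can be pictured as a small loop. The sketch below is a toy illustration under loud assumptions, not the paper's algorithm and not the TextGrad library API: `run_planner`, `critique`, and `apply_update` are hypothetical deterministic stubs standing in for the solver call and the two LLM calls, and meta-prompt sharing is left out. The "textual gradient" here is simply a natural-language critique of the prompt, derived from the failure message.

```python
def run_planner(prompt: str) -> tuple[bool, str]:
    """Toy stand-in for 'generate PDDL with this prompt, then call the solver'.

    Planning "succeeds" only once the prompt contains the needed rule.
    """
    if "declare every object" in prompt:
        return True, ""
    return False, "solver error: undefined object 'apple'"


def critique(prompt: str, error: str) -> str:
    """Toy stand-in for the LLM that turns a failure into textual feedback."""
    return f"The prompt led to '{error}'. Require that all referenced objects are declared."


def apply_update(prompt: str, gradient: str) -> str:
    """Toy stand-in for the LLM that rewrites the prompt using the feedback."""
    return prompt + "\nRule: declare every object used in the goal."


def optimize_prompt(prompt, run_planner, critique, apply_update, max_iters=3):
    """Refine a prompt until planning succeeds or the iteration budget runs out."""
    for _ in range(max_iters):
        ok, error = run_planner(prompt)
        if ok:
            return prompt  # nothing to fix
        gradient = critique(prompt, error)       # textual feedback on the failure
        prompt = apply_update(prompt, gradient)  # revised instructions
    return prompt


final_prompt = optimize_prompt(
    "You write PDDL problems for one robot.",
    run_planner,
    critique,
    apply_update,
)
print(final_prompt)
```

The point of the structure is that the solver's error message, not a numeric loss, drives the revision; swapping the stubs for real model and planner calls would leave the loop itself unchanged.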
- Achieves success rates of 0.95 on compound, 0.84 on complex, and 0.60 on vague tasks in the MAT-THOR benchmark, beating prior SOTA LaMMA-P by 2 to 15 percentage points, with the largest margin on vague tasks.
- Uses a TextGrad-inspired method to optimize agent prompts when plans fail, improving accuracy through textual-gradient updates.
- Ablation study shows the hierarchical structure contributes +59 percentage points to the overall success rate, the largest single factor.
Why It Matters
Points toward more reliable, instruction-driven automation for robot teams in warehouses, factories, and homes by reducing planning failures.