IndoorR2X: Indoor Robot-to-Everything Coordination with LLM-Driven Planning
New benchmark combines robot teams with existing building cameras and sensors for building-wide AI coordination.
A research team from Carnegie Mellon University and collaborating institutions has published IndoorR2X, a simulation framework that serves as the first benchmark for Large Language Model-driven coordination between robot teams and existing building infrastructure. Where traditional robot-to-robot systems suffer from partial observability, IndoorR2X creates a "Robot-to-Everything" network in which mobile robots dynamically integrate data from static IoT sensors, such as security cameras, motion detectors, and environmental monitors, to build a comprehensive, real-time semantic map of indoor environments. This fusion allows a small team of robots to understand spaces far beyond their immediate sensor range, dramatically reducing the exploration overhead typically required for tasks like search, delivery, or monitoring.
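To make the fusion idea concrete, here is a minimal sketch, in Python, of how detections from robots and fixed building sensors could be merged into one shared semantic map and rendered as text for a language-model planner. Every name in it (Observation, SemanticMap, integrate, summarize) is a hypothetical illustration, not IndoorR2X's actual API.

```python
# Hypothetical sketch of a fused world model; not IndoorR2X's real code.
from dataclasses import dataclass, field

@dataclass
class Observation:
    """One detection from any source: robot camera, CCTV, motion sensor."""
    source_id: str    # e.g. "robot_1" or "cam_lobby_3"
    label: str        # semantic class, e.g. "person" or "package"
    position: tuple   # (x, y) in a shared building frame
    timestamp: float
    confidence: float

@dataclass
class SemanticMap:
    """Global map that merges robot and fixed-sensor observations."""
    entries: dict = field(default_factory=dict)  # (label, cell) -> Observation

    def integrate(self, obs: Observation, cell_size: float = 0.5) -> None:
        # Discretize position so repeated sightings of one object merge.
        cell = (round(obs.position[0] / cell_size),
                round(obs.position[1] / cell_size))
        prev = self.entries.get((obs.label, cell))
        # Keep the newest, most confident sighting regardless of source,
        # so a fixed camera can update areas no robot has visited yet.
        if prev is None or (obs.timestamp, obs.confidence) > (prev.timestamp, prev.confidence):
            self.entries[(obs.label, cell)] = obs

    def summarize(self) -> str:
        """Render the map as plain text for an LLM planner's context."""
        return "\n".join(
            f"{o.label} near ({o.position[0]:.1f}, {o.position[1]:.1f}) seen by {o.source_id}"
            for o in self.entries.values()
        )
```

Because every sensor writes into the same discretized map, a security camera's detection is indistinguishable, from the planner's perspective, from one made by a robot, which is what lets a small team cover a large building.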
The core innovation lies in using LLMs like GPT-4 as the high-level planner. The system translates the unified sensor data into a natural language context for the LLM, which then generates step-by-step coordination strategies. For example, an LLM can instruct one robot to investigate a room while another retrieves an object, all while referencing fixed camera feeds to avoid obstacles. The framework includes configurable environments, sensor layouts, and task suites for systematic testing. Experiments showed that IoT-augmented world modeling improved multi-robot task efficiency by 30-40% and significantly increased reliability in complex, cluttered settings compared to robot-only teams.
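Under the same assumptions, the planning step might look like the sketch below. The prompt wording, the numbered plan format, and the llm_complete callback are all illustrative stand-ins (llm_complete could wrap any text-in, text-out call to a model such as GPT-4); the paper's actual prompting scheme is not shown here.

```python
# Hypothetical LLM-as-planner loop; assumes the SemanticMap sketch above.

PLANNER_TEMPLATE = """You coordinate {n_robots} indoor robots.
Known world state (fused from robots and building sensors):
{world_summary}

Task: {task}

Reply with one numbered step per robot, for example:
1. robot_1: go to (2.0, 4.5) and inspect the doorway
"""

def plan(task: str, world: "SemanticMap", robots: list[str], llm_complete) -> dict[str, str]:
    """Ask the LLM for a high-level plan and map each step to a robot."""
    prompt = PLANNER_TEMPLATE.format(
        n_robots=len(robots),
        world_summary=world.summarize(),
        task=task,
    )
    reply = llm_complete(prompt)  # any text-in, text-out LLM call
    assignments: dict[str, str] = {}
    for line in reply.splitlines():
        line = line.strip()
        # Expect lines like "2. robot_2: retrieve the package near (3.0, 7.5)"
        if ". " in line and ":" in line:
            body = line.split(". ", 1)[1]
            robot, _, action = body.partition(":")
            if robot.strip() in robots:
                assignments[robot.strip()] = action.strip()
    return assignments
```

Parsing free-form LLM output this way is exactly where the geometric-precision failures noted below can surface, since nothing forces the model's suggested coordinates to be reachable.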
The researchers also documented key insights and failure modes, providing a crucial roadmap for the field. They found that while LLMs excel at high-level strategy, their plans can sometimes lack geometric precision or fail under sensor noise. The IndoorR2X benchmark is now available for the research community to develop and test more robust coordination algorithms, pushing toward practical deployment in warehouses, hospitals, and smart buildings where infrastructure and robots must seamlessly collaborate.
- First benchmark for LLM-driven Robot-to-Everything (R2X) coordination, integrating mobile robots with static IoT sensors like cameras.
- Reduces redundant robot exploration by up to 40% by building a global semantic map from fused sensor data.
- Uses LLMs (e.g., GPT-4) as high-level planners to generate natural language coordination strategies for complex indoor tasks.
Why It Matters
Enables efficient, small robot teams to leverage existing building sensors for complex tasks in logistics, healthcare, and security.