LocoVLM: Grounding Vision and Language for Adapting Versatile Legged Locomotion Policies
Legged robots can now understand and act on complex human instructions in real time.
Researchers have developed LocoVLM, a system that lets legged robots adapt their locomotion in real time using high-level reasoning from vision and language models. It achieves up to 87% instruction-following accuracy without querying cloud-based foundation models at runtime. The method uses a pre-trained LLM to build a skill database offline and a vision-language model to ground environmental semantics, enabling versatile, style-conditioned control from verbal commands.
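To make the described pipeline concrete, here is a minimal Python sketch of the general pattern: an offline skill database of gait-style parameters, a grounding step that maps an instruction plus scene semantics to a skill, and a conditioning vector for a style-conditioned policy. All names, fields, and the keyword-matching stand-in for the VLM are illustrative assumptions, not LocoVLM's actual interfaces.

```python
from dataclasses import dataclass

# Hypothetical skill entry: gait-style parameters a low-level locomotion
# policy could be conditioned on. Field names are illustrative only.
@dataclass
class Skill:
    name: str
    body_height: float     # metres
    step_frequency: float  # Hz
    gait: str

# Offline step (sketch): in LocoVLM this database would be authored by
# querying a pre-trained LLM; here a few plausible entries are hard-coded.
SKILL_DB = {
    "crouch": Skill("crouch", body_height=0.18, step_frequency=2.0, gait="walk"),
    "trot":   Skill("trot",   body_height=0.30, step_frequency=3.0, gait="trot"),
    "creep":  Skill("creep",  body_height=0.22, step_frequency=1.5, gait="crawl"),
}

def ground_instruction(instruction: str, scene_labels: list[str]) -> Skill:
    """Stand-in for the VLM grounding step: map a verbal command plus
    scene semantics to a skill. A real system would query a
    vision-language model; keyword matching just keeps the sketch runnable."""
    text = instruction.lower() + " " + " ".join(scene_labels)
    if "low" in text or "duck" in text or "overhang" in text:
        return SKILL_DB["crouch"]
    if "quiet" in text or "slow" in text:
        return SKILL_DB["creep"]
    return SKILL_DB["trot"]

def policy_command(skill: Skill, velocity: float) -> dict:
    """Pack the style parameters into the conditioning input a
    style-conditioned locomotion policy would consume."""
    return {
        "gait": skill.gait,
        "body_height": skill.body_height,
        "step_frequency": skill.step_frequency,
        "target_velocity": velocity,
    }

if __name__ == "__main__":
    skill = ground_instruction("go under the table, keep low", ["table", "floor"])
    print(policy_command(skill, velocity=0.6))
```

Because the skill database is fixed ahead of time and grounding runs locally, the runtime loop needs no online calls to cloud-hosted foundation models, matching the design described above.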
Why It Matters
This breakthrough moves robots beyond simple geometric navigation, enabling them to understand and respond to complex, real-world human instructions.