LocoVLM: Robots Now Follow Verbal Commands with Up to 87% Accuracy
Legged robots can now interpret complex human instructions and act on them in real time.
Researchers have developed LocoVLM, a new system that enables legged robots to adapt their locomotion in real time using high-level reasoning from vision and language models. It achieves up to 87% instruction-following accuracy without needing to query cloud-based foundation models online. The method uses a pre-trained LLM to create a skill database and a vision-language model to ground environmental semantics, allowing for versatile, style-conditioned control based on verbal commands.
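To make the "no online queries" idea concrete, here is a minimal sketch of looking up a locomotion style in a skill database that was built ahead of time. Everything in it is an assumption for illustration: the `Skill` fields, the three database entries, and the word-overlap (Jaccard) matching are hypothetical stand-ins, not LocoVLM's actual data or retrieval method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """One entry in a hypothetical offline skill database."""
    description: str   # text an LLM might have written ahead of time
    gait: str          # illustrative style label
    speed_mps: float   # illustrative forward speed command


# Hypothetical entries; in the described system an LLM would
# generate the database offline, before the robot is deployed.
SKILL_DB = [
    Skill("walk slowly and carefully", "walk", 0.3),
    Skill("trot quickly forward", "trot", 1.2),
    Skill("crouch low to pass under an obstacle", "crouch", 0.4),
]


def tokens(text: str) -> set[str]:
    """Lowercase a string and split it into a set of words."""
    return set(text.lower().split())


def retrieve_skill(instruction: str, db=SKILL_DB) -> Skill:
    """Return the entry whose description best overlaps the instruction.

    Uses Jaccard similarity on word sets as a simple stand-in for
    whatever matching the real system performs.
    """
    query = tokens(instruction)

    def score(skill: Skill) -> float:
        desc = tokens(skill.description)
        return len(query & desc) / len(query | desc)

    return max(db, key=score)


skill = retrieve_skill("please walk carefully")
print(skill.gait, skill.speed_mps)  # walk 0.3
```

Because the lookup runs entirely against local data, nothing in this loop depends on a network call, which is the property the summary highlights.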
Why It Matters
This breakthrough moves robots beyond simple geometric navigation, enabling them to understand and respond to complex, real-world human instructions.