ReasonNavi: Human-Inspired Global Map Reasoning for Zero-Shot Embodied Navigation
New framework uses MLLMs to plan like humans, outperforming trained models in zero-shot navigation tasks.
Researchers from HKUST and Shanghai AI Lab developed ReasonNavi, a zero-shot embodied navigation framework. It couples multimodal LLMs (such as GPT-4V) with deterministic planners, converting top-down maps into a shared reasoning space. Rather than regressing raw coordinates, a task MLLMs handle poorly, the model selects among marked waypoints based on the instruction. The approach requires no MLLM fine-tuning, outperforms prior methods across three navigation tasks, and offers a scalable, interpretable solution that improves as foundation models advance.
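The loop described above, annotating a map with candidate waypoints, letting the MLLM pick one by index, and handing the pick to a deterministic planner, can be sketched roughly as follows. All function names, the prompt format, and the stubbed MLLM and planner are illustrative assumptions, not the authors' actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Waypoint:
    idx: int    # label drawn on the annotated top-down map
    x: float    # map coordinates (hypothetical units)
    y: float

def annotate_map(waypoints):
    """Describe candidates as text; the real system would instead draw
    numbered markers onto the top-down map image for the MLLM."""
    return "\n".join(f"[{w.idx}] at ({w.x:.1f}, {w.y:.1f})" for w in waypoints)

def query_mllm(prompt):
    """Stub standing in for a GPT-4V call; here it simply returns the
    first integer mentioned in the prompt."""
    for token in prompt.split():
        if token.strip("[].,").isdigit():
            return int(token.strip("[].,"))
    return 0

def select_waypoint(instruction, waypoints):
    """Key idea: the MLLM chooses an index over marked options
    (a classification) instead of regressing raw coordinates."""
    prompt = (f"Instruction: {instruction}\n"
              f"Candidates:\n{annotate_map(waypoints)}\n"
              f"Answer with one index.")
    choice = query_mllm(prompt)
    return next(w for w in waypoints if w.idx == choice)

def plan_path(start, goal):
    """Deterministic-planner stub: a straight segment; a real planner
    would search the occupancy map (e.g. A*)."""
    return [start, (goal.x, goal.y)]
```

The division of labor is the point: the MLLM does high-level, instruction-grounded selection while a classical planner handles precise geometry, which is why no fine-tuning is needed.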
Why It Matters
Enables robots to navigate complex environments intelligently without costly, task-specific training, accelerating real-world deployment and letting navigation quality ride on progress in foundation models.