Robotics

HTNav: A Hybrid Navigation Framework with Tiered Structure for Urban Aerial Vision-and-Language Navigation

New framework achieves state-of-the-art performance on CityNav benchmark for complex urban drone navigation.

Deep Dive

A research team led by Chengjie Fan has introduced HTNav, a novel hybrid navigation framework designed to solve critical challenges in urban aerial Vision-and-Language Navigation (VLN). Existing drone navigation AI struggles to generalize to unseen environments, plan long-range paths, and reason about spatial continuity. HTNav addresses these gaps by integrating Imitation Learning (IL) and Reinforcement Learning (RL) within a collaborative, tiered structure. This staged training mechanism ensures stable basic navigation while boosting the system's ability to explore complex urban settings.
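
The article describes the staged IL-and-RL scheme only at a high level; HTNav's actual training code is not shown here. As a rough intuition pump, the general pattern can be sketched on a toy task (everything below — the policy representation, helper names, and the reward — is an illustrative assumption, not HTNav's implementation):

```python
import random

random.seed(0)

# Toy sketch of a staged IL -> RL scheme (illustrative, not HTNav's code).
# Stage 1: imitation learning fits the policy to expert actions for stability.
# Stage 2: reinforcement learning fine-tunes the same policy with reward
# feedback, letting it explore beyond what the demonstrations cover.

def imitation_stage(policy, demos, lr=1.0):
    """Nudge each state's action estimate toward the expert's action."""
    for state, expert_action in demos:
        current = policy.get(state, 0.0)
        policy[state] = current + lr * (expert_action - current)
    return policy

def rl_stage(policy, reward_fn, states, lr=0.1, noise=0.2, steps=200):
    """Hill-climbing RL fine-tune: perturb actions, keep improvements."""
    for _ in range(steps):
        state = random.choice(states)
        action = policy.get(state, 0.0)
        candidate = action + random.uniform(-noise, noise)
        if reward_fn(state, candidate) > reward_fn(state, action):
            policy[state] = action + lr * (candidate - action)
    return policy

# Toy task: the "expert" action for state s is 2*s, but the true optimum
# (unknown to the demos) is 2*s + 0.5 -- RL must explore past imitation.
demos = [(s, 2.0 * s) for s in range(5)]
reward = lambda s, a: -abs(a - (2.0 * s + 0.5))

policy = imitation_stage({}, demos)
policy = rl_stage(policy, reward, states=list(range(5)))
```

The division of labor mirrors the article's framing: imitation supplies a stable baseline, and the reward-driven stage adds the exploratory refinement.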

The framework's core innovation is a two-tier decision-making process that separates macro-level path planning from fine-grained action control, allowing for more sophisticated navigation strategies. It also incorporates a map representation learning module to deepen the AI's understanding of spatial layouts. Tested on the CityNav benchmark, HTNav delivered state-of-the-art results across all difficulty levels and scene complexities. This performance leap marks a significant step toward reliable autonomous drones for real-world tasks like package delivery and infrastructure inspection in dense, unpredictable cities.
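
To make the two-tier idea concrete, here is a minimal geometric sketch of separating macro-level planning from fine-grained control — a generic pattern, with all function names and parameters (stride, step sizes) being assumptions for illustration rather than HTNav's architecture:

```python
import math

# Illustrative two-tier navigation loop (not HTNav's actual architecture):
# a high-level planner picks the next macro waypoint toward the goal, and a
# low-level controller emits fine-grained steps to reach that waypoint.

def high_level_planner(position, goal, stride=5.0):
    """Macro tier: choose the next waypoint at most `stride` away, toward goal."""
    dx, dy = goal[0] - position[0], goal[1] - position[1]
    dist = math.hypot(dx, dy)
    if dist <= stride:
        return goal
    return (position[0] + stride * dx / dist, position[1] + stride * dy / dist)

def low_level_controller(position, waypoint, step=1.0):
    """Fine-grained tier: take one small action step toward the waypoint."""
    dx, dy = waypoint[0] - position[0], waypoint[1] - position[1]
    dist = math.hypot(dx, dy)
    if dist <= step:
        return waypoint
    return (position[0] + step * dx / dist, position[1] + step * dy / dist)

def navigate(start, goal, max_steps=100):
    """Alternate the tiers: replan a waypoint, then micro-step to it."""
    position = start
    for _ in range(max_steps):
        if position == goal:
            break
        waypoint = high_level_planner(position, goal)
        while position != waypoint:
            position = low_level_controller(position, waypoint)
    return position

final = navigate((0.0, 0.0), (12.0, 9.0))
```

The appeal of the split is that each tier can be trained or tuned separately: the planner reasons over long horizons while the controller only ever solves short local problems.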

Key Points
  • Combines Imitation Learning and Reinforcement Learning in a hybrid, tiered architecture for stable and explorative navigation.
  • Introduces a map representation module to improve understanding of spatial continuity in open urban domains.
  • Achieved top performance on the CityNav benchmark, enhancing precision and robustness for complex aerial tasks.
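
The map representation module mentioned above is described only abstractly in the article. One common way such spatial memory is realized — offered here purely as a hypothetical sketch, not HTNav's design — is an occupancy grid accumulated from observations:

```python
# Hypothetical spatial-memory sketch (not HTNav's actual module): accumulate
# observed obstacle points into a coarse occupancy grid so the agent retains
# a persistent picture of the urban layout it has flown through.

class OccupancyMap:
    def __init__(self, size=10, cell=1.0):
        self.size = size                              # grid is size x size cells
        self.cell = cell                              # assumed metres per cell
        self.grid = [[0] * size for _ in range(size)]

    def observe(self, points):
        """Mark each observed (x, y) obstacle point as occupied."""
        for x, y in points:
            i, j = int(x // self.cell), int(y // self.cell)
            if 0 <= i < self.size and 0 <= j < self.size:
                self.grid[j][i] = 1

    def is_free(self, x, y):
        """Query the memory: is this cell believed traversable?"""
        i, j = int(x // self.cell), int(y // self.cell)
        return not (0 <= i < self.size and 0 <= j < self.size and self.grid[j][i])

m = OccupancyMap()
m.observe([(2.5, 3.5), (2.5, 4.5)])   # a small observed "building wall"
```

A persistent structure like this is what lets a planner reason about spatial continuity across long flights, rather than reacting only to the current camera frame.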

Why It Matters

Enables more reliable autonomous drones for urban logistics and inspection, overcoming key AI navigation hurdles in real-world environments.