SAGE drone system finds objects 13.7x faster using language commands
CLIP-powered drone explores unknown indoor spaces 13.7x faster than the FTU baseline
Researchers Nitin Vegesna and Avideh Zakhor have unveiled SAGE (Semantic-Aware Guided Exploration), a drone system that combines volumetric mapping with open-vocabulary object detection. Building on the FALCON explorer, SAGE integrates CLIP (Contrastive Language-Image Pre-training) through four components: object-centric embedding storage, a temporal cache of recent observations along free-unknown boundaries, object frontiers placed at high-similarity detections, and a unified semantic-geometric cost function. The design lets semantic cues reprioritize exploration frontiers without sacrificing total coverage. This is a key improvement over prior methods, which either ignore semantics or over-prioritize them and leave large areas unmapped.
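The article does not give SAGE's actual cost function, so the following is only a minimal sketch of the general idea of a unified semantic-geometric cost. It assumes the cost is a weighted sum of travel distance, expected information gain, and query similarity. `Frontier`, `unified_cost`, `select_frontier`, and the weights `w_dist`, `w_gain`, `w_sem` are hypothetical names and values, not taken from the paper.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Frontier:
    """A candidate goal: a free-unknown boundary or an object frontier (hypothetical type)."""
    position: tuple[float, float, float]
    info_gain: float         # expected number of unknown voxels revealed from here
    semantic_score: float    # query similarity of nearby observations, assumed in [0, 1]
    is_object: bool = False  # True if placed at a high-similarity detection


def unified_cost(f: Frontier, drone_pos, w_dist=1.0, w_gain=0.05, w_sem=10.0) -> float:
    """Lower is better. The weights are illustrative, not values from the paper."""
    distance = math.dist(drone_pos, f.position)  # straight-line stand-in for path length
    return w_dist * distance - w_gain * f.info_gain - w_sem * f.semantic_score


def select_frontier(frontiers: list[Frontier], drone_pos, **weights) -> Frontier:
    # Geometric terms stay in every cost, so semantics re-rank candidates
    # rather than replacing the coverage objective outright.
    return min(frontiers, key=lambda f: unified_cost(f, drone_pos, **weights))


origin = (0.0, 0.0, 0.0)
frontiers = [
    Frontier((3.0, 0.0, 0.0), info_gain=60, semantic_score=0.1),
    Frontier((8.0, 0.0, 0.0), info_gain=200, semantic_score=0.1),
    Frontier((4.0, 0.0, 0.0), info_gain=20, semantic_score=0.9, is_object=True),
]
with_semantics = select_frontier(frontiers, origin)            # picks the object frontier
geometry_only = select_frontier(frontiers, origin, w_sem=0.0)  # picks the high-gain frontier
```

Setting `w_sem` to zero recovers a purely geometric explorer. Making it very large roughly mimics a semantic-only policy, which is the kind of over-prioritization the article says can leave areas unmapped. Tuning that single weight is one simple way to express the trade-off.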
In Matterport3D-based simulations, SAGE outperformed both FALCON and a semantic-only ablation in object discovery across map-query pairs. Compared to FTU (Finding Things in the Unknown), SAGE completed exploration 9.0 to 25.9 times faster across nine shared pairs, with a mean speedup of 13.7× and substantially higher volumetric throughput. The system was also validated in five real-world flights using a Modal AI Starling 2 quadrotor with onboard sensing and planning, offloading CLIP inference to a ground station. While FALCON alone achieved faster exploration and shorter trajectories, SAGE excelled at actually finding the requested objects, demonstrating that semantic guidance can be leveraged without catastrophic coverage loss.
- SAGE uses CLIP for open-vocabulary object detection, allowing natural language commands like 'find a fire extinguisher'
- Achieved 13.7× mean speedup over FTU in object discovery tasks while maintaining 99%+ volumetric coverage
- Deployed on a Modal AI Starling 2 quadrotor with real-time onboard planning and offboard CLIP inference
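The natural-language querying in the first bullet can be illustrated with a small object-centric embedding store. This is a sketch under stated assumptions, not SAGE's implementation. `ObjectStore`, `add_view`, and `object_frontiers` are hypothetical names. Embeddings are assumed to be precomputed by a CLIP image encoder and text encoder, which this sketch does not call. Fusing repeated views by a weighted average is also an assumption. Finally, the 0.8 threshold only suits the toy vectors below: raw CLIP image-text similarities occupy a much narrower range, so a real threshold would need calibration.

```python
import math
from dataclasses import dataclass, field


def normalize(v):
    """Scale a nonzero vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine(a, b):
    # Inputs are unit-length, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


@dataclass
class ObjectStore:
    """Hypothetical object-centric store: one fused image embedding per tracked object."""
    embeddings: dict = field(default_factory=dict)  # object id -> unit embedding
    positions: dict = field(default_factory=dict)   # object id -> (x, y, z) in the map
    counts: dict = field(default_factory=dict)      # object id -> number of fused views

    def add_view(self, obj_id, position, image_embedding):
        e = normalize(image_embedding)
        k = self.counts.get(obj_id, 0)
        if k:
            # Blend the new view in with weight 1/(k+1), then renormalize.
            old = self.embeddings[obj_id]
            e = normalize([(k * a + b) / (k + 1) for a, b in zip(old, e)])
        self.embeddings[obj_id] = e
        self.counts[obj_id] = k + 1
        self.positions[obj_id] = position

    def object_frontiers(self, text_embedding, threshold):
        """Return (position, similarity) pairs at or above threshold, best match first."""
        q = normalize(text_embedding)
        hits = [(self.positions[i], cosine(q, e)) for i, e in self.embeddings.items()]
        return sorted((h for h in hits if h[1] >= threshold), key=lambda h: h[1], reverse=True)


# Toy 3-D vectors stand in for CLIP embeddings of image crops and of the text query.
store = ObjectStore()
store.add_view("obj_1", (1.0, 2.0, 0.5), [0.9, 0.1, 0.0])
store.add_view("obj_2", (4.0, 0.0, 0.5), [0.0, 1.0, 0.0])
query = [1.0, 0.0, 0.0]  # stand-in for the encoded text "a fire extinguisher"
hits = store.object_frontiers(query, threshold=0.8)
```

Here only `obj_1` clears the threshold, so its position would become a candidate object frontier. Storing one embedding per object rather than per frame keeps the number of similarity checks proportional to the number of tracked objects.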
Why It Matters
SAGE lets drones find specific objects in unknown buildings from natural-language commands, which could be useful for search-and-rescue and inventory tasks.