Bridging Perception and Action: A Lightweight Multimodal Meta-Planner Framework for Robust Earth Observation Agents
A lightweight meta-planner improves tool-calling accuracy and task success rates for EO agents.
Autonomous Earth Observation (EO) agents are increasingly tasked with complex, multi-step operations, from interpreting satellite imagery to executing sequences of tool calls for environmental monitoring. However, current architectures that use a single model for both planning and execution often struggle with combinatorial complexity and reasoning errors in dynamic scenarios. To address this, researchers from multiple institutions introduce the Lightweight Multimodal Meta-Planner (LMMP) framework. LMMP separates high-level planning from low-level execution via a dual-awareness mechanism that grounds strategic decisions in both multimodal image features and high-level task semantics. Crucially, it includes a Meta Task Library that injects remote sensing expert knowledge directly into the workflow, standardizing domain logic and keeping plans physically feasible.
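To make the planner/executor split concrete, here is a minimal sketch under stated assumptions. It is not the paper's implementation: in LMMP the Meta-Planner is a trained multimodal model, whereas this toy version substitutes simple keyword rules. The names `META_TASK_LIBRARY`, `make_plan`, `execute`, the task names, the tool names, and the image tags are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical Meta Task Library: each meta task maps to an ordered,
# expert-defined tool sequence. Entries here are illustrative only.
META_TASK_LIBRARY = {
    "flood_mapping": ["load_sar_image", "water_segmentation", "area_statistics"],
    "change_detection": ["load_image_pair", "coregister", "diff_and_threshold"],
}


@dataclass
class Plan:
    meta_task: str
    steps: list


def make_plan(task_text: str, image_tags: set) -> Plan:
    """Toy stand-in for the Meta-Planner.

    It looks at two signals, mirroring the 'dual-awareness' idea:
    the task description and tags assumed to come from the image.
    """
    text = task_text.lower()
    if "flood" in text and "sar" in image_tags:
        name = "flood_mapping"
    elif "change" in text and "bitemporal" in image_tags:
        name = "change_detection"
    else:
        raise ValueError("no meta task matches the task and image context")
    # Plans are drawn only from the library, so every step is a known tool.
    return Plan(name, list(META_TASK_LIBRARY[name]))


def execute(plan: Plan, tools: dict) -> list:
    """Executor: runs each step in order and knows nothing about planning."""
    return [tools[step]() for step in plan.steps]


# Usage with dummy tools that only report that they ran.
all_steps = {s for steps in META_TASK_LIBRARY.values() for s in steps}
dummy_tools = {name: (lambda n=name: f"ran {n}") for name in all_steps}
flood_plan = make_plan("Map the flood extent in this scene", {"sar"})
results = execute(flood_plan, dummy_tools)
```

Because the executor only sees a list of step names, it could be swapped for a different backbone without touching the planner, which is the property the article describes as plug-and-play.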
LMMP's two-stage training pipeline first initializes the Meta-Planner through expert-distilled Supervised Fine-Tuning (SFT), then refines it via Direct Preference Optimization (DPO) based on execution feedback. Extensive experiments on a dataset derived from EarthBench and ThinkGeo show that LMMP significantly improves tool-calling accuracy and task success rates compared to baseline architectures. The framework also demonstrates strong plug-and-play versatility, consistently enhancing performance across diverse executor backbones and previously unseen EO missions. This work marks a practical step toward reliable, autonomous AI agents for EO tasks.
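For readers unfamiliar with DPO, the sketch below computes the standard DPO loss for a single preference pair. The summary does not say how LMMP builds its pairs, so treating a plan that executed successfully as "chosen" and one that failed as "rejected" is an assumption here, as are the function name `dpo_loss` and the default `beta=0.1`. Inputs are sequence log-probabilities of each plan under the model being trained (the policy) and under the frozen SFT model (the reference).

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    loss = -log(sigmoid(beta * (chosen_log_ratio - rejected_log_ratio)))
    where each log ratio is log pi_policy(y) - log pi_ref(y).
    """
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_log_ratio - rejected_log_ratio)
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Before any DPO update the policy equals the reference, so the margin is 0.
loss_at_start = dpo_loss(-1.0, -2.0, -1.0, -2.0)

# If the policy raises the chosen plan's probability and lowers the
# rejected plan's, the loss falls below its starting value.
loss_after_shift = dpo_loss(-0.5, -3.0, -1.0, -2.0)
```

When the policy and reference agree, the loss is log 2 regardless of the log-probabilities themselves; training pushes it lower by widening the gap between preferred and dispreferred plans relative to the reference.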
- Dual-awareness mechanism grounds plans in both multimodal image features and high-level task semantics.
- Meta Task Library injects remote sensing expert knowledge to ensure physically feasible plans.
- Two-stage training (expert-distilled SFT + execution-feedback DPO) achieves significant gains in tool-calling accuracy and success rates.
Why It Matters
Reliable, adaptable AI agents for Earth Observation can automate disaster monitoring, resource management, and environmental analysis at scale.