CosFly's drone simulation pipeline generates 100K tracking images for aerial AI research
A new box-structured planning system turns 3D worlds into multi-modal training data for UAV tracking.
CosFly, developed by Hanxuan Chen and ten co-authors, is a box-structured planning and multi-modal simulation pipeline built for aerial target tracking. It runs on the CARLA simulator and follows a modular 7-step construction workflow: it exports a 3D map, simplifies the grid, plans pedestrian and drone trajectories, renders multi-modal sensor data (RGB, high-precision depth maps, and semantic segmentation masks) annotated with the full 6-DOF drone pose (x, y, z, yaw, pitch, roll), runs quality inspection, and finally generates teacher-student captions paired with natural-language navigation instructions. Each trajectory can also be assigned a fixed-FOV zoom level, which simulates different focal lengths by adjusting the camera intrinsics.
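To make the zoom mechanism concrete: under the standard pinhole camera model, a fixed horizontal FOV and an image width determine the focal length in pixels, f = (W / 2) / tan(FOV / 2). The sketch below is illustrative only. The function names, the square-pixel and centered-principal-point assumptions, and the definition of "zoom" as a multiplier on a base focal length are assumptions made for this example, not CosFly's actual configuration or code.

```python
import math

def focal_length_px(image_width: int, fov_deg: float) -> float:
    """Pinhole model: focal length in pixels for a given horizontal FOV."""
    return (image_width / 2.0) / math.tan(math.radians(fov_deg) / 2.0)

def intrinsics(image_width: int, image_height: int, fov_deg: float):
    """3x3 intrinsic matrix K (assumes square pixels, centered principal point)."""
    f = focal_length_px(image_width, fov_deg)
    cx, cy = image_width / 2.0, image_height / 2.0
    return [[f, 0.0, cx],
            [0.0, f, cy],
            [0.0, 0.0, 1.0]]

def fov_for_zoom(base_fov_deg: float, zoom: float) -> float:
    """FOV whose focal length is `zoom` times that of the base FOV.

    Hypothetical zoom definition, chosen for illustration.
    """
    half = math.radians(base_fov_deg) / 2.0
    return math.degrees(2.0 * math.atan(math.tan(half) / zoom))

# Example: a 2x zoom relative to a 90-degree base FOV on a 1920x1080 image.
zoomed_fov = fov_for_zoom(90.0, 2.0)
K = intrinsics(1920, 1080, zoomed_fov)
```

Narrowing the FOV while keeping the image size fixed raises the focal length in K, which is how a simulated camera can mimic a longer lens without changing resolution.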
The paper analyzes two trajectory-planning paradigms for aerial tracking: a conventional two-stage pipeline that first generates candidate trajectories and then refines them, and a direct gradient-based formulation that optimizes multiple tracking constraints in a single objective. Accompanying the pipeline is the publicly released CosFly-Track dataset, which contains 250 validated trajectories and approximately 100,000 rendered images with full pose metadata, drawn from diverse environments including urban centers, highways, rural landscapes, forests, and coastal towns. Together, the pipeline and dataset give aerial-ground collaborative research a scalable foundation for training and evaluating UAV navigation, multi-modal perception, and dynamic target tracking systems.
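As a rough illustration of what "multiple tracking constraints in a single objective" can look like, the sketch below minimizes a combined cost with plain gradient descent over 2D drone waypoints. It is not the paper's formulation: the two terms (keeping a standoff distance from the target and keeping the path smooth), the weights, the step size, and all function names are assumptions chosen for this example.

```python
import math

def track_cost(drone, target, standoff=10.0, w_smooth=1.0):
    """Single objective: standoff-distance error plus path smoothness."""
    cost = 0.0
    for (dx, dy), (tx, ty) in zip(drone, target):
        cost += (math.hypot(dx - tx, dy - ty) - standoff) ** 2
    for (ax, ay), (bx, by) in zip(drone, drone[1:]):
        cost += w_smooth * ((bx - ax) ** 2 + (by - ay) ** 2)
    return cost

def track_grad(drone, target, standoff=10.0, w_smooth=1.0):
    """Analytic gradient of track_cost with respect to every waypoint."""
    grad = [[0.0, 0.0] for _ in drone]
    for i, ((dx, dy), (tx, ty)) in enumerate(zip(drone, target)):
        ex, ey = dx - tx, dy - ty
        dist = math.hypot(ex, ey) or 1e-9  # guard against division by zero
        scale = 2.0 * (dist - standoff) / dist
        grad[i][0] += scale * ex
        grad[i][1] += scale * ey
    for i in range(len(drone) - 1):
        sx = drone[i + 1][0] - drone[i][0]
        sy = drone[i + 1][1] - drone[i][1]
        grad[i][0] -= 2.0 * w_smooth * sx
        grad[i][1] -= 2.0 * w_smooth * sy
        grad[i + 1][0] += 2.0 * w_smooth * sx
        grad[i + 1][1] += 2.0 * w_smooth * sy
    return grad

def optimize(drone, target, steps=500, lr=0.05, **weights):
    """Fixed-step gradient descent over all waypoints at once."""
    drone = [list(p) for p in drone]
    for _ in range(steps):
        for p, g in zip(drone, track_grad(drone, target, **weights)):
            p[0] -= lr * g[0]
            p[1] -= lr * g[1]
    return drone

# Example: a target walking along the x-axis, drone initialized too close to it.
target = [(float(t), 0.0) for t in range(10)]
initial = [(float(t), 3.0) for t in range(10)]
planned = optimize(initial, target)
```

Adding a further constraint in this style means adding another term to the cost and its gradient. A two-stage pipeline would instead generate candidate paths first and refine them in a separate pass.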
- CosFly provides a 7-step pipeline that runs from 3D map export through grid simplification, trajectory planning, multi-modal rendering with 6-DOF annotations, and quality inspection to caption generation.
- CosFly-Track dataset includes 250 validated trajectories and ~100,000 images across six diverse environments, with configurable fixed-FOV zoom levels.
- Two planning paradigms compared: conventional two-stage candidate generation + refinement vs. direct gradient-based optimization for aerial tracking constraints.
Why It Matters
A standardized simulation pipeline and a high-fidelity public dataset can accelerate UAV tracking research by giving researchers a shared, reproducible basis for training and evaluating trackers before they are deployed in complex real-world environments.