Robotics

Reinforcement learning and coalition games cut drone logistics costs by 39.76%

In simulation, transformer-based RL guides 32 drones into overlapping coalitions, cutting logistics costs by nearly 40% against a heuristic baseline.

Deep Dive

In dynamic urban logistics, time-sensitive tasks arrive stochastically, which makes it hard for heterogeneous autonomous aerial vehicles (AAVs) to keep task allocation close to optimal. A team of researchers from multiple Chinese institutions (including Yuze Zhou, Jingliang Sun, and others) presents a reinforcement-learning-enhanced overlapping coalition formation game (RL-OCF) to address this. Their method first establishes a dynamic task allocation model in which global optimality is quantified by a generalized logistics cost that couples service quality with resource consumption. To handle task sets that change over time as orders arrive, they design a transformer-based soft actor-critic network. The network uses multi-head self-attention to encode variable-length logistics states and capture spatiotemporal dependencies between tasks, and it adaptively guides coalition updates instead of relying on heuristic rules. The coalition formation process is proven to be an exact potential game, which guarantees convergence to a Nash-stable equilibrium in a finite number of iterations.
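
The finite-convergence claim for an exact potential game can be illustrated with a toy example. The sketch below is not the authors' algorithm: it assumes each agent's utility is its marginal effect on one shared cost (which makes that cost an exact potential by construction), and all names, the travel costs, the demand values, and the penalty weight are hypothetical. Each agent may serve up to two tasks, so coalitions can overlap. Because every accepted move strictly lowers the shared cost and the strategy space is finite, the loop must stop at a point where no agent can improve alone.

```python
import itertools

def global_cost(assign, demand, travel, penalty=10.0):
    """Shared cost: travel effort plus a penalty for each unit of unmet demand."""
    cost = sum(travel[a][t] for a, tasks in assign.items() for t in tasks)
    for t, need in demand.items():
        supplied = sum(1 for tasks in assign.values() if t in tasks)
        cost += penalty * max(0, need - supplied)
    return cost

def strategies(tasks, max_tasks):
    """Every task subset an agent may serve; size > 1 means overlapping membership."""
    for k in range(max_tasks + 1):
        for combo in itertools.combinations(tasks, k):
            yield frozenset(combo)

def best_deviation(agent, assign, demand, travel, max_tasks):
    """Return the agent's strategy that lowers the shared cost the most."""
    best, best_cost = assign[agent], global_cost(assign, demand, travel)
    for s in strategies(sorted(demand), max_tasks):
        trial = {**assign, agent: s}
        c = global_cost(trial, demand, travel)
        if c < best_cost - 1e-12:  # accept strict improvements only
            best, best_cost = s, c
    return best

def best_response_dynamics(agents, demand, travel, max_tasks=2):
    """Let agents update in turn until a full pass produces no change."""
    assign = {a: frozenset() for a in agents}
    rounds, changed = 0, True
    while changed:
        changed = False
        rounds += 1
        for a in agents:
            s = best_deviation(a, assign, demand, travel, max_tasks)
            if s != assign[a]:
                assign[a] = s
                changed = True
    return assign, rounds

def is_nash_stable(assign, demand, travel, max_tasks=2):
    """True if no single agent can lower the shared cost by deviating."""
    return all(best_deviation(a, assign, demand, travel, max_tasks) == assign[a]
               for a in assign)

# Hypothetical toy instance: 3 vehicles, 3 tasks, task t0 needs two vehicles.
agents = ["uav0", "uav1", "uav2"]
demand = {"t0": 2, "t1": 1, "t2": 1}
travel = {
    "uav0": {"t0": 1, "t1": 4, "t2": 6},
    "uav1": {"t0": 2, "t1": 1, "t2": 5},
    "uav2": {"t0": 3, "t1": 5, "t2": 1},
}
initial_cost = global_cost({a: frozenset() for a in agents}, demand, travel)
assign, rounds = best_response_dynamics(agents, demand, travel)
final_cost = global_cost(assign, demand, travel)
print(assign, final_cost, rounds)
```

A Nash-stable point reached this way depends on the update order and need not be the global optimum, which is one reason to guide the updates rather than apply them in a fixed sequence.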

Numerical simulations evaluate the algorithm under the generalized logistics cost criterion. In a scenario with 32 AAVs and 80 tasks, RL-OCF achieves a 39.76% cost reduction compared to a heuristic overlapping coalition formation baseline. Indoor flight experiments further support its practicality, showing that heterogeneous AAVs can form more efficient overlapping coalitions for dynamic logistics tasks. The work is a step toward real-time task allocation for drone fleets in urban environments, especially where tasks appear unpredictably and require collaborative execution. By replacing hand-designed heuristics with learned policies, the method is presented as a scalable way to manage logistics systems with many autonomous agents.

Key Points
  • 39.76% cost reduction in simulation with 32 AAVs and 80 tasks, compared to a heuristic OCF baseline.
  • Uses a transformer-based soft actor-critic network with multi-head self-attention to encode variable-length logistics states.
  • The coalition formation process is proven to be an exact potential game, guaranteeing convergence to a Nash-stable equilibrium in a finite number of iterations.
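
The second point, encoding a variable-length task set with multi-head self-attention, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's network: the feature size, head count, random weights, and the masked mean pooling are all hypothetical choices. The idea shown is that a padding mask hides absent tasks from attention, so the encoder returns a fixed-size vector no matter how many tasks are currently active.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, mask, wq, wk, wv, wo, num_heads):
    """x: (n, d) task features; mask: (n,) bool, True where a real task is present."""
    n, d = x.shape
    dh = d // num_heads
    # Project, then split the feature axis into heads: (heads, n, dh)
    q = (x @ wq).reshape(n, num_heads, dh).transpose(1, 0, 2)
    k = (x @ wk).reshape(n, num_heads, dh).transpose(1, 0, 2)
    v = (x @ wv).reshape(n, num_heads, dh).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)        # (heads, n, n)
    scores = np.where(mask[None, None, :], scores, -1e9)   # hide padded keys
    attn = softmax(scores, axis=-1)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, d)      # merge heads back
    return out @ wo

def encode_state(x, mask, params, num_heads=2):
    """Masked mean pooling over tasks gives a fixed-size state embedding."""
    h = multi_head_self_attention(x, mask, *params, num_heads=num_heads)
    m = mask[:, None].astype(h.dtype)
    return (h * m).sum(axis=0) / m.sum()

# Hypothetical setup: 8 features per task, random (untrained) projection weights.
rng = np.random.default_rng(0)
d = 8
params = [rng.normal(scale=0.3, size=(d, d)) for _ in range(4)]

tasks = rng.normal(size=(3, d))                   # three active tasks
padded = np.vstack([tasks, np.zeros((2, d))])     # same tasks padded to five slots
emb_exact = encode_state(tasks, np.ones(3, dtype=bool), params)
emb_padded = encode_state(padded, np.array([True, True, True, False, False]), params)
emb_more = encode_state(rng.normal(size=(5, d)), np.ones(5, dtype=bool), params)
print(emb_exact.shape, emb_more.shape)
```

Padding the three-task state to five slots leaves the embedding unchanged up to floating-point error, which is what lets a downstream policy with a fixed input size consume task sets that grow and shrink as orders arrive.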

Why It Matters

The approach lets autonomous drone fleets re-optimize task allocation as orders arrive in urban logistics. In the reported simulation, it lowered the generalized logistics cost by 39.76% against a heuristic baseline.