Optimal Multi-Debris Mission Planning in LEO: A Deep Reinforcement Learning Approach with Co-Elliptic Transfers and Refueling
A deep reinforcement learning planner based on Masked PPO plans missions that visit up to twice as many debris objects as a greedy baseline in simulation.
A new research paper applies deep reinforcement learning (RL) to the growing problem of space debris. Researchers Agni Bandyopadhyay and Gunther Waxenegger-Wilfing developed a unified planning framework for multi-target active debris removal (ADR) missions in Low Earth Orbit (LEO). Their system combines co-elliptic maneuver planning, which integrates Hohmann transfers and safety-ellipse proximity operations, with explicit refueling logic inside a realistic orbital simulator featuring randomized debris fields and keep-out zones.
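To make the Hohmann transfer building block concrete, here is a minimal sketch of the textbook two-impulse calculation between circular, coplanar orbits. It is not the paper's simulator: the function name `hohmann`, the constants, and the example altitudes are illustrative assumptions, and co-elliptic phasing, safety ellipses, and keep-out zones are left out.

```python
import math

# Assumed standard constants (not taken from the paper).
MU_EARTH = 398600.4418  # km^3/s^2, Earth's gravitational parameter
R_EARTH = 6378.137      # km, Earth's equatorial radius


def hohmann(r1_km, r2_km, mu=MU_EARTH):
    """Return (dv1, dv2, transfer_time_s) for a two-impulse Hohmann transfer
    between circular, coplanar orbits of radius r1_km and r2_km.

    Delta-v values are magnitudes in km/s, so the function works for both
    raising and lowering the orbit.
    """
    a_transfer = 0.5 * (r1_km + r2_km)  # semi-major axis of the transfer ellipse
    # First burn: leave the initial circular orbit onto the transfer ellipse.
    dv1 = abs(math.sqrt(mu / r1_km) * (math.sqrt(r2_km / a_transfer) - 1.0))
    # Second burn: circularize at the target radius.
    dv2 = abs(math.sqrt(mu / r2_km) * (1.0 - math.sqrt(r1_km / a_transfer)))
    # Time of flight is half the period of the transfer ellipse.
    transfer_time_s = math.pi * math.sqrt(a_transfer ** 3 / mu)
    return dv1, dv2, transfer_time_s


# Hypothetical example: move from a 700 km to an 800 km circular orbit.
dv1, dv2, tof = hohmann(R_EARTH + 700.0, R_EARTH + 800.0)
print(f"total dv = {(dv1 + dv2) * 1000:.1f} m/s, transfer time = {tof / 60:.1f} min")
```

For altitude changes of this size the total cost is on the order of tens of metres per second, which is why the order in which debris objects are visited, and when to refuel, dominates the fuel budget over a long multi-target tour.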
The team benchmarked three planning algorithms: a Greedy heuristic, Monte Carlo Tree Search (MCTS), and their deep RL approach using Masked Proximal Policy Optimization (PPO). Across 100 test scenarios, the Masked PPO agent showed the best mission efficiency and computational performance. It visited up to twice as many debris objects as the Greedy baseline and ran significantly faster than MCTS, which makes it a practical candidate for complex, time-critical orbital planning.
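The "masked" part of Masked PPO refers to action masking: infeasible actions are removed from the policy's output distribution before an action is sampled. The sketch below shows the general technique in plain Python, assuming a hypothetical mask rule (a debris target is valid if it has not been visited and the transfer fits in the remaining fuel). The names `valid_action_mask` and `masked_softmax` and the numbers are illustrative, not taken from the paper.

```python
import math


def valid_action_mask(visited, transfer_cost, fuel_left):
    """Hypothetical feasibility rule: a target is selectable only if it has
    not been visited yet and the transfer cost fits in the remaining fuel."""
    return [(not v) and c <= fuel_left for v, c in zip(visited, transfer_cost)]


def masked_softmax(logits, mask):
    """Turn policy logits into probabilities, forcing masked actions to zero."""
    if not any(mask):
        raise ValueError("at least one action must be valid")
    # Invalid actions get a logit of -inf, so exp() maps them to exactly 0.0.
    masked = [l if ok else -math.inf for l, ok in zip(logits, mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical state: three debris targets, the second already visited.
mask = valid_action_mask(
    visited=[False, True, False],
    transfer_cost=[0.05, 0.02, 0.08],  # km/s, illustrative values
    fuel_left=0.10,                    # km/s of delta-v remaining
)
probs = masked_softmax([1.0, 2.0, 3.0], mask)
print(mask, probs)
```

Because masked actions receive zero probability, the agent never samples an infeasible move and spends no training effort learning to avoid it.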
This work, presented at the IFAC Workshop on Control Aspects of Multi-Satellite Systems (CAMSAT) 2025, addresses a critical bottleneck in space sustainability. The 'space junk' problem in LEO threatens operational satellites and future missions. Traditional mission planning for debris removal is computationally intensive and often suboptimal. This RL-based approach provides a path toward scalable, safe, and resource-efficient autonomous mission planning, which could let future ADR spacecraft clear congested orbits more efficiently.
- The Masked PPO agent visited up to twice as many debris objects as the Greedy heuristic baseline in simulations.
- The system outperformed Monte Carlo Tree Search (MCTS) in runtime, enabling faster mission planning.
- Tested in 100 realistic LEO scenarios with randomized debris, keep-out zones, and delta-V constraints.
Why It Matters
Points toward autonomous, efficient cleanup of dangerous space debris, which would protect satellites and future space missions.