AoI-MDP optimizes underwater robot decisions with information freshness

New RL framework outperforms standard MDP by modeling observation delay as signal delay

Deep Dive

Ocean exploration places high demands on autonomous underwater vehicles (AUVs), especially when observation delays degrade real-time decision-making. Researchers introduce AoI-MDP (Age of Information optimized Markov Decision Process), a reinforcement learning framework that models these delays as signal delay and includes that delay as a component of the state space. It also adds wait time to the action space and integrates Age of Information (AoI), the time elapsed since the freshest available observation was generated, into the reward function so that policy optimization accounts for information freshness. This lets an AUV trade off acting immediately against waiting for more current data.
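To make the three changes concrete, here is a minimal Python sketch of a state that carries a delay estimate, an action that carries a wait time, and a reward that penalizes AoI. It is an illustration under stated assumptions, not the authors' implementation: the class and function names, the linear penalty, and the default weight of 0.1 are all hypothetical, and the paper may define these quantities differently.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DelayAwareState:
    """Hypothetical state: the usual observation plus its estimated delay."""
    observation: Tuple[float, ...]  # most recent sensor reading received
    signal_delay: float             # estimated delay of that reading, in seconds


@dataclass(frozen=True)
class WaitAwareAction:
    """Hypothetical action: a control command plus how long to wait."""
    control: Tuple[float, ...]  # e.g. thruster commands
    wait_time: float            # seconds to wait before requesting new data


def age_of_information(now: float, generated_at: float) -> float:
    """Standard AoI: time elapsed since the freshest received reading was generated."""
    if generated_at > now:
        raise ValueError("a reading cannot be generated in the future")
    return now - generated_at


def freshness_aware_reward(task_reward: float, aoi: float, weight: float = 0.1) -> float:
    """Assumed linear penalty: subtract weight * AoI from the task reward."""
    return task_reward - weight * aoi


if __name__ == "__main__":
    state = DelayAwareState(observation=(12.0, -3.5), signal_delay=0.4)
    action = WaitAwareAction(control=(0.2, 0.0), wait_time=1.0)
    aoi = age_of_information(now=10.0, generated_at=7.5)
    print(state, action, freshness_aware_reward(task_reward=1.0, aoi=aoi))
```

With a penalty of this kind, a policy that acts on old readings earns less reward than one acting on fresh readings for the same task outcome, which is the pressure toward freshness that the summary describes.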

In simulations, AoI-MDP consistently outperforms standard MDP baselines across various underwater tasks, and the authors report better feasibility and generalization. They have released the code as open source to support related research. The work has practical implications for subsea inspection, environmental monitoring, and autonomous navigation, where stale sensor readings can lead to mission failure or safety risks.

Key Points
  • Models observation delay as signal delay directly in the state space of a Markov decision process
  • Introduces wait time in the action space and integrates Age of Information (AoI) with reward functions
  • Outperforms standard MDP in simulations; the authors have released the code as open source

Why It Matters

Minimizing stale sensor data improves the reliability of autonomous underwater vehicles in critical missions such as inspection and monitoring.