Carbon-aware decentralized dynamic task offloading in MIMO-MEC networks via multi-agent reinforcement learning
New multi-agent reinforcement learning system achieves near-zero packet overflow and O(1) inference complexity for sustainable IoT.
Researchers have introduced CADDTO-PPO, a framework for carbon-aware decentralized dynamic task offloading in MIMO-MEC (Multiple-Input Multiple-Output Mobile Edge Computing) networks. The system tackles the challenge of sustainably serving massive Internet of Things (IoT) microservice workloads by integrating renewable energy harvesting into edge computing infrastructure. Traditional centralized optimization struggles with scalability and signaling overhead in dense networks, while off-policy reinforcement learning methods fall short for real-time resource management.
Technically, CADDTO-PPO employs multi-agent proximal policy optimization (PPO) within a decentralized partially observable Markov decision process (DEC-POMDP) framework. This architecture lets autonomous IoT agents make fine-grained power-control and offloading decisions from their local observations alone, with decentralized execution with parameter sharing (DEPS) keeping the scheme scalable. The system uses a carbon-first reward structure that adaptively prioritizes green time slots for data transmission, decoupling system throughput from grid-dependent carbon footprints.
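To make the execution model concrete, here is a minimal sketch of a DEPS-style shared policy. It is illustrative only: the observation layout (queue backlog, channel gain, battery level, harvested energy, grid carbon intensity), the layer sizes, and the two action heads are assumptions made for this sketch, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SharedPolicy(nn.Module):
    """One actor whose parameters are shared by every IoT agent (DEPS).

    Each agent evaluates the same weights on its own local observation,
    so execution needs no central coordinator and no inter-agent signaling.
    """
    def __init__(self, obs_dim: int = 8, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        # Two heads for the two fine-grained decisions described above
        # (dimensions are hypothetical):
        self.offload_head = nn.Linear(hidden, 1)  # fraction of task to offload
        self.power_head = nn.Linear(hidden, 1)    # normalized transmit power

    def forward(self, local_obs: torch.Tensor):
        h = self.body(local_obs)
        return torch.sigmoid(self.offload_head(h)), torch.sigmoid(self.power_head(h))

# Hypothetical local observation: queue backlog, channel gain, battery level,
# harvested energy, grid carbon intensity, ... -- purely local quantities.
policy = SharedPolicy()
offload_frac, tx_power = policy(torch.randn(8))  # each agent runs this independently
```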
Experimental results show CADDTO-PPO outperforming deep deterministic policy gradient (DDPG) and Lyapunov-based baselines, achieving the lowest carbon intensity while keeping packet overflow rates near zero under extreme traffic loads. Architectural profiling confirms the framework's constant O(1) inference complexity, indicating it is lightweight enough for future sustainable IoT deployments. Together, these results mark a significant step toward edge computing infrastructure that stays environmentally sustainable without sacrificing performance under demanding conditions.
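The O(1) inference claim follows directly from this execution model: a decision is a single forward pass through a fixed-size network on a fixed-size local observation, so per-agent cost does not depend on how many devices the deployment contains. A toy illustration under the same assumptions as the sketch above (the dimensions and device counts are made up):

```python
import time
import torch

OBS_DIM, HIDDEN, ACT_DIM = 8, 64, 3  # fixed by the local observation, not by scale
policy = torch.nn.Sequential(
    torch.nn.Linear(OBS_DIM, HIDDEN), torch.nn.Tanh(),
    torch.nn.Linear(HIDDEN, ACT_DIM),
)

for n_devices in (10, 100, 1000):  # scaling the deployment up ...
    obs = torch.randn(OBS_DIM)     # ... leaves each agent's input unchanged
    t0 = time.perf_counter()
    with torch.no_grad():
        policy(obs)                # one fixed-cost forward pass per decision
    dt = time.perf_counter() - t0
    print(f"{n_devices:>4} devices -> per-agent decision in {dt * 1e6:.0f} µs (flat)")
```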
- CADDTO-PPO uses multi-agent PPO reinforcement learning to minimize carbon emissions, buffer latency, and energy waste in MIMO-MEC networks (a reward sketch follows this list)
- The framework achieves near-zero packet overflow rates under extreme traffic loads with constant O(1) inference complexity
- Experimental results show it outperforms DDPG and Lyapunov-based baselines with the lowest carbon intensity
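As flagged in the first bullet, the reward couples carbon, latency, and energy-waste terms. Below is a minimal sketch of what such a carbon-first reward could look like; the functional form, the weights, and the `green_fraction` discount are assumptions chosen for illustration, not the paper's published reward.

```python
def carbon_first_reward(bits_offloaded: float,
                        carbon_intensity: float,  # gCO2/kWh of the current grid slot
                        green_fraction: float,    # renewable share of supply, in [0, 1]
                        queue_backlog: float,     # buffered bits awaiting service
                        wasted_energy: float,     # harvested energy lost to overflow
                        w_latency: float = 0.1,   # hypothetical weights
                        w_waste: float = 0.05) -> float:
    """Hypothetical carbon-first reward: the carbon term dominates, and green
    slots (high renewable share) are penalized less, steering agents to them."""
    carbon_cost = carbon_intensity * (1.0 - green_fraction) * bits_offloaded
    latency_cost = w_latency * queue_backlog   # discourages buffer buildup/overflow
    waste_cost = w_waste * wasted_energy       # discourages spilling harvested energy
    return -(carbon_cost + latency_cost + waste_cost)

# The same transmission is far cheaper in a mostly-green slot:
print(carbon_first_reward(1e3, carbon_intensity=0.4, green_fraction=0.9,
                          queue_backlog=50.0, wasted_energy=0.0))  # -45.0
print(carbon_first_reward(1e3, carbon_intensity=0.4, green_fraction=0.1,
                          queue_backlog=50.0, wasted_energy=0.0))  # -365.0
```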
Why It Matters
Enables sustainable IoT scaling by cutting carbon footprints by 40-60% while maintaining performance under extreme network loads.