MARS-DA: A Hierarchical Reinforcement Learning Framework for Risk-Aware Multi-Agent Bidding in Power Grids
Hierarchical RL balances profit and risk in volatile electricity markets using PJM data.
In a new arXiv preprint, Jiayi Chen, Xuan Zhang, and Guiling Wang introduce MARS-DA, a hierarchical reinforcement learning (RL) framework for risk-aware multi-agent bidding in wholesale electricity markets. As renewable energy penetration increases, price volatility between Day-Ahead (DA) and Real-Time (RT) settlements has made traditional RL approaches brittle: they often overfit to specific market conditions or ignore the stochastic spread between the two prices. MARS-DA addresses this with a two-tier structure. A top-level Meta-Controller dynamically decides which of two specialized base agents to follow: the 'Safe Agent', which focuses on reliable DA commitments, or the 'Speculator Agent', which targets volatile RT arbitrage opportunities. The authors also open-source a high-fidelity Gymnasium environment grounded in extensive empirical data from the PJM Interconnection, providing a standardized testbed for risk-sensitive agents.
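The two-tier control flow can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class names, the `rt_volatility` signal, the fixed threshold, and the DA shares are all hypothetical stand-ins for policies that MARS-DA learns with RL.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    da_mw: float  # capacity committed in the Day-Ahead market
    rt_mw: float  # capacity left for Real-Time settlement


class FixedSplitAgent:
    """Hypothetical base agent that commits a fixed share of capacity Day-Ahead."""

    def __init__(self, name: str, da_share: float):
        self.name = name
        self.da_share = da_share

    def act(self, capacity_mw: float) -> Bid:
        da = self.da_share * capacity_mw
        return Bid(da_mw=da, rt_mw=capacity_mw - da)


class MetaController:
    """Top tier: chooses which base agent to follow at each step.

    Assumption: a fixed volatility threshold stands in for the learned
    selection policy described in the preprint.
    """

    def __init__(self, safe: FixedSplitAgent, speculator: FixedSplitAgent,
                 volatility_threshold: float):
        self.safe = safe
        self.speculator = speculator
        self.volatility_threshold = volatility_threshold

    def select(self, rt_volatility: float) -> FixedSplitAgent:
        # Follow the speculator only when RT prices are volatile enough
        # to offer arbitrage; otherwise fall back to the safe agent.
        if rt_volatility > self.volatility_threshold:
            return self.speculator
        return self.safe

    def bid(self, capacity_mw: float, rt_volatility: float) -> Bid:
        return self.select(rt_volatility).act(capacity_mw)


# Illustrative values only.
safe = FixedSplitAgent("safe", da_share=0.75)
speculator = FixedSplitAgent("speculator", da_share=0.25)
controller = MetaController(safe, speculator, volatility_threshold=15.0)

calm_bid = controller.bid(capacity_mw=100.0, rt_volatility=5.0)
volatile_bid = controller.bid(capacity_mw=100.0, rt_volatility=40.0)
```

In this toy version the controller routes a calm hour to the safe agent and a volatile hour to the speculator. In the framework itself, that routing decision is what the Meta-Controller is trained to make.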
According to the authors, extensive experiments show that MARS-DA achieves superior risk-adjusted returns compared with state-of-the-art baselines and maintains robust regime alignment even during periods of extreme market volatility. By explicitly modeling the interplay between DA commitments and RT deviations, the framework balances profit maximization with risk management. The work is relevant to power producers and grid operators seeking to optimize bidding strategies under uncertainty, and the open-source environment lowers the barrier to further research on multi-agent systems and reinforcement learning in energy markets.
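To make the DA/RT interplay concrete, the sketch below applies standard two-settlement accounting: energy committed Day-Ahead is paid at the DA price, and any deviation between delivered and committed energy is settled at the RT price. The function name and the prices are illustrative assumptions; the reward actually used in the authors' environment may include terms not shown here.

```python
def two_settlement_revenue(da_price: float, rt_price: float,
                           da_commit_mwh: float, delivered_mwh: float) -> float:
    """Revenue in dollars for one hour under two-settlement accounting."""
    # The DA commitment is paid at the DA clearing price.
    da_revenue = da_price * da_commit_mwh
    # The deviation is settled at the RT price: a surplus is sold,
    # a shortfall must be bought back.
    deviation_mwh = delivered_mwh - da_commit_mwh
    rt_revenue = rt_price * deviation_mwh
    return da_revenue + rt_revenue


# Illustrative hour: DA clears at $30/MWh, RT spikes to $50/MWh.
over_delivery = two_settlement_revenue(30.0, 50.0, da_commit_mwh=80.0, delivered_mwh=100.0)
under_delivery = two_settlement_revenue(30.0, 50.0, da_commit_mwh=100.0, delivered_mwh=80.0)
```

With the same RT spike, holding energy back for RT pays off, while over-committing in DA and falling short is penalized at the higher RT price. That asymmetry is the risk a bidding agent has to weigh.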
- MARS-DA uses a Meta-Controller to orchestrate a Safe Agent (for DA allocation) and a Speculator Agent (for RT arbitrage) within a hierarchical RL framework.
- The framework is tested on a new open-source Gymnasium environment built from PJM Interconnection data, modeling two-settlement market dynamics.
- MARS-DA achieves superior risk-adjusted returns and robust regime alignment compared with baselines, especially under extreme market volatility.
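"Risk-adjusted return" can be made concrete with a Sharpe-style ratio: mean return divided by the standard deviation of returns. This summary does not state which metric the preprint reports, so the choice of ratio, the function name, and the sample profit series below are assumptions for illustration only.

```python
import statistics


def sharpe_like_ratio(returns: list[float]) -> float:
    """Mean return per unit of sample standard deviation (no risk-free rate)."""
    if len(returns) < 2:
        raise ValueError("need at least two returns to estimate volatility")
    spread = statistics.stdev(returns)
    if spread == 0:
        raise ValueError("returns have zero variance; ratio is undefined")
    return statistics.mean(returns) / spread


# Hypothetical daily profits in dollars for two bidding strategies.
steady_profits = [100.0, 110.0, 90.0, 105.0, 95.0]
volatile_profits = [300.0, -150.0, 400.0, -200.0, 250.0]

steady_score = sharpe_like_ratio(steady_profits)
volatile_score = sharpe_like_ratio(volatile_profits)
```

Here the volatile strategy earns more on average, but the steady one scores higher once its much smaller swings are taken into account. This is the kind of trade-off a risk-adjusted comparison is meant to expose.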
Why It Matters
MARS-DA could help power producers optimize bidding under renewable-driven volatility, balancing profit against risk in real-world energy markets.