Accelerating Reinforcement Learning for Wind Farm Control via Expert Demonstrations
Researchers cut years of RL training to 250k steps using wake model demos...
A team led by Marcus Binder Nilsen from DTU Wind and Energy Systems has developed a method to accelerate reinforcement learning (RL) for wind farm control. It addresses a key barrier to real-world deployment: slow training convergence, which on an operating farm could mean years of suboptimal power output. Their approach, detailed in a paper submitted to the Journal of Physics: Conference Series (Torque 2026), uses expert demonstrations from a steady-state wake model (PyWake) to pretrain both the actor and critic networks of a Soft Actor-Critic RL agent. By mimicking the decisions of a domain-knowledge-based optimizer, the agent starts from a competent policy rather than from scratch.
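To make the pretraining idea concrete, here is a minimal behavior cloning sketch in plain Python. It is a toy illustration, not the authors' implementation: the expert function `toy_expert`, the one-dimensional state, and the linear policy are all hypothetical stand-ins. The paper uses PyWake demonstrations and pretrains the neural actor and critic of a Soft Actor-Critic agent, none of which is reproduced here. The sketch only shows the core step of fitting a policy to (state, expert action) pairs by supervised regression.

```python
import random


def toy_expert(wind_offset):
    """Hypothetical expert: maps a normalized wind-direction offset to a
    normalized yaw command. Stands in for a wake-model optimizer."""
    return 0.5 * wind_offset


def collect_demonstrations(n, seed=0):
    """Sample states and record the expert's action for each one."""
    rng = random.Random(seed)
    states = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(s, toy_expert(s)) for s in states]


def behavior_clone(demos, lr=0.3, epochs=300):
    """Fit a linear policy a = w * s + b to the demonstrations using
    full-batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(demos)
    for _ in range(epochs):
        grad_w = sum(2.0 * (w * s + b - a) * s for s, a in demos) / n
        grad_b = sum(2.0 * (w * s + b - a) for s, a in demos) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    demos = collect_demonstrations(200)
    w, b = behavior_clone(demos)
    print(f"cloned policy: yaw = {w:.3f} * offset + {b:.3f}")
```

Because the cloned policy already imitates the expert before any environment interaction, online RL fine-tuning would start from the expert's level of performance rather than from random actions, which is the effect the article describes.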
In experiments on a 2x2 wind farm simulation (WindGym), the pretrained agent eliminated the costly initial learning phase entirely. While an untrained RL agent underperformed the simple greedy zero-yaw baseline by about 12%, the pretrained agent matched baseline performance from the first step. During online fine-tuning, all configurations converged to similar performance within 250,000 environment steps, and after 500,000 steps they produced approximately 7% more power than a lookup-table controller. This work suggests that injecting domain knowledge via behavior cloning can make RL practical for wind farm control, potentially saving years of real-world training time and millions in lost energy revenue.
- Pretraining with PyWake optimizer demonstrations eliminates the initial power loss of about 12%, relative to the greedy zero-yaw baseline, seen in untrained RL agents.
- During online fine-tuning, all Soft Actor-Critic configurations converge to similar performance within 250,000 environment steps.
- The final controller achieves a ~7% power gain over a lookup-table baseline after 500,000 steps.
- Method uses behavior cloning to transfer domain knowledge from steady-state wake models to dynamic RL agents.
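The percentages above can be read as relative differences in farm power against a reference controller. The helper below sketches that arithmetic under the assumption that the figures are relative changes in mean power, which this summary does not state explicitly. The function name and the megawatt values are hypothetical, chosen only so that the results match the quoted 12% and 7%; they are not data from the paper.

```python
def relative_power_change(agent_power, baseline_power):
    """Return the fractional change in power relative to a baseline.
    Negative values mean the agent produces less than the baseline."""
    if baseline_power <= 0:
        raise ValueError("baseline_power must be positive")
    return (agent_power - baseline_power) / baseline_power


if __name__ == "__main__":
    # Illustrative numbers only (MW); not measurements from the study.
    untrained = relative_power_change(8.8, 10.0)   # below the greedy baseline
    finetuned = relative_power_change(10.7, 10.0)  # above the lookup table
    print(f"untrained vs greedy baseline: {untrained:+.1%}")
    print(f"fine-tuned vs lookup table:   {finetuned:+.1%}")
```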
Why It Matters
This could spare wind farms years of RL training time, enabling faster deployment and avoiding millions in lost energy revenue.