High-Fidelity Digital Twin Dataset Generation for Inverter-Based Microgrids Under Multi-Scenario Disturbances
New dataset captures 500,001 samples per scenario at 2μs intervals to train AI for grid stability.
A team of researchers from the University of Tulsa and the University of Texas at Arlington has released a dataset designed to accelerate AI development for modern power grids. The work, led by Osasumwen Ogiesoba-Eguakun, addresses a persistent gap: public power-system datasets often lack the detailed electromagnetic transient (EMT) waveforms and inverter control dynamics needed to train accurate AI models. Their solution is a high-fidelity digital-twin dataset generated from a detailed MATLAB/Simulink model of a low-voltage AC microgrid with ten inverter-based distributed generators.
The dataset records 38 synchronized channels, including three-phase voltages, currents, and per-generator power metrics, sampled every 2 microseconds over 1-second scenarios. Because both the start and end instants are recorded, each run spans 500,000 sampling intervals, or 500,001 time samples per channel. The runs cover 11 operating and disturbance scenarios, from normal operation and load steps to three-phase faults, generator trips, and cyber-physical challenges such as communication delays and measurement noise. Each scenario is validated with system-level metrics to confirm that the disturbance is physically observable in the recorded signals and occurs at the intended time.
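The sampling figures can be checked with a little arithmetic: 1 s / 2 μs = 500,000 intervals, so 500,001 samples per channel, and 38 × 500,001 = 19,000,038 values per scenario (about 152 MB if stored as 64-bit floats, which is an assumption about the file format). The sketch below also shows one plausible way to check event timing with a windowed RMS, loosely in the spirit of the system-level validation described above. The function names, the 0.8 threshold, and the 60 Hz nominal frequency used in the usage note are illustrative assumptions, not the authors' validation procedure.

```python
import math
from typing import List, Optional

# Acquisition parameters as stated in the article.
DT = 2e-6          # sampling interval in seconds (2 μs)
DURATION = 1.0     # scenario length in seconds
N_CHANNELS = 38    # synchronized channels per scenario


def samples_per_scenario(duration: float = DURATION, dt: float = DT) -> int:
    """Sample count when both t = 0 and t = duration are recorded."""
    return round(duration / dt) + 1


def sample_time(k: int, dt: float = DT) -> float:
    """Timestamp (seconds) of sample index k."""
    return k * dt


def windowed_rms(x: List[float], window: int) -> List[float]:
    """RMS of consecutive, non-overlapping windows of `window` samples."""
    return [
        math.sqrt(sum(v * v for v in x[s:s + window]) / window)
        for s in range(0, len(x) - window + 1, window)
    ]


def first_drop(rms: List[float], ratio: float = 0.8) -> Optional[int]:
    """Index of the first window whose RMS falls below ratio * first window's RMS.

    Hypothetical sag detector: assumes the run starts in steady state.
    """
    reference = rms[0]
    for i, value in enumerate(rms):
        if value < ratio * reference:
            return i
    return None
```

With an assumed 60 Hz nominal frequency, one cycle spans round(1 / (60 * DT)) = 8333 samples; passing a voltage channel through `windowed_rms` with that window and calling `first_drop` locates a voltage sag to within one cycle of its true onset.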
The dataset is designed for AI and machine learning applications. It provides a consistent, labeled benchmark for tasks such as training surrogate models (fast AI models that approximate the simulated grid physics), developing algorithms for automatic disturbance classification, and testing the robustness of grid control systems under noise and delay. By releasing both the dataset and its processing scripts, the team aims to standardize research and development for inverter-dominated microgrids, which are increasingly central to integrating renewable energy. The work represents a step toward using AI to help ensure the stability and resilience of tomorrow's decentralized, clean-energy grids.
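As a hedged illustration of the disturbance-classification use case, the sketch below slices synchronized channels into labeled windows and fits a nearest-centroid classifier on per-channel RMS features. The channel layout (a list of per-channel sample lists), the window size, and labels such as "normal" and "fault" are assumptions for illustration, not the dataset's published schema or the authors' method; a real study would likely use richer features or a neural network.

```python
import math
from typing import Dict, List, Sequence, Tuple

Window = List[List[float]]  # one list of samples per channel (assumed layout)


def make_windows(channels: Sequence[Sequence[float]], label: str,
                 size: int, stride: int) -> List[Tuple[Window, str]]:
    """Cut synchronized channels into labeled windows of `size` samples."""
    n = len(channels[0])
    return [
        ([list(ch[s:s + size]) for ch in channels], label)
        for s in range(0, n - size + 1, stride)
    ]


def rms_features(window: Window) -> List[float]:
    """One RMS value per channel: a deliberately simple feature vector."""
    return [math.sqrt(sum(v * v for v in ch) / len(ch)) for ch in window]


def fit_centroids(examples: List[Tuple[Window, str]]) -> Dict[str, List[float]]:
    """Average feature vector per label (nearest-centroid training)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for window, label in examples:
        feats = rms_features(window)
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, f in enumerate(feats):
            acc[i] += f
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids: Dict[str, List[float]], window: Window) -> str:
    """Label whose centroid is closest (Euclidean) to the window's features."""
    feats = rms_features(window)
    return min(centroids, key=lambda lab: math.dist(feats, centroids[lab]))
```

Splitting training and test sets by scenario run, rather than by individual window, avoids leaking near-identical windows from one run into both sets.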
- Dataset includes 38 synchronized channels sampled at 2μs intervals, producing 500,001 samples per 1-second scenario.
- Covers 11 critical grid scenarios including faults, generator trips, load changes, and cyber-physical disturbances like noise and delay.
- Provides a standardized benchmark for training AI surrogate models and testing resilience in inverter-based microgrids.
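The measurement-noise and communication-delay disturbances listed above also suggest a simple way to stress-test a model trained on clean runs: perturb held-out channels and check whether its predictions change. The sketch below assumes additive white Gaussian noise and a pure transport delay rounded to whole 2 μs samples. The article does not state how the released scenarios model either effect, so these choices, and the helper names, are illustrative.

```python
import random
from typing import List, Optional, Sequence

DT = 2e-6  # sampling interval stated in the article (2 μs)


def add_measurement_noise(x: Sequence[float], sigma: float,
                          seed: Optional[int] = None) -> List[float]:
    """Add zero-mean white Gaussian noise with standard deviation `sigma`."""
    rng = random.Random(seed)  # seeded generator for reproducible tests
    return [v + rng.gauss(0.0, sigma) for v in x]


def apply_delay(x: Sequence[float], delay_s: float, dt: float = DT) -> List[float]:
    """Shift a signal later by delay_s seconds, rounded to whole samples.

    The first sample is held during the initial gap (an assumption; the
    article does not describe how delay is modeled in the dataset).
    """
    k = round(delay_s / dt)
    if k <= 0:
        return list(x)
    if k >= len(x):
        return [x[0]] * len(x)
    return [x[0]] * k + list(x[:-k])
```

For example, a 1 ms delay corresponds to round(1e-3 / 2e-6) = 500 samples at the dataset's sampling rate.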
Why It Matters
Enables AI development for stabilizing decentralized renewable energy grids, a cornerstone of the clean energy transition.