Japanese researchers release 1,020-hour EEG-EMG-audio speech dataset

arXiv q-bio.NC June 02, 2026

⚡ 1,020 hours of synchronized brain signals, facial muscle data, and audio from three speakers.

Deep Dive

A team of Japanese researchers led by Motoshige Sato has published the largest open dataset of its kind: 1,020 hours of synchronized electroencephalography (EEG), facial electromyography (EMG), and speech audio from three healthy native Japanese speakers. Recordings were made with three different EEG systems (one ultra-high-density system and two cap-type systems, including eegosports), spanning 62 to 128 channels, over many sessions across several months. Each session provides time-aligned signals, speech-event annotations, and full transcriptions. Technical validation confirmed the expected 1/f-like spectral profiles, task-related alpha attenuation, and time-locked evoked responses, supporting the data's suitability for downstream research.
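To make the 1/f check concrete, here is a minimal sketch of how a spectral slope can be estimated. It does not reproduce the authors' validation pipeline: it runs on synthetic noise rather than the dataset, the 250 Hz sampling rate and the 1 to 40 Hz fitting band are assumptions chosen for illustration, and the function names `synth_one_over_f` and `spectral_slope` are hypothetical.

```python
import numpy as np


def synth_one_over_f(n_samples, rng):
    """Generate synthetic noise whose power spectrum falls off roughly as 1/f."""
    white = rng.standard_normal(n_samples)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n_samples)
    scale = np.zeros_like(freqs)  # leaving scale[0] at zero drops the DC component
    scale[1:] = 1.0 / np.sqrt(freqs[1:])  # amplitude ~ f^-0.5, so power ~ 1/f
    return np.fft.irfft(spectrum * scale, n=n_samples)


def spectral_slope(signal, sfreq, fmin=1.0, fmax=40.0):
    """Fit a straight line to log10(power) versus log10(frequency) in [fmin, fmax]."""
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sfreq)
    power = np.abs(np.fft.rfft(signal)) ** 2  # raw periodogram, unnormalized
    band = (freqs >= fmin) & (freqs <= fmax)
    slope, _intercept = np.polyfit(np.log10(freqs[band]), np.log10(power[band]), 1)
    return slope


rng = np.random.default_rng(0)
sfreq = 250.0  # assumed sampling rate in Hz, for illustration only
x = synth_one_over_f(int(sfreq * 120), rng)  # 120 s of synthetic data
slope = spectral_slope(x, sfreq)
print(f"fitted spectral slope: {slope:.2f}")
```

A slope close to -1 on a log-log plot corresponds to 1/f power. On real EEG the fitted exponent varies by channel and state, so the value here says nothing about the dataset itself.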

The dataset is released in the Brain Imaging Data Structure (BIDS) format on OpenNeuro under a CC0 waiver, making it freely available for any use. Although primarily motivated by speech decoding, the resource also supports work on multimodal signal processing, artifact modeling, cross-device adaptation, and EEG representation learning. For AI researchers, the combination of neural and muscular signals with time-aligned audio offers a rare opportunity to train models that map brain activity directly to speech, a critical step toward non-invasive brain-computer interfaces. Open access also lowers barriers to reproducibility and could accelerate progress in neural speech prosthetics.
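BIDS encodes metadata in file names as underscore-separated key-value entities followed by a suffix and an extension. The sketch below splits such a name into its parts using only the standard library. The example path is hypothetical: the subject, session, and task labels and the `.edf` extension are invented for illustration and may not match the files actually published on OpenNeuro, and `parse_bids_name` is an illustrative helper, not part of any BIDS tool.

```python
from pathlib import PurePosixPath


def parse_bids_name(path):
    """Split a BIDS-style file name into (entities, suffix, extension)."""
    name = PurePosixPath(path).name
    stem, _, ext = name.partition(".")
    extension = "." + ext if ext else ""
    *pairs, suffix = stem.split("_")  # the last underscore-separated token is the suffix
    entities = {}
    for pair in pairs:
        key, sep, value = pair.partition("-")
        if not sep or not key or not value:
            raise ValueError(f"not a key-value entity: {pair!r}")
        entities[key] = value
    return entities, suffix, extension


# Hypothetical path: labels are invented, not taken from the dataset.
example = "sub-01/ses-01/eeg/sub-01_ses-01_task-speech_eeg.edf"
entities, suffix, extension = parse_bids_name(example)
print(entities, suffix, extension)
```

For real work, a dedicated BIDS reader is a better choice than hand parsing, since it also handles sidecar files and validation.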

Key Points

1,020 hours of synchronized EEG, EMG, and audio from three speakers
Recorded using three EEG systems (ultra-high-density and cap-type, 62–128 channels) across months
Released under CC0 on OpenNeuro in BIDS format for speech decoding and multimodal research

Why It Matters

Open access to 1,020 hours of aligned brain, muscle, and speech recordings supports training and evaluating speech decoding and multimodal AI models.
