Image & Video

A large-scale heterogeneous 3D magnetic resonance brain imaging dataset for self-supervised learning

260,927 heterogeneous 3D brain MRI scans from 910 public sources, minimally preprocessed and ready for self-supervised learning.

Deep Dive

Researchers led by Stefano Cerri (Technical University of Denmark) have released FOMO260K, a heterogeneous dataset of 260,927 3D brain magnetic resonance imaging (MRI) scans. The scans come from 77,589 sessions across 55,378 subjects and are drawn from 910 publicly available sources. The collection combines clinical- and research-grade images, multiple MRI sequences (e.g., T1, T2, FLAIR), and a wide range of anatomical and pathological variability, including scans with large brain anomalies. This diversity makes it a challenging and representative resource for training robust models.
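
For a rough sense of scale, the counts quoted above imply the following averages. This is a back-of-the-envelope sketch that uses only the figures in this summary. The variable names are illustrative, and the results are simple means that say nothing about how scans are actually distributed across subjects, sessions, or sources.

```python
# Counts as quoted in this summary of FOMO260K.
scans, sessions, subjects, sources = 260_927, 77_589, 55_378, 910

# Simple averages; the real distribution is likely far from uniform.
scans_per_session = scans / sessions
sessions_per_subject = sessions / subjects
scans_per_subject = scans / subjects
scans_per_source = scans / sources

print(f"{scans_per_session:.2f} scans per session")        # 3.36
print(f"{sessions_per_subject:.2f} sessions per subject")  # 1.40
print(f"{scans_per_subject:.2f} scans per subject")        # 4.71
print(f"{scans_per_source:.1f} scans per source")          # 286.7
```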

To lower the barrier for new users, the team applied only minimal preprocessing, which preserves original image characteristics such as resolution and intensity ranges. Alongside the dataset, they released companion code for self-supervised pretraining (e.g., contrastive learning, masked image modeling) and finetuning, plus a set of pretrained models. FOMO260K is intended to support the development and fair benchmarking of self-supervised learning methods in medical imaging at large scale, which could accelerate research in automated diagnosis, brain lesion detection, and segmentation.
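
To make the masked image modeling idea concrete, here is a minimal NumPy sketch. It is not the FOMO260K companion code: the function names (mask_random_patches, masked_mse), the patch size, the mask ratio, and the synthetic volume are all assumptions chosen for illustration. The sketch hides a random subset of non-overlapping cubic patches in a 3D volume and scores a reconstruction only on the hidden voxels, which is the basic training signal in this family of methods.

```python
import numpy as np

def mask_random_patches(volume, patch=16, mask_ratio=0.6, seed=0):
    """Zero out a random subset of non-overlapping cubic patches.

    Returns the masked volume and a boolean voxel mask (True = hidden).
    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    d, h, w = volume.shape
    # Size of the patch grid; leftover voxels at the edges stay visible.
    gd, gh, gw = d // patch, h // patch, w // patch
    n_patches = gd * gh * gw
    n_masked = int(round(mask_ratio * n_patches))

    # Choose which grid cells to hide, without replacement.
    grid = np.zeros(n_patches, dtype=bool)
    grid[rng.choice(n_patches, size=n_masked, replace=False)] = True
    grid = grid.reshape(gd, gh, gw)

    # Expand each grid cell into a patch x patch x patch block of voxels.
    block = grid.repeat(patch, axis=0).repeat(patch, axis=1).repeat(patch, axis=2)
    voxel_mask = np.zeros(volume.shape, dtype=bool)
    voxel_mask[: gd * patch, : gh * patch, : gw * patch] = block

    masked = volume.copy()
    masked[voxel_mask] = 0
    return masked, voxel_mask

def masked_mse(prediction, target, voxel_mask):
    """Mean squared reconstruction error over hidden voxels only."""
    diff = (prediction - target)[voxel_mask]
    return float(np.mean(diff ** 2))

# Synthetic stand-in for a single scan; real volumes vary in shape and intensity.
volume = np.random.default_rng(1).normal(size=(64, 64, 48)).astype(np.float32)
masked, voxel_mask = mask_random_patches(volume, patch=16, mask_ratio=0.5)

# A model would predict the hidden voxels from `masked`; here the
# masked input itself serves as a trivial baseline "prediction".
baseline_loss = masked_mse(masked, volume, voxel_mask)
print(f"hidden fraction: {voxel_mask.mean():.2f}, baseline loss: {baseline_loss:.3f}")
```

In a real pipeline the prediction would come from an encoder-decoder network, and because the dataset preserves original intensity ranges, some per-volume intensity normalization would probably be needed before a loss like this is comparable across scans.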

Key Points
  • Dataset size: 260,927 scans from 77,589 sessions and 55,378 subjects across 910 sources, presented as one of the largest public brain MRI collections for self-supervised learning.
  • Includes both clinical and research-grade images with multiple MRI sequences and high pathological variability, including major brain anomalies.
  • Comes with companion code for self-supervised pretraining and finetuning, plus pretrained models, lowering the entry barrier for researchers.

Why It Matters

FOMO260K gives the medical AI community a large shared dataset, with companion code and pretrained models, for training and comparing self-supervised models on brain imaging tasks.