Learning Representations from Incomplete EHR Data with Dual-Masked Autoencoding
New model processes incomplete EHR data without imputation, beating XGBoost and DuETT on clinical tasks.
Researchers from MIT and Harvard developed AID-MAE, a dual-masked autoencoder that learns directly from incomplete electronic health records. It uses two masks—one marking naturally missing values, and one that additionally hides observed values to serve as reconstruction targets—and its encoder processes only the unmasked tokens. The model outperformed XGBoost and DuETT baselines across multiple clinical prediction tasks on two datasets, producing embeddings that naturally stratify patient cohorts without an imputation step.
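The dual-masking idea can be illustrated with a minimal sketch. This is a hypothetical implementation, not the paper's released code: the function name, the NaN convention for natural missingness, and the masking ratio are all illustrative assumptions. It shows how the two masks partition a feature vector so that only visible tokens reach the encoder, while artificially hidden values become reconstruction targets.

```python
import numpy as np

def dual_mask(x, mask_ratio=0.5, rng=None):
    """Illustrative dual masking (hypothetical; not the authors' code).

    x          : 1-D array of feature values; NaN marks naturally missing entries.
    mask_ratio : fraction of *observed* values to hide as reconstruction targets.
    Returns boolean masks (natural_missing, artificial, visible).
    """
    rng = np.random.default_rng(rng)
    natural_missing = np.isnan(x)            # mask 1: values never recorded
    observed = ~natural_missing
    obs_idx = np.flatnonzero(observed)
    n_hide = int(round(mask_ratio * obs_idx.size))
    hidden = rng.choice(obs_idx, size=n_hide, replace=False)
    artificial = np.zeros_like(natural_missing)
    artificial[hidden] = True                # mask 2: observed values hidden for training
    visible = observed & ~artificial         # encoder sees only these tokens
    return natural_missing, artificial, visible
```

During pretraining, the reconstruction loss would be computed only on the artificially masked positions, since naturally missing values have no ground truth; at inference, the artificial mask is simply empty.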
Why It Matters
Enables more accurate clinical predictions from real-world, messy patient data where measurements are irregular and often missing.