MEDAL framework gives t-SNE and UMAP out-of-sample mapping for rigorous validation
New autoencoder distillation method lets you validate manifold embeddings like supervised models
Low-dimensional embeddings from methods such as t-SNE and UMAP are widely used to visualize high-dimensional data, but they are difficult to validate formally: the embedding is a set of coordinates for the fitted points, not an explicit function that maps new samples into the embedding space (out-of-sample mapping) or inverts the embedding back to the original features. A team led by Irene Chang at Rice University introduces MEDAL (Manifold Embedding Distillation via Autoencoder Learning) to close this gap. The approach trains a constrained autoencoder whose bottleneck layer is forced to exactly reproduce a fixed teacher embedding, such as a t-SNE or UMAP output. The decoder then reconstructs the original input from the bottleneck, so the encoder provides an explicit mapping for new data and the decoder an approximate inverse. The reconstruction error serves as a pointwise distortion measure, which makes the embedding amenable to held-out validation.
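One way to write down the objective described above, in notation introduced here purely for illustration: the symbols and the choice of squared Euclidean error are assumptions, not taken from the paper. Let x_i be the training inputs, z_i their teacher embedding coordinates, f_θ the encoder, and g_φ the decoder.

```latex
\begin{aligned}
&\min_{\theta,\phi}\ \sum_{i=1}^{n} \bigl\| x_i - g_\phi\bigl(f_\theta(x_i)\bigr) \bigr\|_2^2
\quad \text{subject to} \quad f_\theta(x_i) = z_i,\quad i = 1,\dots,n, \\
&d(x) = \bigl\| x - g_\phi\bigl(f_\theta(x)\bigr) \bigr\|_2^2 .
\end{aligned}
```

Here d(x) is the pointwise distortion. It can be computed for any sample, including held-out ones, because f_θ and g_φ are explicit functions rather than a fixed table of coordinates.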
In practice, MEDAL turns static embeddings into testable models. Users can split data into training and test sets, fit t-SNE or UMAP on the training set, then distill that embedding with MEDAL. The resulting encoder-decoder can be evaluated on held-out data, enabling quantitative comparisons between different methods and hyperparameter choices. The authors demonstrate MEDAL across multiple benchmarks and scientific case studies. Notably, it reveals biologically coherent regions that are difficult to preserve in 2D, and it detects distribution shift when new samples are mapped into a fixed reference manifold. MEDAL acts as a general validation wrapper for any existing dimension reduction technique, promising to improve reproducibility and rigor in exploratory data analysis.
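The split, embed, distill, and evaluate workflow can be sketched with scikit-learn. This is a simplified stand-in, not the authors' implementation: two separately fitted `MLPRegressor` models replace the jointly trained constrained autoencoder, and the dataset, network sizes, and helper names (`distill_embedding`, `pointwise_distortion`) are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler


def distill_embedding(X_train, Z_train, seed=0):
    """Fit a stand-in encoder (X -> Z) and decoder (Z -> X) to a fixed teacher embedding.

    Assumption: two separately fitted MLPs replace the jointly trained,
    constrained autoencoder described in the article.
    """
    encoder = MLPRegressor(hidden_layer_sizes=(128,), max_iter=500, random_state=seed)
    decoder = MLPRegressor(hidden_layer_sizes=(128,), max_iter=500, random_state=seed)
    encoder.fit(X_train, Z_train)  # learn the out-of-sample map
    decoder.fit(Z_train, X_train)  # learn the approximate inverse
    return encoder, decoder


def pointwise_distortion(encoder, decoder, X):
    """Per-sample mean squared reconstruction error after a round trip through 2D."""
    X_hat = decoder.predict(encoder.predict(X))
    return np.mean((X - X_hat) ** 2, axis=1)


# Small bundled dataset (64-dimensional digit images), subsampled to keep t-SNE fast.
X = load_digits().data[:600]
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# The teacher embedding is fitted on the training split only.
Z_train = TSNE(n_components=2, random_state=0).fit_transform(X_train)
Z_train = (Z_train - Z_train.mean(axis=0)) / Z_train.std(axis=0)  # rescale for MLP training

encoder, decoder = distill_embedding(X_train, Z_train)
train_err = pointwise_distortion(encoder, decoder, X_train)
test_err = pointwise_distortion(encoder, decoder, X_test)
print(f"mean train distortion: {train_err.mean():.3f}")
print(f"mean held-out distortion: {test_err.mean():.3f}")
```

Repeating the run with a different teacher (another perplexity, or UMAP in place of t-SNE) and comparing mean held-out distortion is the kind of quantitative comparison the article describes. In the same spirit, unusually high distortion on newly mapped samples could serve as a rough signal of distribution shift.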
- MEDAL provides out-of-sample mapping and approximate inversion for t-SNE, UMAP, and other manifold embeddings
- Enables held-out validation, allowing practitioners to quantitatively compare methods and tune hyperparameters
- Detects distribution shift when new samples are mapped into a fixed reference manifold, and reveals biologically coherent regions that are difficult to preserve in 2D
Why It Matters
MEDAL aims to make nonlinear dimension reduction more rigorous by turning embeddings into reusable, testable models that can be validated on held-out data.