Research & Papers

DiRe-RAPIDS: Topology-faithful dimensionality reduction at scale

New method preserves 3-4x more topological structure than UMAP.

Deep Dive

A new paper from Alexander Kolpakov and Igor Rivin introduces DiRe-RAPIDS, a dimensionality reduction method that prioritizes topological faithfulness over local neighborhood preservation. The authors show that popular tools like UMAP and t-SNE, which optimize local metrics, can inadvertently memorize sampling noise and distort global topology, creating phantom cycles or disconnected islands that are not present in the original data. DiRe-RAPIDS is tuned against a new benchmark built from noisy manifolds with known homology, and its Pareto-optimal configurations recover the exact first Betti number (the count of independent loops) on those stress tests.
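To make "phantom cycles" and "disconnected islands" concrete, here is a minimal sketch of how Betti numbers register them. It is an illustration only, not the paper's benchmark: it treats a plain graph as a one-dimensional complex, where b0 is the number of connected components and b1 = E - V + b0 is the number of independent loops. The function name `betti_numbers` and the toy 12-point circle are assumptions made for this example.

```python
def betti_numbers(num_vertices, edges):
    """Return (b0, b1) for an undirected graph viewed as a 1-dimensional complex.

    b0 counts connected components; b1 = E - V + b0 counts independent loops.
    """
    parent = list(range(num_vertices))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Drop self-loops and duplicate edges so E counts distinct edges only.
    unique_edges = {frozenset((u, v)) for u, v in edges if u != v}

    components = num_vertices
    for edge in unique_edges:
        u, v = tuple(edge)
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            components -= 1

    b0 = components
    b1 = len(unique_edges) - num_vertices + b0
    return b0, b1


# Toy "ground truth": 12 points on a circle, each joined to its neighbor.
n = 12
circle = [(i, (i + 1) % n) for i in range(n)]

# A spurious chord across the circle adds a loop that is not in the data.
phantom_cycle = circle + [(0, 6)]

# Dropping two edges tears the circle into two separate pieces.
islands = [e for e in circle if e not in [(2, 3), (8, 9)]]

print(betti_numbers(n, circle))         # (1, 1): one piece, one loop
print(betti_numbers(n, phantom_cycle))  # (1, 2): an extra, phantom loop
print(betti_numbers(n, islands))        # (2, 0): two islands, loop destroyed
```

An embedding that is faithful in this sense would leave both numbers unchanged. Real benchmarks of this kind typically compute homology on point clouds rather than on a hand-built graph, so treat this as intuition rather than methodology.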

In practical terms, the authors report that DiRe-RAPIDS preserves 3 to 4 times more topological structure than UMAP on 723K arXiv paper embeddings at comparable wall-clock time, and that it matches or beats GPU-accelerated UMAP on classification tasks. The method is available through GitHub repositories and a Hugging Face dataset, which makes it accessible to data scientists working with high-dimensional data in fields like bioinformatics, NLP, and social network analysis. If the results hold up, visual data exploration becomes more reliable: the structures seen in a low-dimensional plot are more likely to reflect real patterns than algorithmic artifacts.

Key Points
  • DiRe-RAPIDS preserves 3-4x more topological structure than UMAP on 723K arXiv embeddings.
  • Recovers exact first Betti numbers on noisy manifolds, avoiding false cycles or disconnected islands.
  • Matches or beats GPU-accelerated UMAP on classification tasks at comparable speed.

Why It Matters

DiRe-RAPIDS is designed so that low-dimensional visualizations reflect the data's true topology, which reduces the risk of misleading artifacts in high-stakes analysis.