Research & Papers

CoreFlow: Low-Rank Matrix Generative Models

CoreFlow trains generative models on matrix data compressed to 9% of the ambient dimension, even with up to 40% of training entries missing.

Deep Dive

CoreFlow, introduced by Dongze Wu, Linglingzhi Zhu, and Yao Xie, tackles the challenge of learning matrix-valued distributions from high-dimensional, often incomplete training data. Traditional ambient-space generative models are computationally expensive and statistically fragile when matrix dimensions are large but sample sizes are limited. CoreFlow is a geometry-preserving low-rank flow model that first learns shared row and column subspaces across the matrix distribution, then trains a continuous normalizing flow only on the induced low-dimensional core. This approach separates shared matrix geometry from sample-specific variation, preserving matrix structure and substantially improving training efficiency.
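The core-projection idea can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: it estimates shared row and column subspaces from the leading singular vectors of the stacked training matrices, then represents each matrix by its small core, which is the object a flow model would be trained on. The rank `r` and subspace-estimation procedure here are assumptions for the sketch.

```python
import numpy as np

# Illustrative sketch (not the paper's implementation): learn shared
# row/column subspaces U, V across a matrix dataset, then represent
# each matrix X_i by its low-dimensional core C_i = U^T X_i V.

rng = np.random.default_rng(0)
m, n, r = 60, 40, 5          # ambient matrix size and shared rank (assumed)

# Synthetic dataset: matrices sharing the same row/column subspaces.
U_true = np.linalg.qr(rng.standard_normal((m, r)))[0]
V_true = np.linalg.qr(rng.standard_normal((n, r)))[0]
X = [U_true @ rng.standard_normal((r, r)) @ V_true.T for _ in range(32)]

# Estimate the shared subspaces from leading singular vectors of the
# stacked data (one simple way to pool geometry across samples).
row_stack = np.hstack(X)                     # columns lie in the row subspace
col_stack = np.hstack([Xi.T for Xi in X])    # columns lie in the column subspace
U = np.linalg.svd(row_stack, full_matrices=False)[0][:, :r]
V = np.linalg.svd(col_stack, full_matrices=False)[0][:, :r]

# Project each matrix to its r x r core; a generative flow would be
# trained on these cores instead of the full m x n matrices.
cores = [U.T @ Xi @ V for Xi in X]

# Reconstruction from the core is (near-)exact when the shared
# low-rank assumption holds.
X_hat = U @ cores[0] @ V.T
print(np.allclose(X_hat, X[0], atol=1e-8))
```

Training on the r×r cores rather than the m×n matrices is what delivers the dimensionality reduction: here the core has 25 entries versus 2,400 in the ambient matrix.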

CoreFlow also handles incomplete training matrices through masked Riemannian updates and iterative completion. Across real and synthetic benchmarks, it substantially improves spectral and moment-level generation quality in few-sample regimes while remaining competitive in data-rich settings, even when compressed to 9% of the ambient dimension and with up to 40% missing training entries. This makes CoreFlow particularly valuable for high-dimensional, limited-sample applications like medical imaging, genomics, or any domain where collecting large, complete matrix datasets is infeasible.
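To make the missing-data setting concrete, here is a minimal stand-in for iterative completion: a hard-impute style loop that alternates between a rank-r SVD fit and re-imputing only the unobserved entries. This is a simplified substitute for CoreFlow's masked Riemannian updates, shown only to illustrate how a mask and a low-rank model interact during completion.

```python
import numpy as np

# Simplified sketch of iterative completion under a mask (a hard-impute
# style loop, NOT the paper's masked Riemannian updates): alternate a
# rank-r SVD fit with re-imputation of the missing entries only.

rng = np.random.default_rng(1)
m, n, r = 60, 40, 5
X_true = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

mask = rng.random((m, n)) > 0.4        # True where an entry is observed (~60%)
X_obs = np.where(mask, X_true, 0.0)    # missing entries initialized to zero

X_hat = X_obs.copy()
for _ in range(200):
    # Best rank-r approximation of the current completed matrix.
    U, s, Vt = np.linalg.svd(X_hat, full_matrices=False)
    low_rank = (U[:, :r] * s[:r]) @ Vt[:r]
    # Keep observed entries fixed; overwrite only the missing ones.
    X_hat = np.where(mask, X_obs, low_rank)

err = np.linalg.norm((X_hat - X_true)[~mask]) / np.linalg.norm(X_true[~mask])
print(f"relative error on missing entries: {err:.3f}")
```

The mask plays the same role as in CoreFlow's training loop: updates are driven only by observed entries, while the low-rank structure propagates information into the missing ones.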

Key Points
  • CoreFlow compresses matrix generation to just 9% of the ambient dimension, drastically reducing computational cost.
  • Handles up to 40% missing training entries using masked Riemannian updates and iterative completion.
  • Improves spectral and moment-level generation quality in few-sample regimes while staying competitive in data-rich settings.

Why It Matters

CoreFlow enables efficient generative modeling of high-dimensional matrices with limited data, unlocking applications in medical imaging and genomics.