Research & Papers

Fast Topology-Aware Lossy Data Compression with Full Preservation of Critical Points and Local Order

New algorithm compresses floating-point data while perfectly preserving all critical topological features.

Deep Dive

A research team from the University of Utah and other institutions has published a breakthrough in data compression for scientific computing. Their new algorithm, the Local-Order-Preserving Compressor (LOPC), solves a critical problem: how to aggressively compress the massive floating-point data generated by simulations and instruments without destroying its topological structure. Unlike standard lossy compressors that only provide point-wise error bounds, LOPC is the first to guarantee the preservation of the full local order of data points. This means it maintains all critical points—like peaks, pits, and passes in a terrain—which are essential for accurate scientific analysis.

The technical achievement is multi-faceted. The paper states LOPC runs 'orders of magnitude faster' than previous topology-preserving compressors, potentially offering 100x speedups, making it practical for real-time data streams. It also achieves higher compression ratios than lossless methods, drastically reducing storage needs. A key feature for reproducible research is its deterministic output, producing bit-for-bit identical results whether run on a CPU or a GPU. This addresses a major hurdle in computational science where non-deterministic compression could lead to irreproducible findings.

The impact is significant for fields like climate modeling, astrophysics, and medical imaging, where datasets routinely reach petabytes. Scientists can now store and share compressed data without fear of altering the relational information between data points that defines the underlying phenomena being studied. The algorithm promises to lower storage costs and accelerate data transfer while maintaining the integrity needed for trustworthy discovery.

Key Points
  • Preserves 100% of local order and all critical points, a first for lossy compression.
  • Runs orders of magnitude (e.g., 100x) faster than previous topology-preserving methods.
  • Delivers bit-for-bit identical output on CPUs and GPUs, ensuring perfect reproducibility.

Why It Matters

Enables efficient storage of massive scientific datasets without corrupting the structural features essential for analysis.