Research & Papers

New IBP compression algorithm speeds up ML workloads by up to 180%

Lossless compression eases PCIe transfer bottlenecks with no accuracy trade-off.

Deep Dive

Invariant Bit Packing (IBP) is a new lossless compression algorithm that shortens GPU memory transfer times for ML workloads. It identifies bits that are invariant across groups of tensors, eliminates them from the transferred data, and decompresses on the GPU using warp-level parallelism. On average, it delivers 74% faster GNN training, 180% faster DLRM embedding lookup, and 24% faster LLM inference, with no accuracy loss.
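
To make the invariant-bit idea concrete, here is a minimal CPU-side sketch in plain Python. It is not the authors' implementation: the real IBP decompresses on the GPU with warp parallelism, and the function names (find_invariant_mask, compress, decompress), the 32-bit word size, and the packed layout are all assumptions chosen for illustration. The sketch finds the bit positions that hold the same value in every word of a group, keeps those bits once, and packs only the varying bits of each word.

```python
import struct

WORD_BITS = 32                      # assumption: values are handled as 32-bit words
FULL_MASK = (1 << WORD_BITS) - 1


def float_to_bits(x: float) -> int:
    """Reinterpret a float32 value as its 32-bit unsigned integer pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def find_invariant_mask(words):
    """Return a mask with 1 at every bit position that is identical in all words."""
    all_and, all_or = FULL_MASK, 0
    for w in words:
        all_and &= w
        all_or |= w
    # Invariant means: set in every word (all_and) or clear in every word (~all_or).
    return (all_and | ~all_or) & FULL_MASK


def varying_positions(mask):
    """List the bit positions that are NOT invariant, lowest position first."""
    return [i for i in range(WORD_BITS) if not (mask >> i) & 1]


def compress(words):
    """Hypothetical layout: (mask, shared invariant bits, word count, packed varying bits)."""
    if not words:
        raise ValueError("need at least one word")
    mask = find_invariant_mask(words)
    base = words[0] & mask          # invariant bits are kept once for the whole group
    positions = varying_positions(mask)
    k = len(positions)
    packed = 0
    for idx, w in enumerate(words):
        chunk = 0
        for j, pos in enumerate(positions):
            chunk |= ((w >> pos) & 1) << j   # gather the varying bits of this word
        packed |= chunk << (idx * k)
    return mask, base, len(words), packed


def decompress(mask, base, count, packed):
    """Scatter each word's varying bits back and merge them with the shared bits."""
    positions = varying_positions(mask)
    k = len(positions)
    out = []
    for idx in range(count):
        chunk = (packed >> (idx * k)) & ((1 << k) - 1)
        w = base
        for j, pos in enumerate(positions):
            w |= ((chunk >> j) & 1) << pos
        out.append(w)
    return out


# Usage: four float32 values that share most of their bit pattern.
values = [1.0, 1.5, 1.25, 1.75]
words = [float_to_bits(v) for v in values]
mask, base, count, packed = compress(words)
assert decompress(mask, base, count, packed) == words   # lossless round trip
print("varying bits per word:", len(varying_positions(mask)), "of", WORD_BITS)
```

In this sketch the per-group overhead is the mask plus the shared bits, so the saving grows with the group size and with the number of invariant positions. A GPU version would presumably spread the scatter step across the threads of a warp, which this sequential loop does not attempt to model.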

Key Points
  • IBP identifies and eliminates invariant bits across groups of tensors for lossless compression
  • Achieves, on average, 74% faster GNN training, 180% faster DLRM embedding lookup, and 24% faster LLM inference
  • Provides easy-to-use APIs with support for GNN training, DLRM, and LLM inference frameworks

Why It Matters

Because the compression is lossless, it raises no accuracy concerns, so production ML deployments can adopt it to sharply reduce GPU memory transfer bottlenecks.