Research & Papers

Post Training Quantization for Efficient Dataset Condensation

A novel technique compresses training datasets to 2-bit images while nearly doubling test accuracy in extreme regimes.

Deep Dive

Researchers Linh-Tam Tran and Sung-Ho Bae have introduced a breakthrough method for compressing the synthetic datasets used to train AI models, a process known as Dataset Condensation (DC). While DC already shrinks massive datasets into smaller, representative subsets, the team identified a major oversight: no one had effectively applied quantization (reducing the number of bits used to store each pixel) to these condensed images. Their 'patch-based post-training quantization' approach tackles this by processing images in small patches, applying localized quantization to minimize information loss. To manage the overhead, they use quantization-aware clustering to group similar patches, then introduce a refinement module that corrects errors by aligning the distributions of the original and dequantized images. The result is a plug-and-play framework that can supercharge any existing DC method.
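To make the core idea concrete, here is a minimal, hypothetical sketch of per-patch low-bit quantization: each patch is quantized to 2-bit codes using its own min/max range, then dequantized. The function and variable names are illustrative, not from the paper, and the clustering and refinement stages are omitted.

```python
# Hypothetical sketch: localized (per-patch) 2-bit post-training quantization.
# Each patch uses its own min/max as the quantization range, which is the
# "localized" part that limits information loss versus a global range.

def quantize_patch(patch, bits=2):
    levels = 2 ** bits                          # 4 representable values at 2 bits
    lo, hi = min(patch), max(patch)
    scale = (hi - lo) / (levels - 1) or 1.0     # guard against flat patches
    codes = [round((p - lo) / scale) for p in patch]   # low-bit integer codes
    deq = [lo + c * scale for c in codes]              # dequantized pixels
    return codes, deq

def split_patches(image, patch=2):
    # image: list of pixel rows; yields flattened patch x patch blocks
    for i in range(0, len(image), patch):
        for j in range(0, len(image[0]), patch):
            yield [image[i + di][j + dj]
                   for di in range(patch) for dj in range(patch)]

image = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.2, 0.9, 1.0],
    [0.4, 0.5, 0.3, 0.2],
    [0.5, 0.6, 0.2, 0.1],
]
for p in split_patches(image):
    codes, deq = quantize_patch(p)   # codes fit in 2 bits each
```

Because each patch carries its own (lo, scale) pair, storing those parameters per patch would be costly; grouping similar patches so they can share parameters is what the paper's quantization-aware clustering addresses.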

The impact is substantial, especially in extreme low-bit scenarios. The method was tested across standard benchmarks including CIFAR-10/100, Tiny ImageNet, and ImageNet subsets. At a severely constrained 2-bit width, where conventional quantization causes massive quality degradation, their technique maintains high-fidelity representations. For example, when applied to the DM condensation method at just one image per class (IPC=1), it nearly doubled test accuracy, jumping from 26.0% to 54.1%. This means AI models can now be trained effectively on datasets that are orders of magnitude smaller in storage size without sacrificing performance, a critical advance for deploying models on edge devices or in resource-constrained environments. The work was accepted as an Oral presentation at AAAI-2026.
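The storage claim is easy to sanity-check with back-of-envelope arithmetic. Using the standard CIFAR-10 dimensions (50,000 training images of 32x32x3 pixels) and the IPC=1 setting from the text, condensing to 10 images and quantizing pixels from 8 bits to 2 bits gives a combined reduction in the tens of thousands; the snippet below is purely illustrative arithmetic, not code from the paper.

```python
# Back-of-envelope storage comparison for CIFAR-10 at IPC=1 (illustrative).
PIXELS = 32 * 32 * 3          # pixels per CIFAR-10 image
FULL_IMAGES = 50_000          # full CIFAR-10 training set
CONDENSED = 10                # 10 classes x 1 image per class (IPC=1)

full_bytes = FULL_IMAGES * PIXELS               # 8-bit pixels: 1 byte each
cond_2bit_bytes = CONDENSED * PIXELS * 2 // 8   # 2 bits per pixel
print(full_bytes // cond_2bit_bytes)            # prints 20000
```

Condensation alone accounts for a 5,000x reduction here; the 2-bit quantization contributes the remaining 4x on top of it.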

Key Points
  • Proposes a novel patch-based post-training quantization method that compresses condensed AI training datasets to 2-bit images.
  • Nearly doubles test accuracy in extreme compression (e.g., 26.0% to 54.1% for DM at IPC=1) on CIFAR-10 and ImageNet benchmarks.
  • A plug-and-play framework that reduces storage overhead without expensive retraining, using quantization-aware clustering and a refinement module.

Why It Matters

Enables efficient AI training on edge devices by drastically reducing dataset storage needs without compromising model accuracy.