COMPOT: Calibration-Optimized Matrix Procrustes Orthogonalization for Transformers Compression
A new training-free method shrinks Transformer models such as GPT and Llama while retaining more accuracy than competing compression approaches.
Researchers from an international team developed COMPOT, a training-free compression framework for Transformer models such as GPT and Llama. It uses a small calibration dataset and orthogonal dictionaries to compute closed-form weight updates, eliminating iterative optimization. A dynamic allocation scheme distributes the compression budget across layers. Experiments show COMPOT delivers a superior quality-compression trade-off over low-rank and sparse baselines and remains compatible with post-training quantization for extreme compression.
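The article doesn't spell out COMPOT's equations, but the closed-form step it alludes to is the classic orthogonal Procrustes solution: the orthogonal matrix best aligning one matrix to another is obtained from a single SVD, with no iterative optimization. A minimal NumPy sketch (shapes and variable names are illustrative, not from the paper):

```python
import numpy as np

def orthogonal_procrustes(B, A):
    """Solve argmin_Q ||A - B @ Q||_F subject to Q.T @ Q = I.

    Closed form: with U, S, Vt = svd(B.T @ A), the minimizer is Q = U @ Vt.
    No gradient descent or iteration is needed, only one SVD.
    """
    U, _, Vt = np.linalg.svd(B.T @ A)
    return U @ Vt

# Toy check: recover a known rotation (a stand-in for aligning a compressed
# layer's outputs to the original layer's outputs on calibration data).
rng = np.random.default_rng(0)
A = rng.standard_normal((128, 16))                 # hypothetical calibration activations
Q_true, _ = np.linalg.qr(rng.standard_normal((16, 16)))
B = A @ Q_true.T                                   # rotated copy of A
Q = orthogonal_procrustes(B, A)
print(np.allclose(B @ Q, A))                       # the SVD step recovers the alignment
```

This is only a sketch of the generic Procrustes primitive; COMPOT's actual formulation (how the dictionaries are built and how updates are applied per layer) is described in the paper itself.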
Why It Matters
Enables more efficient deployment of large language models on edge devices and in cost-sensitive production environments.