AutoCompress: Critical Layer Isolation for Efficient Transformer Compression
A 60x importance gap means Layer 0 gets full protection while everything else compresses.
A new paper from researcher Archit Thorat introduces AutoCompress, a transformer compression technique that exploits a surprising empirical finding: in small transformer models, Layer 0 is disproportionately critical. Using Neural Tangent Kernel (NTK) analysis, Thorat found that Layer 0 has an importance score of 3.6, while all other layers max out at 0.054—a gap of over 60x. This suggests that standard uniform compression approaches waste capacity by treating all layers equally.
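The paper's exact NTK scoring procedure isn't reproduced here, but a common diagonal approximation gives the flavor: score each layer by the average squared gradient of the model's output with respect to that layer's parameters over a probe batch. The sketch below is a hypothetical illustration of that idea; the function name `layer_ntk_importance` and the assumption of a Hugging Face-style GPT-2 interface (`model.transformer.h`, `.logits`) are mine, not the paper's.

```python
import torch

def layer_ntk_importance(model, probe_ids):
    """Rough per-layer NTK importance: mean squared output-gradient per parameter.

    Assumes a Hugging Face GPT2LMHeadModel-style model; a stand-in for
    whatever scoring AutoCompress actually uses.
    """
    model.zero_grad()
    logits = model(probe_ids).logits
    # Scalarize so a single backward pass populates every parameter's gradient.
    logits.sum().backward()

    scores = {}
    for name, block in model.transformer.h.named_children():  # GPT-2 block stack
        sq_norm = sum(
            p.grad.pow(2).sum().item()
            for p in block.parameters()
            if p.grad is not None
        )
        n_params = sum(p.numel() for p in block.parameters())
        # Normalize by layer size so wide and narrow layers stay comparable.
        scores[f"layer_{name}"] = sq_norm / n_params
    return scores
```

A score profile like the one the paper reports (3.6 for layer 0, at most 0.054 everywhere else) would show up directly in a dictionary like this, making the case for protecting only the first layer.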
AutoCompress implements Critical Layer Isolation (CLI), an architecture that keeps Layer 0 at full dimensionality, applies a learned bottleneck to intermediate layers, and restores full dimension at the final output layer. When applied to GPT-2 Medium (354.8M parameters), CLI-GPT2 achieves 204.5 perplexity on WikiText-103 with only 143.8M parameters—a 2.47x compression ratio and 59.5% parameter reduction. Crucially, a uniform bottleneck baseline of comparable size achieved only 571.8 perplexity, proving that protecting Layer 0—not just shrinking the model—is the key driver of performance. Code and checkpoints are publicly available.
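Structurally, CLI is simple to express. The sketch below is a minimal, hypothetical rendering of the three-stage design described above: a full-width Layer 0, a learned down-projection into narrow middle layers, and an up-projection back to full width before the output layer. It uses stock PyTorch encoder layers for brevity (causal masking and positional embeddings are omitted), and every name and width here is illustrative, not the released code.

```python
import torch.nn as nn

class CLITransformer(nn.Module):
    """Minimal sketch of Critical Layer Isolation (names/widths are illustrative)."""

    def __init__(self, d_model=1024, d_bottleneck=384, n_layers=24,
                 n_heads=16, vocab_size=50257):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Layer 0 is kept at full width: the empirically critical layer.
        self.layer0 = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # Learned down-projection into the bottleneck width.
        self.down = nn.Linear(d_model, d_bottleneck)
        # Intermediate layers run at the reduced dimension.
        self.middle = nn.ModuleList([
            nn.TransformerEncoderLayer(d_bottleneck, n_heads // 2, batch_first=True)
            for _ in range(n_layers - 2)
        ])
        # Learned up-projection restores full width for the final layer.
        self.up = nn.Linear(d_bottleneck, d_model)
        self.final = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, ids):
        h = self.layer0(self.embed(ids))
        h = self.down(h)
        for blk in self.middle:
            h = blk(h)
        h = self.up(h)
        return self.lm_head(self.final(h))
```

The defaults mirror GPT-2 Medium's shape (d_model 1024, 24 layers, 16 heads); `d_bottleneck=384` is a guess, not tuned to reproduce the paper's 143.8M-parameter figure. The uniform baseline the paper compares against would, by contrast, shrink every layer including Layer 0.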
- Layer 0 has a 60x higher NTK importance score (3.6) than other layers (max 0.054)
- CLI-GPT2 achieves 204.5 perplexity with 143.8M parameters vs GPT-2 Medium's 354.8M
- Uniform bottleneck baseline of same size scores only 571.8 perplexity—proving Layer 0 protection is critical
Why It Matters
Enables efficient deployment of compact transformers by protecting the single critical first layer: CLI cuts parameters by 59.5%, with inference compute shrinking roughly in proportion.