Research & Papers

[P] PCA before truncation makes non-Matryoshka embeddings compressible: results on BGE-M3

A simple PCA rotation before truncation preserves 99.6% cosine similarity when halving 1024-dimensional BGE-M3 embeddings to 512 dimensions.

Deep Dive

A viral research post describes a simple but effective technique for compressing dense embedding models that weren't trained for dimensionality reduction, such as the popular BGE-M3. The core problem is that naively truncating the final dimensions of a standard embedding destroys its retrieval quality, because a standard model spreads information across all dimensions with no particular ordering. The proposed solution is to fit a Principal Component Analysis (PCA) model once on a sample of embeddings, rotate all vectors into the new basis, where variance is sorted in descending order, and then truncate. This concentrates the signal into the leading components, so truncation discards the least-informative directions rather than arbitrary ones.
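
In practice this amounts to a one-time fit plus a matrix multiply per vector. A minimal sketch with scikit-learn, where the random sample stands in for real BGE-M3 embeddings and re-normalizing after truncation is an assumption rather than something the post specifies:

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for a representative sample of real BGE-M3 embeddings, shape (n, 1024).
sample = np.random.randn(10_000, 1024).astype(np.float32)

# One-time fit: learn the full rotation; components are sorted by explained variance.
pca = PCA(n_components=1024).fit(sample)

def compress(vectors: np.ndarray, k: int = 512) -> np.ndarray:
    """Rotate into the PCA basis, keep the k leading components, re-normalize."""
    rotated = pca.transform(vectors)[:, :k]   # rotate first, then truncate
    return rotated / np.linalg.norm(rotated, axis=1, keepdims=True)

small = compress(sample[:100])                # (100, 512) compressed vectors
```

Because the full 1024-component rotation is stored, the truncation point k can be chosen (or changed) after the fact without refitting.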

Results on a 1024-dimensional BGE-M3 sample are striking: compressing to 512 dimensions with PCA-first achieved a 0.996 cosine similarity versus 0.707 for naive truncation. At 256 dimensions, the gap widened to 0.974 vs. 0.467. The method also combines effectively with quantization. Applying 3-bit quantization after PCA compression to 384 dimensions achieved a 27.7x compression ratio with a 0.979 cosine similarity, creating a practical middle ground between high-quality scalar quantization and aggressive binary methods.
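
A hedged sketch of how the quantization step might look: uniform per-dimension 3-bit scalar quantization applied to the PCA-truncated vectors. The post does not publish its exact scheme, so the range calibration here is an assumption:

```python
import numpy as np

def quantize_3bit(x: np.ndarray):
    """Uniform per-dimension 3-bit quantization: 8 levels over each dimension's range."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    scale = (hi - lo) / 7.0 + 1e-12                      # 8 levels -> 7 steps; epsilon avoids /0
    codes = np.round((x - lo) / scale).astype(np.uint8)  # integer codes in [0, 7]
    return codes, lo, scale

def dequantize(codes: np.ndarray, lo: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return codes.astype(np.float32) * scale + lo
```

For the arithmetic: 1024 float32 dimensions take 4096 bytes, while 384 dimensions at 3 bits bit-pack into 144 bytes, roughly a 28x reduction, in line with the reported 27.7x once the stored per-dimension scale parameters are counted. (The sketch keeps one code per uint8 for clarity; a real deployment would bit-pack.)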

The analysis highlights a crucial caveat for real-world use: while cosine similarity remains high even under aggressive compression, task-specific metrics like Recall@10 degrade more quickly. For the 27.7x setup, Recall@10 dropped to 76.4%, so compression should be tuned to the end application's priority: similarity preservation versus top-tier retrieval accuracy. This work provides an immediately applicable, low-overhead method for developers who need to deploy large embedding models in resource-constrained environments.
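
To make that trade-off concrete, Recall@10 can be measured as the overlap between brute-force top-10 results from the original vectors and from the compressed ones. A minimal sketch (the function name and brute-force setup are illustrative, not from the post):

```python
import numpy as np

def recall_at_k(db_full, db_comp, q_full, q_comp, k: int = 10) -> float:
    """Fraction of each query's full-precision top-k that compressed search recovers."""
    # Assumes all vectors are L2-normalized, so dot product == cosine similarity.
    truth = np.argsort(-(q_full @ db_full.T), axis=1)[:, :k]
    approx = np.argsort(-(q_comp @ db_comp.T), axis=1)[:, :k]
    overlap = [len(set(t) & set(a)) for t, a in zip(truth, approx)]
    return float(np.mean(overlap)) / k
```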

Key Points
  • PCA rotation before truncation preserved 0.990 cosine similarity at 384 dimensions vs. 0.609 for naive truncation on BGE-M3.
  • Combining PCA with 3-bit quantization achieved 27.7x compression with a 0.979 cosine similarity, though Recall@10 dropped to 76.4%.
  • The method is a one-time preprocessing step, making it viable for compressing existing, non-Matryoshka-trained embedding models in production.

Why It Matters

Enables massive storage and cost savings for AI applications using embeddings, making advanced retrieval viable on smaller devices and budgets.