Pack only the essentials: Adaptive dictionary learning for kernel ridge regression
A new method packs kernel matrices into 10x less space without losing accuracy.
Kernel ridge regression (KRR) is a powerful machine learning method, but its Achilles' heel has always been memory: storing the kernel matrix K_n for n samples requires O(n²) space, making it impractical for large datasets. Traditional Nyström approximations reduce this to O(nm) by sampling m columns, but uniform sampling can require m = O(n) columns on high-coherence data, erasing the savings. Ridge leverage score (RLS) sampling improves on this by choosing m proportional to the effective dimension, yet computing exact RLS itself requires O(n²) space: a catch-22.
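To make the memory trade-off concrete, here is a minimal NumPy sketch of uniform-sampling Nyström KRR. The n x m block K_nm is the only large object ever stored; the function names, the RBF kernel choice, and the toy data are illustrative, not from the paper.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel between the rows of A and the rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def nystrom_krr(X, y, m, lam=1e-2, gamma=1.0, seed=None):
    """Fit KRR with a uniform-sampling Nystrom approximation.

    Stores only the n x m matrix K_nm and the m x m matrix K_mm,
    i.e. O(nm) memory instead of the O(n^2) full kernel matrix.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)   # uniform column sample
    K_nm = rbf_kernel(X, X[idx], gamma)          # n x m
    K_mm = rbf_kernel(X[idx], X[idx], gamma)     # m x m
    # Subset-of-regressors normal equations:
    # (K_nm^T K_nm + lam * K_mm) alpha = K_nm^T y
    alpha = np.linalg.solve(K_nm.T @ K_nm + lam * K_mm, K_nm.T @ y)
    return X[idx], alpha

def predict(X_new, landmarks, alpha, gamma=1.0):
    return rbf_kernel(X_new, landmarks, gamma) @ alpha

# Toy usage: 2000 points, but only a 2000 x 100 kernel block is ever built.
X = np.random.randn(2000, 5)
y = np.sin(X[:, 0]) + 0.1 * np.random.randn(2000)
landmarks, alpha = nystrom_krr(X, y, m=100, seed=0)
print(predict(X[:5], landmarks, alpha))
```

With uniform sampling, m may need to grow toward n on high-coherence data; leverage-score sampling exists precisely to pick more informative columns with a smaller m.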
Enter SQUEAK, a new algorithm from Daniele Calandriello, Alessandro Lazaric, and Michal Valko. SQUEAK builds on the INK-Estimate framework, which processes the dataset incrementally and updates RLS estimates on the fly. The key innovation is using unnormalized RLS, which simplifies the algorithm significantly: there is no need to estimate the effective dimension for normalization. This keeps the space complexity within a constant factor of exact RLS sampling while maintaining accuracy. The paper, presented at the NIPS 2016 workshop on Adaptive and Scalable Nonparametric Methods, demonstrates that SQUEAK can handle large-scale KRR tasks that were previously infeasible, making it a practical tool for high-dimensional data analysis.
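The sketch below illustrates the streaming flavor of this idea in heavily simplified form: each arriving point's RLS is estimated against the current dictionary using a standard dictionary-based estimator, and that unnormalized estimate directly sets the keep probability, with no effective-dimension normalization. This is not the authors' exact algorithm; the budget constant `q`, the bootstrap with the first point, and the per-step recomputation of K_SS (a real implementation would update it incrementally) are all simplifications for readability.

```python
import numpy as np

def rbf(a, B, gamma=1.0):
    """RBF kernel between a single point a and the rows of B."""
    return np.exp(-gamma * ((a - B) ** 2).sum(-1))

def streaming_rls_dictionary(X, lam=1e-2, q=5.0, gamma=1.0, seed=None):
    """One pass over X, keeping each point with probability proportional
    to a dictionary-based estimate of its unnormalized ridge leverage score."""
    rng = np.random.default_rng(seed)
    dict_idx = [0]                      # bootstrap the dictionary with point 0
    for t in range(1, X.shape[0]):
        S = X[dict_idx]
        k_t = rbf(X[t], S, gamma)       # kernel between x_t and the dictionary
        K_SS = np.exp(-gamma * ((S[:, None] - S[None]) ** 2).sum(-1))
        # RLS estimate: ridge-regularized residual of x_t on the dictionary
        # (k(x,x) = 1 for the RBF kernel), scaled by 1/lam.
        resid = 1.0 - k_t @ np.linalg.solve(
            K_SS + lam * np.eye(len(dict_idx)), k_t
        )
        p = min(1.0, q * resid / lam)   # unnormalized: no d_eff denominator
        if rng.random() < p:
            dict_idx.append(t)
    return dict_idx

X = np.random.randn(500, 3)
idx = streaming_rls_dictionary(X, seed=0)
print(f"kept {len(idx)} of {len(X)} points")
```

The unnormalized score is what makes the update cheap here: the keep probability depends only on the residual of the new point against the current dictionary, so nothing global about the dataset has to be estimated as the stream progresses.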
- SQUEAK reduces KRR memory from O(n²) to O(nm) using adaptive dictionary learning
- Uses unnormalized ridge leverage scores, eliminating need for effective dimension estimation
- Achieves space complexity only a constant factor worse than exact RLS sampling
Why It Matters
Enables kernel methods on massive datasets, unlocking scalable ML for high-coherence data without memory blowup.