Density-aware Soft Context Compression with Semi-Dynamic Compression Ratio
New framework uses a Discrete Ratio Selector to compress long documents 2x more efficiently than static methods.
A research team including Yijiong Yu and Shuai Yuan has published a new paper, "Density-aware Soft Context Compression with Semi-Dynamic Compression Ratio," tackling a major bottleneck in large language models (LLMs). Processing long documents is computationally expensive, and while soft context compression (encoding long text into fewer latent tokens) helps, existing methods use a uniform compression ratio. This fails because information density in language varies wildly; a legal contract is dense, while a novel is sparse. The intuitive fix—a fully dynamic, input-dependent ratio—proved problematic, as models struggle with continuous structural hyperparameters.
To solve this, the team developed the Semi-Dynamic Context Compression framework. Its core is a Discrete Ratio Selector, a component trained to predict the intrinsic information density of an input and then quantize that prediction to one of a predefined set of discrete compression ratios. The selector is jointly trained with the compressor on synthetic data, using summary length as a proxy for density to create training labels for ratio prediction. Extensive evaluations show this density-aware framework, even with a simple mean pooling backbone, consistently outperforms static compression baselines. The work establishes a new Pareto frontier for the trade-off between compression and performance, offering a more efficient path for LLMs to handle long contexts like books, lengthy reports, or multi-document analysis. The team has made their code, data, and model weights publicly available.
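The "semi-dynamic" idea can be illustrated with a minimal sketch: instead of emitting a continuous ratio, the selector's density estimate is snapped to one entry in a fixed menu of ratios. The function name, the candidate ratio set, and the equal-width binning below are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the quantization step in a Discrete Ratio Selector.
# The candidate ratios and the binning scheme are illustrative assumptions.
RATIOS = [2, 4, 8, 16]  # predefined discrete compression ratios, low = dense


def select_ratio(density_score: float, ratios=RATIOS) -> int:
    """Map a predicted density score in [0, 1] to a discrete ratio.

    Denser inputs (score near 1) get a smaller compression ratio,
    i.e. more latent tokens are kept; sparse inputs get a larger one.
    """
    # Bucket the continuous score into len(ratios) equal-width bins,
    # ordered so the densest inputs receive the smallest ratio.
    idx = min(int((1.0 - density_score) * len(ratios)), len(ratios) - 1)
    return ratios[idx]
```

The point of the quantization is stability: the downstream compressor only ever sees one of a few fixed structural configurations, sidestepping the continuous-hyperparameter problem described above.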
- Introduces a Discrete Ratio Selector that predicts and quantizes compression targets based on text information density.
- Solves the model instability of fully dynamic compression by using a predefined set of discrete ratios.
- Establishes a superior Pareto frontier for the performance-vs.-compression trade-off; even with a simple mean-pooling backbone, it outperforms static baselines.
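For intuition on the "mean pooling backbone" mentioned above, here is a minimal, framework-free sketch: each group of `ratio` consecutive token embeddings is averaged into one latent vector. This is a plain-Python illustration of the general technique, not the authors' code.

```python
# Illustrative mean-pooling compression: average every `ratio` consecutive
# token embeddings (lists of floats) into a single latent vector.
def mean_pool_compress(embeddings, ratio):
    """Compress a sequence of token embeddings by `ratio` via mean pooling."""
    pooled = []
    for i in range(0, len(embeddings), ratio):
        chunk = embeddings[i:i + ratio]  # last chunk may be shorter
        dim = len(chunk[0])
        pooled.append(
            [sum(vec[d] for vec in chunk) / len(chunk) for d in range(dim)]
        )
    return pooled
```

With a selected ratio of 4, a 2,000-token document would be reduced to roughly 500 latent tokens; a denser document assigned ratio 2 would keep about 1,000.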
Why It Matters
Enables faster, cheaper processing of books and long documents by LLMs, moving beyond one-size-fits-all compression.