Density-aware Soft Context Compression with Semi-Dynamic Compression Ratio
New framework uses a Discrete Ratio Selector to compress long documents 2x more efficiently than static methods.
A research team including Yijiong Yu and Shuai Yuan has published a new paper, "Density-aware Soft Context Compression with Semi-Dynamic Compression Ratio," tackling a major bottleneck in large language models (LLMs). Processing long documents is computationally expensive, and while soft context compression (encoding long text into fewer latent tokens) helps, existing methods use a uniform compression ratio. This fails because information density in language varies wildly; a legal contract is dense, while a novel is sparse. The intuitive fix—a fully dynamic, input-dependent ratio—proved problematic, as models struggle with continuous structural hyperparameters.
To solve this, the team developed the Semi-Dynamic Context Compression framework. Its core is a Discrete Ratio Selector, a component trained to predict the intrinsic information density of an input and then quantize that prediction to one of a predefined set of discrete compression ratios. The selector is jointly trained with the compressor on synthetic data, using summary length as a proxy for density to create training labels for ratio prediction. Extensive evaluations show this density-aware framework, even with a simple mean pooling backbone, consistently outperforms static compression baselines. The work establishes a new Pareto frontier for the trade-off between compression and performance, offering a more efficient path for LLMs to handle long contexts like books, lengthy reports, or multi-document analysis. The team has made their code, data, and model weights publicly available.
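The "semi-dynamic" idea can be illustrated with a minimal sketch: instead of emitting a continuous ratio, the selector's density estimate is snapped to one entry in a fixed menu of ratios. The function name, the candidate ratio set, and the equal-width binning below are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the quantization step in a Discrete Ratio Selector.
# The candidate ratios and the binning scheme are illustrative assumptions.
RATIOS = [2, 4, 8, 16]  # predefined discrete compression ratios, low = dense


def select_ratio(density_score: float, ratios=RATIOS) -> int:
    """Map a predicted density score in [0, 1] to a discrete ratio.

    Denser inputs (score near 1) get a smaller compression ratio,
    i.e. more latent tokens are kept; sparse inputs get a larger one.
    """
    # Bucket the continuous score into len(ratios) equal-width bins,
    # ordered so the densest inputs receive the smallest ratio.
    idx = min(int((1.0 - density_score) * len(ratios)), len(ratios) - 1)
    return ratios[idx]
```

The point of the quantization is stability: the downstream compressor only ever sees one of a few fixed structural configurations, sidestepping the continuous-hyperparameter problem described above.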
- Introduces a Discrete Ratio Selector that predicts and quantizes compression targets based on text information density.
- Solves the model instability of fully dynamic compression by using a predefined set of discrete ratios.
- Establishes a superior Pareto frontier for the performance-vs.-compression trade-off; even with a simple mean-pooling backbone, it outperforms static baselines.
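For intuition on the "mean pooling backbone" mentioned above, here is a minimal, framework-free sketch: each group of `ratio` consecutive token embeddings is averaged into one latent vector. This is a plain-Python illustration of the general technique, not the authors' code.

```python
# Illustrative mean-pooling compression: average every `ratio` consecutive
# token embeddings (lists of floats) into a single latent vector.
def mean_pool_compress(embeddings, ratio):
    """Compress a sequence of token embeddings by `ratio` via mean pooling."""
    pooled = []
    for i in range(0, len(embeddings), ratio):
        chunk = embeddings[i:i + ratio]  # last chunk may be shorter
        dim = len(chunk[0])
        pooled.append(
            [sum(vec[d] for vec in chunk) / len(chunk) for d in range(dim)]
        )
    return pooled
```

With a selected ratio of 4, a 2,000-token document would be reduced to roughly 500 latent tokens; a denser document assigned ratio 2 would keep about 1,000.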
Why It Matters
Enables faster, cheaper processing of books and long documents by LLMs, moving beyond one-size-fits-all compression.