Research & Papers

Rethinking KV Cache Eviction via a Unified Information-Theoretic Objective

Researchers derive a closed-form mutual information objective to guide KV cache eviction and reduce memory use in LLMs.

Deep Dive

A team of researchers from Sichuan University has introduced CapKV, a novel method for key-value (KV) cache eviction in large language models (LLMs) that is grounded in information theory rather than empirical heuristics. The work, submitted to arXiv, addresses the critical memory bottleneck in long-context generation by rethinking cache eviction through the lens of the Information Bottleneck principle. Under a linear-Gaussian surrogate of attention, the authors derive a closed-form mutual information objective that quantifies the effective information capacity of a retained KV cache subset. This formulation reveals that many existing eviction strategies are essentially different approximations of the same capacity-maximization principle.
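
To make the capacity idea concrete, here is a minimal numerical sketch of scoring a retained key subset with the standard Gaussian-channel capacity formula. The function name, tensor shapes, noise scale sigma2, and the 0.5 * logdet(I + K_S K_S^T / sigma2) normalization are illustrative assumptions based on the generic linear-Gaussian mutual information result, not the paper's exact objective.

    import numpy as np

    def capacity_score(K_S: np.ndarray, sigma2: float = 1.0) -> float:
        # Gaussian-channel capacity of a retained key subset K_S
        # (shape [num_retained, head_dim]):
        #   0.5 * logdet(I + K_S K_S^T / sigma2)
        # Illustrative surrogate only; the paper's objective may differ.
        m = K_S.shape[0]
        gram = (K_S @ K_S.T) / sigma2            # [m, m] Gram matrix of retained keys
        _, logdet = np.linalg.slogdet(np.eye(m) + gram)
        return 0.5 * logdet

    # Toy comparison of two candidate subsets of a random key cache.
    rng = np.random.default_rng(0)
    K = rng.normal(size=(128, 64))               # hypothetical per-head key cache
    print(capacity_score(K[:32]), capacity_score(K[::4]))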

CapKV directly targets information preservation using a log-determinant approximation computed via statistical leverage scores, replacing heuristic selection with a theoretically rigorous mechanism. Extensive experiments across multiple models and long-context benchmarks demonstrate that CapKV consistently outperforms prior methods, achieving a better trade-off between memory efficiency and generation fidelity. The 19-page paper includes 6 figures, and the authors are expected to release code. This work provides a unified theoretical foundation for KV cache eviction, potentially enabling more efficient deployment of LLMs for long-context applications such as document analysis, code generation, and conversational AI.
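
The summary does not spell out how the leverage-score approximation is computed, but a plain ridge-leverage-score selection rule conveys the flavor: score each cached key by its contribution to the retained subset's log-determinant, then keep the top-scoring keys under the memory budget. The function names, the ridge parameter lam, and the top-k rule below are hypothetical stand-ins, not CapKV's actual procedure.

    import numpy as np

    def leverage_scores(K: np.ndarray, lam: float = 1e-3) -> np.ndarray:
        # Ridge leverage score of each cached key k_i:
        #   l_i = k_i^T (K^T K + lam * I)^{-1} k_i
        # High-leverage keys add the most to the retained subset's
        # log-determinant, so they are natural candidates to keep.
        d = K.shape[1]
        inv = np.linalg.inv(K.T @ K + lam * np.eye(d))   # [head_dim, head_dim]
        return np.einsum("nd,de,ne->n", K, inv, K)       # per-key quadratic form

    def evict(K: np.ndarray, budget: int) -> np.ndarray:
        # Keep the `budget` highest-leverage keys; return indices in order.
        keep = np.argsort(-leverage_scores(K))[:budget]
        return np.sort(keep)

    rng = np.random.default_rng(1)
    K = rng.normal(size=(256, 64))                       # hypothetical key cache
    print(evict(K, budget=64)[:10])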

Key Points
  • CapKV uses the Information Bottleneck principle to derive a closed-form mutual information objective for KV cache eviction.
  • It replaces heuristic selection with a log-determinant approximation using statistical leverage scores.
  • Experiments show CapKV achieves a better memory-efficiency vs. generation fidelity trade-off than prior methods.

Why It Matters

A principled approach to KV cache eviction could reduce memory costs and enable longer context windows in production LLMs.