Research & Papers

Bank of Values (BoV) boosts LLM efficiency with context-free value vectors

New attention method cuts compute and memory while matching the prior best method across 21 benchmarks.

Deep Dive

A new paper from researchers Muyu He, Yuchen Liu, Qingya Huang, and Li Zhang introduces Bank of Values (BoV), a new approach to computing value vectors in transformer attention layers. The key insight: in deeper layers, context-dependent value vectors derived from the residual stream offer little benefit over simple context-free, token-specific vectors. BoV learns a lookup table of such vectors for the last third of layers and stores them as sparse parameters, so they never need to be recomputed or cached.
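To make the mechanism concrete, here is a minimal pure-Python sketch of one way a BoV layer could differ from standard attention. It is not the authors' code. It assumes a single head, no output projection, no MLP blocks, and random initialization, and the names `AttentionLayer`, `is_bov_layer` (a guessed rule for picking "the last third of layers"), and all sizes are hypothetical. Queries and keys are still projected from the residual stream in every layer; only the source of the value vectors changes.

```python
import math
import random


def matvec(W, x):
    # W is a list of rows (d_out x d_in); x is a length-d_in vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


def is_bov_layer(layer_idx, n_layers):
    # Hypothetical rule for "the last third of layers" (an assumption, not from the paper).
    return layer_idx >= n_layers - n_layers // 3


class AttentionLayer:
    """Single-head causal attention; hypothetical simplification with no output projection."""

    def __init__(self, d_model, vocab_size, use_bov, rng):
        self.W_q = rand_matrix(d_model, d_model, rng)
        self.W_k = rand_matrix(d_model, d_model, rng)
        self.use_bov = use_bov
        if use_bov:
            # Bank of values: one learned, context-free value vector per token id.
            self.value_bank = rand_matrix(vocab_size, d_model, rng)
        else:
            # Standard value projection applied to the residual stream.
            self.W_v = rand_matrix(d_model, d_model, rng)

    def values(self, hidden, token_ids):
        if self.use_bov:
            # Table lookup by token id; the residual stream is ignored.
            return [self.value_bank[t] for t in token_ids]
        return [matvec(self.W_v, h) for h in hidden]

    def forward(self, hidden, token_ids):
        d = len(hidden[0])
        q = [matvec(self.W_q, h) for h in hidden]
        k = [matvec(self.W_k, h) for h in hidden]
        v = self.values(hidden, token_ids)
        out = []
        for i in range(len(hidden)):
            # Causal mask: position i attends only to positions 0..i.
            scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
                      for j in range(i + 1)]
            weights = softmax(scores)
            out.append([sum(w * v[j][c] for j, w in enumerate(weights))
                        for c in range(len(v[0]))])
        return out


if __name__ == "__main__":
    rng = random.Random(0)
    n_layers, d_model, vocab = 6, 8, 50
    layers = [AttentionLayer(d_model, vocab, is_bov_layer(i, n_layers), rng)
              for i in range(n_layers)]
    token_ids = [3, 17, 42, 3]
    hidden = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in token_ids]
    for layer in layers:
        attn = layer.forward(hidden, token_ids)
        # Residual connection only; MLP blocks are omitted in this sketch.
        hidden = [[h + a for h, a in zip(hs, at)] for hs, at in zip(hidden, attn)]
    print([layer.use_bov for layer in layers])  # [False, False, False, False, True, True]
```

Because a BoV layer's value for token `t` is simply row `t` of `value_bank`, a forward pass reads only the rows for tokens that actually occur in the input, which is one plausible reading of the paper's description of the bank as sparse parameters.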

Across 135M- and 780M-parameter models, BoV consistently improves validation loss over standard attention. At the 780M scale, it matches the performance of the prior best method, which adds token information to value vectors, while requiring less compute and memory. The method was evaluated on 21 benchmarks. The results suggest that deeper layers may primarily need to preserve token identity rather than mix context, opening the door to more efficient LLM architectures.

Key Points
  • BoV replaces context-dependent value vectors in the last third of layers with context-free token-specific vectors stored as sparse parameters
  • At 780M parameters, BoV improves validation loss and matches the previous best method on 21 benchmarks using less compute and memory
  • Design eliminates the need to recompute or persistently cache value vectors, reducing operational overhead
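The caching point can be illustrated with a toy decoding cache. This sketch assumes that a BoV layer still caches keys (they depend on context) but, for values, keeps only each position's token id and reads the vector from the value bank when attending. `DecodeCache`, `attend`, and all sizes are hypothetical, and the float counts describe this toy setup only, not measurements from the paper. The standard cache is fed the same bank rows as its values purely so the two attention outputs can be compared directly.

```python
import math
import random


class DecodeCache:
    """Per-layer cache for incremental decoding (hypothetical sketch)."""

    def __init__(self, use_bov):
        self.use_bov = use_bov
        self.keys = []       # one key vector per past position (needed in both cases)
        self.values = []     # filled only for standard layers
        self.token_ids = []  # filled only for BoV layers: one int per position

    def append(self, key, token_id, value=None):
        self.keys.append(key)
        if self.use_bov:
            # No value vector is stored; it is looked up from the bank at read time.
            self.token_ids.append(token_id)
        else:
            if value is None:
                raise ValueError("standard layers must cache a value vector")
            self.values.append(value)

    def cached_floats(self):
        # Floating-point entries held per sequence (token ids are not counted).
        return sum(len(k) for k in self.keys) + sum(len(v) for v in self.values)


def attend(query, cache, value_bank=None):
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in cache.keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    if cache.use_bov:
        # Rows of the shared parameter table, indexed by token id.
        vals = [value_bank[t] for t in cache.token_ids]
    else:
        vals = cache.values
    return [sum(w * v[c] for w, v in zip(weights, vals)) for c in range(len(vals[0]))]


if __name__ == "__main__":
    rng = random.Random(0)
    d, vocab = 8, 20
    bank = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(vocab)]
    std, bov = DecodeCache(use_bov=False), DecodeCache(use_bov=True)
    for t in [1, 5, 9, 5, 2]:
        key = [rng.gauss(0.0, 1.0) for _ in range(d)]
        std.append(key, t, value=bank[t])  # standard cache stores the vector itself
        bov.append(key, t)                 # BoV cache stores only the token id
    print(std.cached_floats(), bov.cached_floats())  # 80 40
    query = [rng.gauss(0.0, 1.0) for _ in range(d)]
    print(attend(query, std) == attend(query, bov, bank))  # True
```

In this setup the bank is a model parameter stored once and shared across all sequences, so the per-sequence cost of a BoV layer shrinks to its keys plus one integer per position.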

Why It Matters

BoV could make large language models more efficient to deploy by trimming attention computation in deep layers without sacrificing accuracy.