Absorber LLM: Harnessing Causal Synchronization for Test-Time Training
Absorber LLM folds context into the model's weights at inference time, holding memory use constant no matter how long the input runs.
Transformers face a fundamental trade-off: they can either keep the full context in a KV cache (whose memory grows with sequence length) or compress history into a fixed-size state (losing long-tail dependencies). A new paper from researchers at Peking University, titled "Absorber LLM: Harnessing Causal Synchronization for Test-Time Training," proposes a third path. Instead of storing context in a cache or state, Absorber LLM absorbs historical context into the model's own parameters during inference. The key insight is a self-supervised objective called causal synchronization: after the model's parameters have been updated on a piece of context, a contextless version of the model should produce the same outputs as the original model that had full access to that context. This ensures the context is genuinely internalized rather than memorized at the token level.
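To make the objective concrete, here is a minimal PyTorch sketch of one plausible synchronization step, assuming a Hugging Face-style causal LM whose forward pass returns `.logits`. The function name `causal_sync_step`, the KL-divergence form of the loss, and the choice of query tokens are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def causal_sync_step(model, context_ids, query_ids, optimizer):
    """One hypothetical test-time update: push `context_ids` into the weights so
    that the model answering `query_ids` WITHOUT the context matches the
    predictions it makes WITH the context in its input window."""
    # Teacher: the current model reading [context + query] directly (frozen).
    with torch.no_grad():
        full_input = torch.cat([context_ids, query_ids], dim=-1)
        teacher_logits = model(full_input).logits[:, -query_ids.size(-1):]

    # Student: the same model answering the query with no context in the window.
    student_logits = model(query_ids).logits

    # Synchronization loss: match the contextful output distribution,
    # rather than fitting the raw context tokens themselves.
    loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # after this step, part of the context lives in the weights
    return loss.item()
```

The point of the design, as the summary describes it, is that the teacher signal comes from the same model reading the context in its input window, so the update is supervised by the model's own contextful behavior rather than by raw next-token prediction on the context.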
The method addresses a known failure mode of Test-Time Training (TTT), where models tend to overfit to local token patterns and lose the causal effect of broader context. Absorber LLM's synchronization objective explicitly preserves that causal effect, making the learned context generalize better to future generations. The approach is evaluated on long-context and streaming benchmarks, where it reduces inference memory compared to standard Transformers and improves accuracy over prior parameter-as-memory methods like TTT. For practitioners, this means running large models on long streams—think real-time document processing, live transcription, or continuous code analysis—without the memory bottleneck. The paper is available on arXiv (2604.20915) and could shift how we think about context management in deployed LLMs.
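For a sense of how this could look in a streaming deployment, the sketch below (again hypothetical, reusing `causal_sync_step` from above) absorbs each incoming chunk into the weights and then discards it, so resident memory stays flat; how the paper actually constructs per-chunk queries or schedules the updates is not specified here.

```python
def stream_and_answer(model, tokenizer, optimizer, chunks, prompt):
    """Hypothetical streaming loop: absorb each chunk, then throw it away.
    No KV cache or recurrent state accumulates across the stream."""
    for chunk_text in chunks:  # e.g. pages of a document or transcript segments
        ids = tokenizer(chunk_text, return_tensors="pt").input_ids
        # Illustrative choice: the chunk's tail serves as the query the
        # contextless model must answer consistently after the update.
        context_ids, query_ids = ids[:, :-64], ids[:, -64:]
        causal_sync_step(model, context_ids, query_ids, optimizer)
        del ids  # the chunk itself is discarded; memory stays constant

    # Later generation sees only the prompt; the history lives in the weights.
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    return model.generate(prompt_ids, max_new_tokens=128)
```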
- Absorber LLM uses a self-supervised 'causal synchronization' objective to absorb context into model parameters during inference.
- It outperforms prior test-time training methods by preserving the causal effect of context, avoiding token-level overfitting.
- The method reduces inference memory to constant size, making it suitable for long-context and streaming applications.
Why It Matters
Enables LLMs to handle arbitrarily long streams without memory growth, unlocking real-time document and video processing.