They solved AI’s memory problem!
New architecture prevents information loss in deep networks, boosting reasoning scores while cutting power.
The Kimi Team has unveiled a breakthrough AI architecture called Attention Residuals that addresses the persistent problem of 'AI amnesia' in deep neural networks. Traditional large language models stack hundreds of layers in a rigid pipeline: each layer sees only the compressed state handed to it by the layer before, so original context is gradually buried and lost. The new architecture adds a dynamic retrieval mechanism in which each layer can actively look back and selectively pull relevant information from any preceding layer, preventing key details from degrading during multi-step reasoning tasks.
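The core idea can be illustrated with a toy sketch. This is not the Kimi Team's code; all function names, shapes, and the attention formulation here are assumptions made for illustration. It shows a layer querying the stack of all earlier layer outputs and mixing the most relevant ones back in, so early context stays reachable at any depth.

```python
# Hypothetical sketch of per-layer retrieval from all preceding layers.
# Everything here (names, shapes, tanh mixing) is illustrative, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)
d = 8          # hidden size
n_layers = 6   # network depth

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def layer_with_attention_residual(h, history, Wq, Wk, Wv):
    """One layer step: attend over the outputs of ALL earlier layers
    and pull back whatever is most relevant to the current state."""
    H = np.stack(history)                 # (n_prev, d): every earlier layer's output
    q = h @ Wq                            # query from the current state
    scores = (H @ Wk) @ q / np.sqrt(d)    # (n_prev,): relevance of each past layer
    retrieved = softmax(scores) @ (H @ Wv)  # weighted pull from the layer history
    return np.tanh(h + retrieved)         # combine retrieval with the current state

h = rng.standard_normal(d)
history = [h]
params = [tuple(rng.standard_normal((d, d)) * 0.1 for _ in range(3))
          for _ in range(n_layers)]
for Wq, Wk, Wv in params:
    h = layer_with_attention_residual(h, history, Wq, Wk, Wv)
    history.append(h)  # each output stays available for later layers to query
```

Note that the history grows with depth, which is exactly the cost problem the block variant below is described as solving.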
To overcome the immense computational overhead of letting every layer query every past layer—which would overwhelm GPU memory and data center bandwidth—the researchers developed 'Block Attention Residuals.' This technique groups layers into distinct blocks, keeping intensive data retrieval local to each block's hardware while passing only condensed summaries between servers. This preserves both the model's logical depth and hardware efficiency. The result is a system with striking performance gains: models score significantly higher on rigorous benchmarks like GPQA-Diamond and MMLU while using about 1.25× less training compute (roughly a 20% saving).
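The block-local trade-off described above can be sketched as follows. Again, this is a hypothetical illustration, not the authors' implementation: the block size, the mean-pooled summary, and the parameter-free attention are all assumptions. The point it demonstrates is that retrieval cost is bounded by the block size rather than total depth, with only one summary vector handed between blocks.

```python
# Hypothetical sketch of the 'Block Attention Residuals' grouping idea.
# Details (mean-pooled summary, block size, no learned weights) are assumptions.
import numpy as np

rng = np.random.default_rng(1)
d, layers_per_block, n_blocks = 8, 3, 4

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def run_block(h, summary_in):
    """Full layer-to-layer retrieval happens only inside this block;
    the previous block contributes just its condensed summary."""
    history = [summary_in, h]             # local history: bounded, hardware-friendly
    for _ in range(layers_per_block):
        H = np.stack(history)             # small: block-local, not full depth
        scores = H @ h / np.sqrt(d)
        retrieved = softmax(scores) @ H   # pull from local history only
        h = np.tanh(h + retrieved)
        history.append(h)
    summary_out = np.stack(history).mean(axis=0)  # condensed hand-off to next block
    return h, summary_out

h = rng.standard_normal(d)
summary = np.zeros(d)  # first block has no predecessor to summarize
for _ in range(n_blocks):
    h, summary = run_block(h, summary)
# Only `summary` (one d-dim vector) crosses block/server boundaries,
# instead of the full stack of layer states.
```

The design choice this illustrates: per-layer retrieval memory scales with `layers_per_block`, not `layers_per_block * n_blocks`, which is why the blocking keeps inter-server bandwidth low.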
This architectural shift represents a major leap toward AI 'neuroplasticity,' where networks can autonomously rewire their internal pathways based on context, much like the human brain. By solving the fundamental information flow problem that has limited deep learning models, Attention Residuals opens new possibilities for tackling highly complex, multi-step problems that require maintaining a clear train of thought across thousands of processing steps.
- Dynamically allows each layer to retrieve specific info from any previous layer, preventing context loss
- Uses 'Block Attention Residuals' to group layers, reducing GPU memory/bandwidth overload between servers
- Boosts scores on graduate-level benchmarks (GPQA-Diamond, MMLU) while using ~1.25× less training compute
Why It Matters
Enables AI to handle vastly more complex, multi-step reasoning tasks efficiently, accelerating progress toward human-like problem-solving.