Beyond Memorization: Do Larger Models Know More, or Just Better?
Two new papers suggest a model's factual knowledge may still depend on parameter count, even as architectures and training methods improve.
A pair of new research papers is stirring debate about what makes large language models truly effective. The first, the Densing Law of LLMs, observes a striking trend: roughly every three months, a new model matches its predecessor's performance with half the parameters. This points to rapid gains in parameter efficiency over time, independent of raw model size.
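To make the trend concrete, here is a minimal sketch of the halving pattern the Densing Law describes, assuming an exponential form with a three-month half-life (the constant and functional form are illustrative assumptions, not taken from the paper).

```python
# A minimal sketch of the trend described by the Densing Law, assuming an
# exponential form with a three-month half-life (illustrative, not from the paper).

def params_needed(initial_params_b: float, months: float, halving_months: float = 3.0) -> float:
    """Parameters (in billions) needed for equivalent performance after `months`."""
    return initial_params_b * 0.5 ** (months / halving_months)

# If the trend holds, a capability that takes 70B parameters today would need
# only about 4.4B parameters one year from now.
for m in (0, 3, 6, 9, 12):
    print(f"after {m:2d} months: ~{params_needed(70, m):.2f}B parameters")
```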
The second paper, Incompressible Knowledge Probes, offers a more sobering counterpoint. It argues that improvements in architecture and training methods (such as better attention mechanisms or instruction tuning) primarily enhance reasoning ability and instruction following, not the factual knowledge stored in the model. That factual knowledge, the authors claim, remains strictly dependent on the number of parameters: you can't compress a model's factual knowledge without losing it.

Together, the papers imply that future LLMs may converge on a hybrid design: leaner models optimized for reasoning, paired with external retrieval systems (RAG) that supply factual data on demand. This could dramatically reduce the compute needed for deployment, but it also raises questions about whether ever-larger models are worth the cost for fact-heavy tasks.
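As a rough illustration of that hybrid design, here is a toy sketch in Python: a naive keyword lookup stands in for the retrieval system, and `small_llm` is a hypothetical placeholder for a compact reasoning-focused model, not any real model or API.

```python
# A toy sketch of the hybrid design the papers point toward: a compact,
# reasoning-focused model is fed facts from an external store at query time.
# `small_llm` is a hypothetical placeholder, not a real model or API.

FACT_STORE = {
    "densing law": "Equivalent performance with half the parameters roughly every three months.",
    "knowledge probes": "Factual recall scales with parameter count, not architecture.",
}

def small_llm(prompt: str) -> str:
    """Placeholder for a compact instruction-tuned model; swap in a real call."""
    return f"[model would answer using a {len(prompt)}-character prompt]"

def retrieve(query: str) -> list[str]:
    """Naive keyword lookup; a real system would use embeddings or BM25."""
    return [fact for key, fact in FACT_STORE.items() if key in query.lower()]

def answer(query: str) -> str:
    """Supply retrieved facts in the prompt so the model need not store them."""
    context = "\n".join(retrieve(query)) or "No facts retrieved."
    prompt = f"Facts:\n{context}\n\nQuestion: {query}\nAnswer:"
    return small_llm(prompt)

print(answer("What does the Densing Law say?"))
```

The point of the sketch is the division of labor: facts live in the store and are injected into the prompt, so the model's parameters are spent on reasoning rather than recall.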
- Densing Law: equivalent performance with half the parameters roughly every three months.
- Incompressible Knowledge Probes: Factual knowledge scales with parameter count, not architecture.
- Implication: Future LLMs may offload facts to external retrieval, focusing internal capacity on reasoning.
Why It Matters
Could shift AI development toward smaller, reasoning-focused models paired with external knowledge retrieval.