AI Safety

You can’t imitation-learn how to continual-learn

A viral LessWrong post claims LLMs can't build new knowledge, only process longer context windows.

Deep Dive

In a viral post on LessWrong, researcher Steven Byrnes makes a pointed argument about the limitations of large language models (LLMs) such as GPT-4 and Claude 3. He contends that the AI community holds a dangerously narrow view of 'continual learning,' reducing it to technical fixes like extended context windows or Retrieval-Augmented Generation (RAG). These, Byrnes asserts, are merely band-aids for processing more information, not mechanisms for building fundamentally new knowledge or conceptual frameworks. The core of his argument is that true continual learning, as seen in systems like DeepMind's AlphaZero or the human brain, involves permanent weight updates that create an 'ever-growing tower' of competence, enabling the invention of entirely new fields from scratch.
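To make that distinction concrete, here is a minimal PyTorch sketch (not from Byrnes's post; the toy model and the function names frozen_inference and continual_update are illustrative) contrasting a frozen forward pass, which is all a longer context window buys, with a permanent weight update, which is what Byrnes means by real continual learning:

import torch
import torch.nn as nn

# Toy model standing in for an LLM; the weights are what
# "frozen" vs. "continually learning" refers to.
model = nn.Linear(8, 8)

def frozen_inference(x: torch.Tensor) -> torch.Tensor:
    # In-context processing: the forward pass can read more
    # input (a longer context) but leaves the weights untouched.
    with torch.no_grad():
        return model(x)

def continual_update(x: torch.Tensor, target: torch.Tensor) -> None:
    # Continual learning in Byrnes's sense: experience is
    # consolidated as a permanent change to the weights.
    # (A fresh optimizer per call keeps the sketch short.)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    loss = nn.functional.mse_loss(model(x), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # the model itself is now different

x = torch.randn(1, 8)
before = frozen_inference(x)
continual_update(x, torch.zeros(1, 8))
after = frozen_inference(x)
print(torch.allclose(before, after))  # False: the update persisted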

Byrnes uses a powerful thought experiment to illustrate his point: seal a 'country of geniuses' in a datacenter for 100 years with only a virtual environment. Upon unsealing, you'd find new sciences and philosophies. He argues that a group of frozen LLMs, even with massive context windows, could never achieve this, because through forward passes alone they cannot understand, criticize, and build upon wholly novel concepts absent from their training data. The post has sparked intense debate, challenging a central narrative in AI development and suggesting that scaling current architectures may hit a fundamental ceiling before reaching human-like or superintelligent learning capabilities.

Key Points
  • Argues LLMs lack 'real' continual learning, which requires permanent weight updates to build new knowledge, not just longer context windows.
  • Contrasts LLMs with systems that learn from experience via weight updates, like AlphaZero and the human brain, the latter having invented entire fields (e.g., science, math) from scratch.
  • Uses a 'sealed geniuses' thought experiment to claim frozen LLMs could never understand or build upon entirely novel concepts without training data.

Why It Matters

Challenges a core assumption in AI scaling, suggesting a fundamental architectural shift may be needed to reach superintelligence.