AI Expert Argues 'Next Token Prediction' Is a Misleading Term for LLMs
Calling LLMs 'glorified autocomplete' ignores how they learn deep structure through training.
Deep Dive
The author argues that describing LLMs as 'next token predictors' is misleading and inaccurate. Pre-training does use next-token prediction, but at inference the model outputs a probability distribution over possible tokens, and one token is sampled from that distribution. The author's further point is that the pre-training objective forces the model to learn language, grammar, and content, such as math and narrative understanding, which challenges the common dismissal of LLM cognition.
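The difference between outputting a distribution and picking a token can be shown with a minimal sketch. The four-word vocabulary, the logit values, and the helper names (`softmax`, `greedy_pick`, `sample_pick`) are illustrative assumptions, not taken from the article or from any real model.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(vocab, probs):
    """Deterministic choice: always return the most probable token."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return vocab[best]

def sample_pick(vocab, probs, rng):
    """Stochastic choice: draw one token in proportion to its probability."""
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical toy vocabulary and model scores (assumed for illustration).
vocab = ["eighteen", "sixteen", "twenty", "banana"]
logits = [3.0, 1.5, 1.0, -2.0]

probs = softmax(logits)
rng = random.Random(0)  # seeded so repeated runs draw the same samples
samples = [sample_pick(vocab, probs, rng) for _ in range(1000)]
counts = {tok: samples.count(tok) for tok in vocab}

print("greedy:", greedy_pick(vocab, probs))
print("sampled counts:", counts)
```

Greedy selection returns the same token every time, while sampling returns the likeliest token most often but not always. Real decoders typically layer further controls on top of this, such as temperature or top-k filtering.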
Key Points
- Pre‑training uses next‑token prediction on trillions of token pairs, but inference involves sampling from probability distributions, not deterministic guessing.
- The training regime forces models to learn grammar, facts, and narrative logic: for example, predicting 'eighteen' as the next token in a math textbook, or the murderer's name in a mystery.
- Calling LLMs 'next token predictors' fuels philosophical debates about cognition, but the author argues it misrepresents how models actually generate text.
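To see why the training objective rewards getting the content right, consider a rough sketch of the per-token loss. It assumes the standard cross-entropy objective, which the summary does not name explicitly. The two probability tables and the function name `next_token_loss` are hypothetical.

```python
import math

def next_token_loss(probs, target):
    """Cross-entropy for one position: negative log-probability of the true next token."""
    return -math.log(probs[target])

# Hypothetical distributions over the answer to a textbook problem (assumed values).
confident = {"eighteen": 0.90, "sixteen": 0.05, "twenty": 0.05}  # model did the math
unsure = {"eighteen": 0.34, "sixteen": 0.33, "twenty": 0.33}     # model only knows a number goes here

loss_confident = next_token_loss(confident, "eighteen")
loss_unsure = next_token_loss(unsure, "eighteen")

print(f"loss when confident in the right answer: {loss_confident:.3f}")
print(f"loss when guessing among numbers:        {loss_unsure:.3f}")
```

A model that only knows "a number goes here" pays a higher loss than one that has worked out the answer, so lowering the loss across many such passages pushes the model toward learning the underlying content.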
Why It Matters
Reframes the debate on AI cognition, urging professionals to move beyond simplistic 'autocomplete' narratives.