Research & Papers

Large Language Models and Book Summarization: Reading or Remembering, Which Is Better?

Research finds that models like GPT-4 can sometimes produce better summaries of well-known books from training data alone than from reading the full text, challenging the value of long context windows.

Deep Dive

A new research paper from a team of six computer scientists, including Tairan Fu and Pedro Reviriego, investigates a critical question in AI: Do large language models summarize books better by 'reading' the full text or by 'remembering' information from their training data? With the advent of models boasting context windows of millions of tokens, it's now technically possible to feed an entire book into a single prompt. The researchers conducted a systematic evaluation, pitting summaries generated solely from a model's internal knowledge against those produced after processing the complete book text.
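The two conditions in this setup can be sketched as a small comparison harness. This is a hypothetical illustration, not the authors' code: `call_llm` stands in for any chat-completion client, and the prompt wording is invented for the example.

```python
# Hypothetical sketch of the paper's two summarization conditions.
# `call_llm` is a stand-in for any LLM client function that takes a
# prompt string and returns a completion string.

def memory_only_prompt(title: str, author: str) -> str:
    """Condition 1: the model summarizes from its training data alone."""
    return (
        f"Summarize the book '{title}' by {author} using only what "
        "you already know about it. The text is not provided."
    )

def full_text_prompt(title: str, author: str, book_text: str) -> str:
    """Condition 2: the entire book is placed in the prompt
    (the long-context 'reading' condition)."""
    return (
        f"Summarize the following book, '{title}' by {author}:\n\n"
        f"{book_text}"
    )

def compare_conditions(call_llm, title: str, author: str,
                       book_text: str) -> dict:
    """Run both conditions and return the pair of summaries, so a
    judge (human or LLM) can score them against each other."""
    return {
        "memory": call_llm(memory_only_prompt(title, author)),
        "reading": call_llm(full_text_prompt(title, author, book_text)),
    }
```

In the paper's actual evaluation the resulting summary pairs were quality-scored; the harness above only makes the two prompting regimes concrete.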

Their findings reveal a significant and counterintuitive result. While providing the full text generally yields more detailed summaries, for a subset of well-known books, the summaries created from the model's memory alone received higher quality scores. This suggests that the knowledge compressed during training can, in some cases, be more useful for summarization than the raw text input, even when the model has direct access to it.

The study puts a spotlight on the fundamental capabilities and limitations of current LLMs. It challenges the assumption that simply scaling context length is a panacea for complex document understanding and implies that a model's pre-existing 'world knowledge' can heavily influence—and sometimes improve—its output, regardless of the provided evidence.

Key Points
  • Study compared book summaries from LLM memory vs. full-text reading, finding memory sometimes wins.
  • Research tested state-of-the-art models on well-known books, challenging the value of massive context windows.
  • Results question whether models truly understand long texts or merely recall training data during summarization.

Why It Matters

This challenges the core value proposition of expensive, long-context AI models for professional analysis and content creation tasks.