AI Safety

Axes of Planning in LLMs + Partial Lit Review

New framework helps researchers judge whether LLMs are truly planning ahead

Deep Dive

The question of whether large language models 'plan' is central to interpretability and AI safety, but the term itself is slippery. In a new post on LessWrong, researcher NickyP tackles this by first listing 12 concrete behaviors often associated with planning, from a model noticing future-useful facts mid-generation to executing conditional branching based on an internal outline. The examples range from simple next-token adjustments to full hierarchical planning, underscoring that 'planning' covers a spectrum of cognitive operations rather than a single binary capability.

To formalize this, NickyP proposes four axes: Time Horizon (next token vs. whole output), Vague vs. Specific (vague theme vs. detailed subgoals), Option Space (multiple explicit alternatives vs. single narrow path), and Forward Dependency/Constraints (shallow one-off choices vs. deeply nested dependencies). The post also includes a partial literature review, noting that most existing work addresses at most one or two of these axes. The framework doesn't provide definitive answers, but it equips researchers with a common vocabulary for designing more targeted experiments, which is critical for determining when models are genuinely reasoning ahead versus merely pattern-matching fluently.
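
The post itself is prose, but its four axes read naturally as a scoring rubric. Below is a minimal sketch in Python of one way a researcher might encode a behavior's position on each axis; the PlanningProfile name, the 0-to-1 scales, and the example scores are illustrative assumptions, not anything the post specifies.

    from dataclasses import dataclass

    @dataclass
    class PlanningProfile:
        """Scores a candidate planning behavior along the four axes.

        Each field runs from 0.0 (weak end of the axis) to 1.0 (strong end);
        the numeric scale is an illustrative choice, not taken from the post.
        """
        time_horizon: float        # 0 = next token only, 1 = whole output
        specificity: float         # 0 = vague theme, 1 = detailed subgoals
        option_space: float        # 0 = single narrow path, 1 = multiple explicit alternatives
        forward_dependency: float  # 0 = shallow one-off choice, 1 = deeply nested constraints

    # Hypothetical example: a model that settles on a rhyme word one line ahead
    rhyme_ahead = PlanningProfile(
        time_horizon=0.3,
        specificity=0.8,
        option_space=0.4,
        forward_dependency=0.7,
    )

Treating each axis as a continuous score mirrors the post's point that planning is a spectrum of operations rather than a single binary capability.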

Key Points
  • Four axes: time horizon (next token to whole output), vagueness (vague theme vs. detailed subgoals), option space (multiple explicit options vs. narrow path), forward dependency (shallow vs. nested constraints)
  • Author lists 12 concrete examples of planning-related behaviors, from noticing future-useful facts to executing conditional branching based on internal outlines
  • Epistemic status: a preliminary taxonomy written over a couple of days, with some newer papers excluded; serves as a starting point for structured discussion

Why It Matters

Gives interpretability researchers a concrete vocabulary for evaluating whether LLMs truly plan, which is key to understanding model reliability and alignment.