Research & Papers

Semantic Structure of Feature Space in Large Language Models

New research projects LLM representations of 360 words onto 32 semantic axes and finds they closely match human ratings.

Deep Dive

A new paper by Austin C. Kozlowski and Andrei Boutyline investigates how large language models (LLMs) organize semantic concepts internally. The authors extracted feature vectors from the models' hidden states for 360 words and projected them onto 32 bipolar semantic axes (e.g., beautiful-ugly, soft-hard). These projections were then compared with human survey ratings of the same words on the same scales, and the correlations were remarkably high.
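
The snippet below is a minimal sketch of that projection step, assuming a Hugging Face model and illustrative stand-ins for the word list and human ratings; the model name, word choices, and ratings are placeholders, not the paper's actual pipeline or data.

```python
# Sketch: project word representations onto a bipolar semantic axis
# and correlate the projections with human survey ratings.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; the paper uses larger LLMs
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)

def hidden_vector(word: str, layer: int = -1) -> np.ndarray:
    """Mean-pooled hidden state for a word at a given layer."""
    inputs = tokenizer(word, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[layer][0].mean(dim=0).numpy()

def bipolar_axis(pos: str, neg: str) -> np.ndarray:
    """Semantic axis as the normalized difference between pole vectors (e.g., beautiful - ugly)."""
    axis = hidden_vector(pos) - hidden_vector(neg)
    return axis / np.linalg.norm(axis)

# Project a few words onto the beautiful-ugly axis and correlate with
# hypothetical human ratings (stand-ins for the 360-word survey data).
axis = bipolar_axis("beautiful", "ugly")
words = ["rose", "garbage", "sunset"]
human_ratings = np.array([6.5, 1.8, 6.1])   # illustrative survey means
projections = np.array([hidden_vector(w) @ axis for w in words])
r = np.corrcoef(projections, human_ratings)[0, 1]
print(f"correlation with human ratings: {r:.2f}")
```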

Further, the cosine similarities between the semantic axes themselves accurately predicted the correlations between those scales in human surveys. The 32 axes formed a low-dimensional subspace, mirroring typical human semantic associations. Most strikingly, steering a word's representation along one semantic axis (e.g., making it more beautiful) produced spillover on other axes (e.g., also shifting its softness), with the magnitude scaled by the cosine similarity between the axes. This suggests that LLM features should be understood through their geometric relations and subspaces, not just individually.
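
The following sketch, reusing the helpers above, illustrates the geometric claim: when axes are unit-normalized, a steering push of strength alpha along one axis shifts the projection on a second axis by alpha times the cosine similarity between them. The axis names and steering strength are illustrative assumptions, not values from the paper.

```python
# Sketch: axis-axis cosine similarity and steering spillover.

# 1) Cosine similarity between two semantic axes, which the paper reports
#    predicts how correlated the corresponding human rating scales are.
beauty = bipolar_axis("beautiful", "ugly")
softness = bipolar_axis("soft", "hard")
cos_sim = float(beauty @ softness)   # both axes are unit-normalized

# 2) Steering: push a word's representation along one axis and measure
#    the induced shift on another axis.
vec = hidden_vector("stone")
alpha = 2.0                           # steering strength (arbitrary)
steered = vec + alpha * beauty

shift_on_beauty = (steered - vec) @ beauty      # equals alpha
shift_on_softness = (steered - vec) @ softness  # equals alpha * cos_sim

# Spillover onto the second axis is proportional to the cosine similarity
# between the axes, matching the pattern the paper observes empirically.
print(f"cos(beauty, softness) = {cos_sim:.2f}")
print(f"spillover ratio = {shift_on_softness / shift_on_beauty:.2f}")
```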

Key Points
  • Feature vectors for 360 words projected onto 32 semantic axes correlate with human ratings.
  • Cosine similarities between axes predict inter-scale correlations in human surveys.
  • Steering a word along one semantic axis causes spillover proportional to that axis's cosine similarity with the other axes.

Why It Matters

Reveals that LLMs encode a human-like semantic geometry, aiding interpretability and controlled generation.