Research & Papers

Causality ≠ Invariance: Function and Concept Vectors in LLMs

New study reveals LLMs contain abstract concept representations distinct from those driving in-context learning performance.

Deep Dive

A team of researchers including Gustaw Opiełka, Hannes Rosenbusch, and Claire Stevenson has published a paper titled 'Causality ≠ Invariance: Function and Concept Vectors in LLMs' (arXiv:2602.22424), accepted at ICLR 2026. The study challenges a common assumption about how large language models represent knowledge, demonstrating that the representations that causally drive task performance during in-context learning (Function Vectors) are not the same as those that encode abstract concepts. Through systematic experiments across multiple LLMs, the researchers found that Function Vectors extracted from different input formats (such as open-ended vs. multiple-choice questions) but targeting the same concept were nearly orthogonal, revealing that these causal drivers are surprisingly format-specific rather than abstract.
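As a rough illustration of what near-orthogonality means here, the check below compares two hypothetical Function Vectors for the same concept extracted under different prompt formats. This is a minimal sketch, not the paper's code; the file names, the "antonym" concept, and the saved-array setup are placeholders.

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two extracted vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical Function Vectors for the same concept (e.g. "antonym"),
# one extracted from an open-ended format, one from multiple-choice.
# Per the paper's finding, this similarity lands near 0 (near-orthogonal)
# even though both vectors target the same concept.
fv_open_ended = np.load("fv_antonym_open_ended.npy")         # shape (d_model,)
fv_multi_choice = np.load("fv_antonym_multiple_choice.npy")  # shape (d_model,)

print(f"cosine similarity: {cosine_similarity(fv_open_ended, fv_multi_choice):.3f}")
```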

The researchers identified a new type of representation, Concept Vectors, composed of attention head outputs selected using Representational Similarity Analysis for their consistency across input formats. While these Concept Vector heads emerge in similar layers to Function Vector heads, the two sets are largely distinct, suggesting different underlying neural mechanisms. Crucially, steering experiments show that while Function Vectors excel in-distribution, when extraction and application formats match, Concept Vectors generalize better out-of-distribution across both question types and languages. This discovery has significant implications for model robustness and for cross-format and cross-lingual transfer, and it offers a more nuanced picture of how LLMs separate causal task execution from abstract concept storage.
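A minimal sketch of what RSA-based head selection could look like, assuming cached per-head activations for matched concept exemplars in two formats. The array layout, the correlation-distance RDM, the Spearman comparison, and `top_k` are assumptions for illustration, not the paper's exact pipeline.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(head_outputs: np.ndarray) -> np.ndarray:
    """Condensed representational dissimilarity matrix (RDM) over items.
    head_outputs: (n_items, d_head) activations of one attention head."""
    return pdist(head_outputs, metric="correlation")

def cross_format_consistency(acts_a: np.ndarray, acts_b: np.ndarray) -> float:
    """Spearman correlation between one head's RDMs under two formats."""
    rho, _ = spearmanr(rdm(acts_a), rdm(acts_b))
    return float(rho)

def select_concept_heads(acts_open: np.ndarray, acts_mc: np.ndarray,
                         top_k: int = 32) -> np.ndarray:
    """Rank heads by cross-format RSA consistency and keep the top_k.
    acts_*: (n_heads, n_items, d_head) cached head outputs per format."""
    scores = [cross_format_consistency(acts_open[h], acts_mc[h])
              for h in range(acts_open.shape[0])]
    return np.argsort(scores)[::-1][:top_k]  # most format-invariant heads
```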

Key Points
  • Function Vectors (FVs) that drive in-context learning are format-specific, becoming nearly orthogonal when extracted from different formats targeting the same concept
  • Newly identified Concept Vectors (CVs) maintain stable representations across formats and generalize better out-of-distribution across question types and languages (see the steering sketch after this list)
  • CVs and FVs use largely distinct sets of attention heads despite emerging in similar layers, revealing separate mechanisms for causal execution vs. concept storage
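Mechanically, the steering comparisons above come down to adding an extracted vector into a layer's hidden states at inference time. Below is a minimal PyTorch sketch of that operation, assuming a Hugging Face-style decoder; the layer index, scale `alpha`, and model attribute path are illustrative assumptions, not the paper's setup.

```python
import torch

def make_steering_hook(vector: torch.Tensor, alpha: float = 1.0):
    """Forward hook that adds a steering vector to a layer's hidden states."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * vector.to(hidden.device, hidden.dtype)
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return hook

# Illustrative usage (layer index, scale, and attribute path are assumptions):
# handle = model.model.layers[14].register_forward_hook(
#     make_steering_hook(concept_vector, alpha=4.0))
# outputs = model.generate(**inputs)
# handle.remove()
```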

Why It Matters

This research points toward more robust AI systems that generalize across formats and languages by steering with format-invariant concept representations rather than brittle, format-specific in-context drivers.