Shared Lexical Task Representations Explain Behavioral Variability In LLMs
New study finds 'lexical task heads' explain prompt sensitivity in LLMs.
A team of researchers from Brown University, including Zhuonan Yang, Jacob Xiaochen Li, Francisco Piedrahita Velez, Eric Todd, David Bau, Michael L. Littman, Stephen H. Bach, and Ellie Pavlick, has published a paper on arXiv titled 'Shared Lexical Task Representations Explain Behavioral Variability In LLMs.' The study investigates why LLMs exhibit prompt sensitivity: unpredictable performance changes depending on how a question is posed. By comparing instruction-based prompts (describing the task in natural language) with example-based prompts (using in-context few-shot demonstrations), the researchers found that despite large performance variations, models engage common underlying mechanisms across different prompts.
The key discovery is the identification of 'lexical task heads': specific attention heads whose outputs literally describe the task. These heads are shared across prompting styles and trigger subsequent answer production. The study shows that behavioral variation between prompts can be explained by the degree to which these heads are activated. Failures arise, at least in part, from competing task representations that dilute the signal of the target task. These results provide a clearer picture of how LLMs' internal representations can explain behavior that otherwise seems idiosyncratic to users and developers.
- Researchers identified 'lexical task heads'—attention heads that encode task representations shared across prompting styles.
- Prompt sensitivity is explained by the degree of activation of these heads, not by random variation.
- Failures occur when competing task representations dilute the target task's signal, offering a mechanistic explanation for LLM behavior.
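To make the idea of a head whose output 'describes the task' concrete, here is a minimal toy sketch in the spirit of logit-lens-style analysis: an attention head's output vector is projected through a (here, fabricated) unembedding matrix, and the top-scoring vocabulary token names the task. All matrices, vocabulary entries, and values below are illustrative assumptions, not the paper's actual models or data.

```python
import numpy as np

# Toy vocabulary; in a real model this would be the full tokenizer vocabulary.
vocab = ["antonym", "synonym", "capital", "the", "cat"]

# Fabricated unembedding matrix: one row per token (toy d_model = 8).
# Rows are one-hot here purely so the example is deterministic.
W_U = np.eye(len(vocab), 8)

# Pretend this is the output of a candidate "lexical task head" while the
# model processes an antonym task: close to the "antonym" direction,
# plus a small uniform perturbation standing in for other signal.
head_output = W_U[0] + 0.05 * np.ones(8)

# Project the head's output into vocabulary space and read off the top token.
logits = W_U @ head_output
top_token = vocab[int(np.argmax(logits))]
print(top_token)  # -> "antonym": the projection names the task
```

In this framing, 'competing task representations' would correspond to a second task direction (say, the "synonym" row) contributing comparable weight to `head_output`, shrinking the margin of the target task's logit and making the downstream answer less reliable.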
Why It Matters
This research demystifies LLM prompt sensitivity, enabling more reliable AI systems and better prompt engineering.