LLMs fail to feel beauty: New study reveals aesthetic alignment gap

AI can rate beauty but can't feel it—interoceptive gap revealed.

Deep Dive

Researchers Yoshia Abe, Tatsuya Daikoku, and Yasuo Kuniyoshi posted a study on arXiv (2605.18759) examining how well LLMs align with humans in aesthetic experiences. They gave both human participants and state-of-the-art LLMs a set of questionnaire items about beauty ratings, emotions, and bodily sensations for visual scenes. The analysis revealed that LLMs broadly matched human averages in correlating beauty with emotions and in the image features they prioritized. However, significant divergences emerged in the distribution of emotional responses and especially in the link between beauty ratings and bodily sensations (interoceptive divergence).
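The comparison described above can be pictured with a small sketch. This summary does not state which statistics the authors used, so the Pearson correlation here is an assumption, as are the function names `pearson` and `alignment_gap`. The ratings are made-up toy values, not data from the paper. The idea is to correlate beauty ratings with another questionnaire item separately for humans and for a model, then compare the two coefficients.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


def alignment_gap(human, model, item):
    """Return (human r, model r, difference) for beauty vs. the given item.

    Hypothetical helper: `human` and `model` map item names to per-image ratings.
    """
    r_human = pearson(human["beauty"], human[item])
    r_model = pearson(model["beauty"], model[item])
    return r_human, r_model, r_human - r_model


# Toy ratings, one value per image. Illustrative only, NOT results from the study.
human = {
    "beauty": [1, 2, 3, 4, 5, 6],
    "emotion": [2, 2, 3, 5, 5, 6],
    "body": [1, 2, 2, 4, 5, 6],
}
model = {
    "beauty": [1, 2, 3, 4, 5, 6],
    "emotion": [1, 3, 3, 4, 6, 6],
    "body": [4, 3, 4, 3, 4, 3],
}

for item in ("emotion", "body"):
    r_h, r_m, gap = alignment_gap(human, model, item)
    print(f"{item}: human r={r_h:.2f}, model r={r_m:.2f}, gap={gap:.2f}")
```

In this toy setup the model tracks the human beauty-emotion link closely, while its bodily-sensation ratings barely move with beauty. That is the shape of divergence the study reports, though the numbers here carry no evidential weight.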

This suggests that while large-scale text training helps LLMs approximate average human aesthetics, they lack the embodied, interoceptive grounding that colors human aesthetic judgment. The authors argue these gaps may stem from insufficient representation of bodily sensations in training data or from unintended side effects of alignment processes. For AI alignment research, the findings underscore that human-like sensibility requires more than mimicking cognitive outputs: a model must also account for the visceral, bodily dimensions of experience. As AI enters creative and aesthetic domains, bridging this gap becomes essential for truly aligned human-AI interaction.

Key Points
  • LLMs matched humans in beauty-emotion correlations but diverged in bodily sensation responses.
  • The 20-page paper, with 9 figures, compares human and AI questionnaire results on aesthetic evaluation.
  • Interoceptive gap suggests text-trained LLMs lack embodied sensibility, a key challenge for AI alignment.

Why It Matters

As AI integrates into creative fields, bridging the interoceptive gap is critical for human-like aesthetic interaction.