TUX measures how well LLMs grasp human intuition without explicit cues
A new benchmark tests whether AI truly 'gets' your vague, subjective judgments.
A new paper from UIUC researchers introduces TUX (Tacit Understanding Index), a metric that quantifies how well large language models align with human intuition on open-ended, subjective tasks. Inspired by the party game Wavelength, the study asked 241 human participants and 200 profile-conditioned LLM agents (spanning four models) to place concepts along a continuous spectrum, for example rating how “hot” or “cold” a term feels. The key innovation is that TUX measures the similarity between human and agent judgments without giving either side an explicit objective or feedback, capturing a form of tacit understanding that standard accuracy benchmarks miss.
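To make the idea of comparing placements concrete, here is a minimal sketch of one way two sets of spectrum placements could be scored against each other. It is not the paper's TUX formula, which this summary does not spell out. The function name `placement_similarity`, the 0-to-1 spectrum scale, and the use of mean absolute difference are all assumptions chosen for illustration.

```python
from statistics import mean


def placement_similarity(human, agent):
    """Illustrative similarity in [0, 1] between two sets of placements.

    Each argument maps a concept to a position on a 0-to-1 spectrum
    (0 = one pole, such as "cold"; 1 = the other, such as "hot").
    ASSUMPTION: this is a stand-in score, not the TUX definition.
    """
    # Only compare concepts that both the human and the agent placed.
    shared = sorted(set(human) & set(agent))
    if not shared:
        raise ValueError("no concepts in common")

    for concept in shared:
        for value in (human[concept], agent[concept]):
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"placement for {concept!r} is outside [0, 1]")

    # 1 minus the mean absolute gap: identical placements score 1.0,
    # placements at opposite poles score 0.0.
    return 1.0 - mean(abs(human[c] - agent[c]) for c in shared)


if __name__ == "__main__":
    human = {"coffee": 0.9, "ice": 0.1}
    agent = {"coffee": 0.7, "ice": 0.1}
    print(round(placement_similarity(human, agent), 3))
```

Note that neither side is told a "correct" answer here: the score depends only on how close the two sets of placements are, which mirrors the feedback-free setup described above.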
The results show that tacit alignment is structured by person-level characteristics: the closest human-agent pairs in trait space achieved significantly higher TUX scores. Regression analyses revealed that TUX becomes more explainable as predictor sets grow richer. Models that included individual traits, decision-making styles, and confidence levels outperformed simple aggregate trait-distance baselines. The findings suggest that while profile-based conditioning can nudge LLMs toward certain human-like responses, it has limits in capturing deeper representational alignment. This opens the door to more personalized AI collaborators that adapt to an individual’s implicit evaluation style.
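The "closest pair in trait space" comparison can be pictured with a short sketch. It assumes each human and each agent is described by a numeric trait vector of the same length and that closeness is Euclidean distance. The summary does not say which traits or which distance measure the authors used, so `nearest_agent`, the vectors, and the agent ids below are hypothetical.

```python
import math


def nearest_agent(human_traits, agents):
    """Return (agent_id, distance) for the agent closest to the human.

    `human_traits` is a sequence of numbers; `agents` maps an agent id
    to a trait vector of the same length.
    ASSUMPTION: Euclidean distance stands in for whatever trait-space
    distance the study actually used.
    """
    if not agents:
        raise ValueError("no agents to compare against")
    # Pair every agent with its distance to the human, then keep the smallest.
    return min(
        ((agent_id, math.dist(human_traits, traits)) for agent_id, traits in agents.items()),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    human = [0.2, 0.8]  # hypothetical trait scores
    agents = {"agent_a": [0.9, 0.1], "agent_b": [0.25, 0.75]}
    agent_id, distance = nearest_agent(human, agents)
    print(agent_id, round(distance, 3))
```

Under this reading, the study's comparison amounts to checking whether pairs matched this way score higher than randomly matched pairs.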
- TUX is evaluated on 241 humans and 200 LLM agents using a Wavelength-style spectrum-placement task
- Nearest human-agent pairs in trait space achieve significantly higher tacit understanding than random matches
- Richer trait profiles (demographics, decision styles, confidence) improve predictability of TUX by 15-20% over an aggregate trait-distance baseline
Why It Matters
This benchmark could enable AI assistants that intuitively adapt to your personal judgment style without explicit instruction.