Sycamore study reveals limits of AI personas for genomics visualization evaluation
LLM personas miss expert preferences, even when grounded in real user data.
Huyen Nguyen, Astrid van den Brandt, and Nils Gehlenborg (Eindhoven University of Technology and Harvard Medical School) present Sycamore, a three-condition study evaluating how well LLM-based synthetic personas can assess a genomics visualization retrieval system called Geranium. The first condition uses ungrounded personas derived from generic LLM priors; the second grounds personas in voice-of-customer artifacts from a prior interview study with domain experts; the third is a published baseline of real expert evaluations.
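To make the difference between the two synthetic conditions concrete, here is a minimal sketch of how an ungrounded and a grounded persona prompt might be assembled. It is an illustration only, not the authors' implementation: the `Persona` class, the `build_system_prompt` function, the prompt wording, and the placeholder excerpts are all hypothetical, and the summary does not describe the actual prompts used in Sycamore.

```python
from dataclasses import dataclass, field


@dataclass
class Persona:
    """Hypothetical persona record; not taken from the Sycamore study."""
    role: str
    # Voice-of-customer artifacts (e.g., interview excerpts). Empty means ungrounded.
    artifacts: list[str] = field(default_factory=list)

    @property
    def grounded(self) -> bool:
        return bool(self.artifacts)


def build_system_prompt(persona: Persona, system_name: str = "Geranium") -> str:
    """Assemble a system prompt for an LLM persona (assumed wording)."""
    lines = [
        f"You are a {persona.role} evaluating {system_name}, "
        "a genomics visualization retrieval system."
    ]
    if persona.grounded:
        # Grounded condition: anchor the persona in documented user statements.
        lines.append("Base your feedback on these interview excerpts:")
        lines.extend(f"- {excerpt}" for excerpt in persona.artifacts)
    else:
        # Ungrounded condition: the persona relies only on the model's priors.
        lines.append("Rely only on your general knowledge of this role.")
    return "\n".join(lines)


if __name__ == "__main__":
    ungrounded = Persona(role="computational biologist")
    grounded = Persona(
        role="computational biologist",
        artifacts=["<interview excerpt 1>", "<interview excerpt 2>"],  # placeholders
    )
    print(build_system_prompt(ungrounded))
    print()
    print(build_system_prompt(grounded))
```

In this sketch the only difference between conditions is whether interview excerpts are injected into the prompt, which mirrors the distinction the study draws between generic LLM priors and voice-of-customer grounding.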
Results show that grounding synthetic personas shifts their feedback toward the language and concerns of documented users, for example by placing more focus on data type and annotation clarity. Ungrounded personas, in contrast, drift toward operational specifics (such as query syntax) that real participants never raised. Critically, both synthetic conditions converge on a 'find-and-adapt' framing and completely miss the image-modality preference that experts consistently demonstrated. The authors argue that synthetic personas are not substitutes for expert studies but exploratory probes best used alongside them, especially for narrowing evaluation scope before costly human testing.
- Grounded synthetic personas (using interview data) produced feedback closer to real experts' language than ungrounded ones.
- Ungrounded personas over-indexed on operational details (e.g., query syntax) that real domain experts did not mention.
- Both synthetic conditions failed to capture experts' preference for image-modality search, a key finding from the baseline study.
Why It Matters
The study suggests that LLM personas can help scope evaluations but cannot replace expert insight in niche scientific domains.