AnnotateThis raises LLM annotation accuracy by 0.23 on nuanced social concepts
Human-in-the-loop system beats automated prompt refinement by 0.23 in accuracy and 0.15 in F-Measure when annotating climate change mitigation pessimism.
Large language models are increasingly used for data annotation in computational social science, but they often fail on nuanced, domain-specific concepts. To address this, Zexuan Li and colleagues from the University of Michigan developed AnnotateThis, a system that places humans at the center of LLM-supported annotation workflows. Designed for both computational and social scientists, AnnotateThis provides information features that let users interrogate the quality and reliability of LLM annotations, essentially “grounding” the model to a specific target concept. The team evaluated the system in two settings: one in which researchers had no ground truth and limited prior knowledge of the concept, and one in which ground truth labels were available. In both settings, human users significantly improved annotation quality.
When ground truth was available, AnnotateThis outperformed a state-of-the-art automated prompt refinement method by 0.15 in F-Measure and 0.23 in accuracy, a substantial margin for tasks like detecting climate change mitigation pessimism on social media. The results suggest that even with advanced AI, human judgment remains critical for capturing subtle, context-dependent meanings. This work could support more reliable AI-assisted research in domains from political discourse to public health, where getting the annotation right is essential for downstream analysis.
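To make the two reported metrics concrete, here is a minimal sketch of how accuracy and F-Measure are typically computed against ground truth labels. The labels and predictions below are invented toy data, not results from the study, and the function names are hypothetical. The sketch also assumes a binary F1 score on the positive class; the summary does not say which F-Measure variant the authors report.

```python
def accuracy(gold, pred):
    """Fraction of items where the predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f_measure(gold, pred, positive=1):
    """Binary F1: harmonic mean of precision and recall on the positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (assumption): 1 = post expresses the target concept, 0 = it does not.
gold = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
automated = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0]      # stand-in for an automated baseline
human_guided = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]   # stand-in for human-refined output

accuracy_gain = accuracy(gold, human_guided) - accuracy(gold, automated)
f_measure_gain = f_measure(gold, human_guided) - f_measure(gold, automated)
print(f"accuracy gain: {accuracy_gain:.2f}, F-Measure gain: {f_measure_gain:.2f}")
```

Both gains are absolute differences on a 0 to 1 scale, which is how the 0.15 and 0.23 figures above should be read: as differences in score, not as relative percentage changes.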
- AnnotateThis integrates human oversight into LLM annotation, enabling iterative refinement of outputs for nuanced concepts like climate change mitigation pessimism.
- Outperformed fully automated prompt refinement by 0.15 in F-Measure and 0.23 in accuracy when ground truth labels were available.
- Supports researchers both with and without access to ground truth labels, making it versatile for real-world computational social science workflows.
Why It Matters
In this study, human-AI collaboration outperformed fully automated prompt refinement on a complex, context-dependent annotation task in social science.