Research & Papers

Humans More Swayed by Source Labels Than LLMs, Study Finds

A study of 505 participants and several LLMs judging logical fallacies reveals a stark human vulnerability.

Deep Dive

Researchers Mahjabin Nahar and colleagues conducted an online study with 505 participants to examine how source labels bias reasoning judgments. Participants evaluated comments containing logical fallacies under five source conditions: human, AI, human with AI assistance, AI with human assistance, or no disclosure. The same comments were then evaluated by three leading LLMs (GPT-5.2, Gemini 2.5 Flash, and Claude Sonnet 4.5) under identical source label conditions. Results revealed a stark asymmetry: human evaluators were significantly more likely to rate fallacious arguments as trustworthy when those arguments were labeled as human-written or as human-written with AI assistance. In contrast, LLM evaluations remained comparatively stable across all source labels, though performance varied by model. Confidence levels were similarly high across conditions for both humans and LLMs, regardless of whether a fallacy was present.
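
To make the comparison concrete, here is a minimal sketch of one way label sensitivity could be quantified: average the ratings given to the same comments under each source label, then take the gap between the highest and lowest label mean. This is an illustration only, not the researchers' analysis. The function name, the 1-5 trust scale, and the toy scores are all assumptions invented for the example.

```python
from collections import defaultdict
from statistics import mean

# The five source conditions described in the study summary.
LABELS = {"human", "AI", "human+AI", "AI+human", "none"}


def label_sensitivity(ratings):
    """Return (per-label mean rating, spread between highest and lowest mean).

    `ratings` is an iterable of (label, score) pairs for the same set of
    fallacious comments shown under different source labels.
    Hypothetical helper, not taken from the study.
    """
    by_label = defaultdict(list)
    for label, score in ratings:
        if label not in LABELS:
            raise ValueError(f"unknown source label: {label}")
        by_label[label].append(score)
    means = {label: mean(scores) for label, scores in by_label.items()}
    spread = max(means.values()) - min(means.values())
    return means, spread


# Invented toy scores on an assumed 1-5 trust scale. NOT data from the study.
toy_label_sensitive = [
    ("human", 4), ("human", 5), ("AI", 2), ("AI", 3), ("none", 3), ("none", 3),
]
toy_label_stable = [
    ("human", 2), ("human", 3), ("AI", 2), ("AI", 3), ("none", 3), ("none", 2),
]

_, spread_sensitive = label_sensitivity(toy_label_sensitive)
_, spread_stable = label_sensitivity(toy_label_stable)
print(spread_sensitive, spread_stable)
```

A larger spread means the rater's trust shifts more with the label alone. A spread near zero corresponds to the "comparatively stable" pattern reported for the LLMs.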

The findings have direct implications for human-AI collaboration in content moderation and evaluation. As AI-generated and AI-assisted content floods online spaces, the source labels attached to that content can distort human reasoning and lead to biased judgments. The study suggests that source-label bias is primarily a human vulnerability and that LLMs offer more source-agnostic evaluation. However, the researchers note that LLM performance varied across models, indicating that not all LLMs are equally robust. The work highlights the potential of using LLMs as less label-sensitive evaluators in AI-mediated environments, while also underscoring the need for interventions that reduce human susceptibility to source cues. As platforms increasingly rely on human-in-the-loop moderation, understanding and addressing this bias becomes critical for maintaining objective decision-making.

Key Points
  • 505 participants evaluated logical fallacies under five source conditions (human, AI, human+AI, AI+human, no disclosure).
  • Humans assigned higher trust and evaluation ratings to fallacies labeled as human or human+AI, while LLMs stayed stable across labels.
  • LLMs tested: GPT-5.2, Gemini 2.5 Flash, and Claude Sonnet 4.5. All showed comparatively source-agnostic evaluation, but with model-specific performance variations.

Why It Matters

As AI content floods platforms, human vulnerability to source labels threatens objective moderation, and LLMs may offer less biased assessment.