Developer Tools

I prompted ChatGPT, Claude, Perplexity, and Gemini and watched my Nginx logs

A technical experiment shows that AI models fetch pages directly from origin servers using distinct user-agents, rather than relying only on cached indexes.

Deep Dive

A developer ran a hands-on experiment to answer a recurring question in web analytics: when AI models like ChatGPT, Claude, and Perplexity cite a website, do they fetch the page live or pull it from a pre-built index? The developer set up a custom Nginx log format, prompted each AI with questions designed to trigger citations of a domain under their control, and captured the resulting requests in the server logs.

The logs revealed clear 'provider-side fetch' patterns. ChatGPT's requests arrived with the 'ChatGPT-User/1.0' user-agent, carried no referrer, and came in tight bursts from multiple IPs as the model evaluated candidate pages. Claude behaved similarly, identifying itself as 'Claude-User/1.0', and notably requested `/robots.txt` before fetching any page. Perplexity also performed direct fetches, using 'Perplexity-User/1.0'. The logs show that these models can and do retrieve content directly from origin servers in real time.
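The patterns described above can be pulled out of an access log with a short script. The following is a minimal sketch, not the developer's method: it assumes Nginx's default "combined" log format (the custom format used in the experiment is not specified), matches the three user-agent tokens as substrings, and runs on made-up sample lines that use documentation-only IP addresses. The names `PROVIDER_TOKENS`, `summarize`, and `SAMPLE` are hypothetical.

```python
import re
from collections import defaultdict

# User-agent tokens as reported in the article, matched as substrings.
PROVIDER_TOKENS = {
    "ChatGPT-User": "ChatGPT",
    "Claude-User": "Claude",
    "Perplexity-User": "Perplexity",
}

# Assumption: Nginx's default "combined" format, not the experiment's custom one.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def provider_for(user_agent):
    """Return the provider name if the user-agent carries a known token."""
    for token, name in PROVIDER_TOKENS.items():
        if token in user_agent:
            return name
    return None


def summarize(lines):
    """Group provider-side fetches by provider: distinct IPs and paths in order."""
    summary = defaultdict(lambda: {"ips": set(), "paths": []})
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        provider = provider_for(match["ua"])
        if provider is None:
            continue  # ordinary browser or unrelated bot
        summary[provider]["ips"].add(match["ip"])
        summary[provider]["paths"].append(match["path"])
    return dict(summary)


# Made-up sample lines for illustration only (not taken from the experiment).
SAMPLE = [
    '192.0.2.10 - - [01/Jan/2025:12:00:00 +0000] "GET /post HTTP/1.1" 200 512 "-" "Mozilla/5.0; compatible; ChatGPT-User/1.0"',
    '192.0.2.11 - - [01/Jan/2025:12:00:01 +0000] "GET /about HTTP/1.1" 200 300 "-" "Mozilla/5.0; compatible; ChatGPT-User/1.0"',
    '198.51.100.5 - - [01/Jan/2025:12:01:00 +0000] "GET /robots.txt HTTP/1.1" 200 64 "-" "Claude-User/1.0"',
    '198.51.100.5 - - [01/Jan/2025:12:01:01 +0000] "GET /post HTTP/1.1" 200 512 "-" "Claude-User/1.0"',
    '203.0.113.7 - - [01/Jan/2025:12:02:00 +0000] "GET /post HTTP/1.1" 200 512 "https://chatgpt.com/" "Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0"',
]

if __name__ == "__main__":
    for provider, info in summarize(SAMPLE).items():
        print(provider, sorted(info["ips"]), info["paths"])
```

In this sketch, a provider with more than one entry in `ips` would correspond to the multi-IP burst pattern, and a `paths` list that starts with `/robots.txt` would correspond to the robots check reported for Claude.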

The experiment highlights a critical distinction often blurred in marketing: 'AI traffic' can mean either the provider fetching a page on the model's behalf (a provider-side action) or a human user clicking a citation link (a real visit). These are different events with different implications for measurement and attribution. The findings offer concrete, technical validation of how these AI agents operate, grounded in observable server logs rather than vendor claims.
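One way to keep the two kinds of event apart in an analytics pipeline is to label each request by its user-agent and referrer. This is a rough sketch under stated assumptions: the user-agent tokens are the ones reported in the article, while the referrer host list and the idea that click-throughs arrive with such a referrer are assumptions made for illustration, not findings of the experiment. The function name `classify` and both constants are hypothetical.

```python
from urllib.parse import urlparse

# Tokens reported in the article for provider-side fetches.
PROVIDER_UA_TOKENS = ("ChatGPT-User", "Claude-User", "Perplexity-User")

# Assumption: human click-throughs carry a referrer from one of these hosts.
AI_REFERRER_HOSTS = {
    "chatgpt.com",
    "claude.ai",
    "perplexity.ai",
    "www.perplexity.ai",
    "gemini.google.com",
}


def classify(user_agent, referrer):
    """Label a request as a provider-side fetch, a human click, or other."""
    if any(token in user_agent for token in PROVIDER_UA_TOKENS):
        return "provider_fetch"
    # Nginx logs an empty referrer as "-", which has no hostname.
    host = urlparse(referrer).hostname or ""
    if host in AI_REFERRER_HOSTS:
        return "human_click"
    return "other"


if __name__ == "__main__":
    print(classify("Mozilla/5.0; compatible; ChatGPT-User/1.0", "-"))
    print(classify("Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0", "https://chatgpt.com/"))
    print(classify("Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0", "-"))
```

Checking the user-agent first reflects the observation that provider-side fetches arrived with no referrer, so the referrer is only consulted for requests that look like ordinary browsers.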

Key Points
  • ChatGPT fetches pages using 'ChatGPT-User/1.0' in multi-IP bursts with no referrer.
  • Claude uses 'Claude-User/1.0', checks robots.txt first, and follows redirects normally.
  • Perplexity was observed using 'Perplexity-User/1.0' for direct fetches, though it may also use an index.

Why It Matters

For anyone responsible for web analytics, this clarifies how to track and attribute traffic accurately by separating fetches made by AI models from visits made by human users.