Developer Tools

Major news outlets block Internet Archive to stop AI data scraping

The fight over AI training data just claimed a major casualty...

Deep Dive

News publishers including The Guardian and The New York Times are restricting the Internet Archive's access to their content, fearing that its trillion-webpage repository is a backdoor through which AI companies can scrape training data. The Guardian has blocked its articles from the Archive's APIs and the Wayback Machine. The move reflects a broader trend in which publishers, including the Financial Times, treat archival bots with the same suspicion as AI crawlers from OpenAI and Anthropic.

Why It Matters

Restricting the Internet Archive threatens a cornerstone of web preservation and signals a new front in the battle over who controls online information.
