News publishers limit Internet Archive access due to AI scraping concerns
The fight over AI training data just claimed a major casualty...
Deep Dive
News publishers including The Guardian and The New York Times are restricting the Internet Archive's access to their content, fearing that its trillion-webpage repository is a backdoor through which AI companies can scrape training data. The Guardian has blocked its articles from the Archive's APIs and the Wayback Machine. The restrictions reflect a broader trend in which publishers, among them the Financial Times, treat archival bots with the same suspicion as AI crawlers from OpenAI and Anthropic.
Why It Matters
The restrictions threaten a cornerstone of web preservation and signal a new front in the battle over who controls online information.