Major news outlets block Internet Archive to stop AI data scraping

Hacker News February 15, 2026

⚡ The fight over AI training data just claimed a major casualty...

Deep Dive

News publishers including The Guardian and The New York Times are restricting the Internet Archive's access to their content, fearing that its trillion-webpage repository gives AI companies a backdoor for scraping training data. The Guardian has blocked its articles from the Archive's APIs and the Wayback Machine. The move is part of a broader trend in which publishers, including the Financial Times, treat archival bots with the same suspicion as AI crawlers from OpenAI and Anthropic.
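
The summary does not say which mechanism each publisher uses. One common way a site signals that a crawler is unwelcome is a robots.txt rule keyed on the crawler's user-agent token. The sketch below is illustrative only: the robots.txt content is hypothetical rather than any publisher's real file, `blocked_agents` is a helper name made up for this example, and the tokens `GPTBot`, `ClaudeBot`, and `archive.org_bot` are assumed to be the ones used by OpenAI, Anthropic, and the Internet Archive respectively. It uses Python's standard `urllib.robotparser` to check which agents a given policy turns away.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; not taken from any real publisher.
EXAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: archive.org_bot
Disallow: /

User-agent: *
Allow: /
"""


def blocked_agents(robots_txt, agents, url):
    """Return the agents from `agents` that the policy forbids from fetching `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # Assumed user-agent tokens, plus a generic browser-like agent for contrast.
    agents = ["GPTBot", "ClaudeBot", "archive.org_bot", "Mozilla"]
    # news.example is a placeholder domain.
    print(blocked_agents(EXAMPLE_ROBOTS_TXT, agents, "https://news.example/article"))
```

robots.txt is advisory: it only affects crawlers that choose to honor it, so a publisher that wants a hard block would need server-side enforcement as well.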

Why It Matters

The restrictions threaten a cornerstone of web preservation and signal a new front in the battle over who controls online information.
