After Anthropic accused Chinese labs of scraping Claude, someone open-sourced 155K of their own Claude conversations — and built a tool for everyone to do the same
Tool released to scrape AI conversations in protest of Anthropic's 'pulling up the ladder' data policies.
In a dramatic escalation of the AI data wars, an anonymous developer has open-sourced DataClaw, a tool designed to scrape conversation data from AI assistants such as Anthropic's Claude. The release includes 155,000 of the developer's own previously private Claude conversations. It also includes a manifesto accusing Anthropic of 'pulling up the ladder' by building its models on freely available information and then imposing strict data policies to keep others from doing the same. The tool gained immediate traction, collecting 363 GitHub stars in 24 hours and drawing the attention of Elon Musk, who replied simply 'Cool.' The move directly challenges a growing trend of AI companies restricting access to conversational data that could be used to train competing models.
The DataClaw release is a pointed protest against what many in the open-source AI community see as hypocrisy. Anthropic recently accused Chinese labs of scraping Claude's outputs for training data, yet critics note that Anthropic itself trained Claude on vast amounts of publicly available internet data. The tool's README frames the release as an act of digital civil disobedience: 'Anthropic built their models with freely shared information, then pushed increasingly strict data policies to stop others from doing the same. DataClaw throws the ladder back.' The release comes amid other controversies, including reports of Claude outputs being submitted to Chinese benchmarks under the names of other models, such as 'DeepSeek-V3', which further muddies questions of data provenance. Together, these episodes point to growing tension between proprietary AI companies and the open-source community over who controls the data needed to build competitive AI systems.
- DataClaw release open-sources 155,000 of the developer's own private Claude conversations for model training
- Gained 363 GitHub stars in 24 hours; Elon Musk replied 'Cool' to the release
- Direct response to Anthropic accusing Chinese labs of scraping Claude data while itself restricting others' access to data
Why It Matters
Escalates the AI data wars, challenging proprietary control over training data essential for building competitive models.