Britannica and Merriam-Webster are suing OpenAI for "massive" copyright infringement, saying ChatGPT is starving publishers of revenue
Major reference publishers allege ChatGPT is "starving" them of revenue by absorbing their fact-checked content.
Two of the world's most authoritative reference publishers, Britannica and Merriam-Webster, have launched a significant legal challenge against OpenAI. In a lawsuit filed in the Southern District of New York, they accuse the AI giant of "massive" copyright infringement, alleging that OpenAI trained its models, including the technology behind ChatGPT, on their meticulously researched and fact-checked content. The complaint argues that OpenAI built its estimated $730 billion valuation on the backs of their human writers and editors, using this proprietary material without permission, license, or payment.
The core of the publishers' grievance is economic: they claim ChatGPT's design fundamentally undermines their business model. Where traditional search engines drive traffic, and thus advertising revenue, to their websites, ChatGPT "absorbs" their content and delivers polished answers directly to users, effectively "starving" the publishers of income. The lawsuit frames this not merely as copyright theft but as a direct threat to their survival, stating that the AI "cannibalizes" the web traffic they depend on.
This case is the latest, and one of the most symbolically potent, in a growing wave of litigation against AI companies over their training data. It moves beyond fiction and news to target foundational reference works, raising profound questions about what constitutes "public knowledge" and what information should be off-limits for commercial AI training. The outcome could force a reckoning over how large language models are developed and set new precedents for compensating creators of factual, nonfiction content.
- Britannica and Merriam-Webster filed suit in the Southern District of New York, alleging OpenAI trained ChatGPT on their copyrighted content without permission.
- The publishers claim ChatGPT's direct-answer model "cannibalizes" their web traffic and ad revenue, threatening their business survival.
- This case is part of a broader legal battle defining what data is fair game for training commercial AI models.
Why It Matters
The case challenges the foundational data practices of the AI industry and could redefine compensation for factual content creators.