The dictionaries are suing OpenAI for "massive" copyright infringement, and say ChatGPT is starving publishers of revenue

Dictionary giants allege ChatGPT 'starves' them of web traffic and ad revenue by absorbing their content.

Deep Dive

Two of the world's most authoritative reference publishers, Encyclopaedia Britannica and its dictionary subsidiary Merriam-Webster, have filed a major copyright infringement lawsuit against OpenAI. Filed in federal court in the Southern District of New York, the complaint accuses the AI company of building its $730 billion valuation on the back of their meticulously researched and fact-checked content. The publishers argue that OpenAI fed its large language models (LLMs) the work of hundreds of human writers and editors without permission or payment, constituting 'massive' copyright infringement.

At its core, the lawsuit alleges a direct threat to the publishers' business model. Traditional search engines like Google send users to publisher websites, generating crucial ad revenue; ChatGPT, by contrast, is accused of 'cannibalizing' that traffic. It absorbs the publishers' content and delivers polished answers directly, which, the complaint states, 'starves web publishers... of revenue.' This case is the latest in a series of high-profile lawsuits from media companies, authors, and artists, all challenging the foundational practice of training AI models on publicly available web data.

The legal battle raises profound questions about the boundaries of 'fair use' in the AI era and about what counts as public knowledge versus proprietary information. The outcome could force AI companies to negotiate licensing deals for vast swaths of training data or significantly change how they develop future models. For the publishers, it is a fight for survival: they argue that the very systems profiting from their work are undermining their ability to fund the creation of that reliable information in the first place.

Key Points
  • Britannica and Merriam-Webster allege OpenAI trained ChatGPT on their copyrighted content without permission, calling it 'massive' infringement.
  • The lawsuit claims ChatGPT 'cannibalizes' publisher revenue by providing answers directly instead of driving traffic to their ad-supported websites.
  • This case is part of a growing legal trend testing whether the 'fair use' doctrine covers AI training on web-scraped data.

Why It Matters

The case could redefine copyright law for AI, potentially forcing tech giants to pay publishers for training data and altering how models are built.