Nearly 400 Local Newspapers Sue Microsoft and OpenAI Over Alleged Copyright Violations in AI Training

Nearly 400 local and regional newspapers banded together on Monday to sue Microsoft and OpenAI, alleging the tech giants pilfered their copyrighted articles to train generative AI models like ChatGPT and Copilot. The lawsuit, filed in the U.S. District Court for the Southern District of New York, marks one of the largest collective actions by small and mid-sized publishers against AI developers, intensifying a legal firestorm over the use of news content in artificial intelligence systems.

The plaintiffs, a coalition of publishers who own titles from coast to coast, claim that Microsoft and OpenAI scraped and reproduced their articles without permission, credit, or compensation. The complaint accuses the companies of systematically harvesting protected journalistic work to build and refine large language models (LLMs), then deploying those models in commercial products that directly compete with traditional news outlets.

“This is about protecting the lifeblood of local journalism,” said an attorney for the publishers in a statement. “These are not faceless corporations—they are community newspapers that rely on subscriptions and advertising. When their content is ingested and repurposed by AI without any payment, it threatens their very survival.”

The suit seeks unspecified monetary damages, a court order halting the alleged infringement, and the destruction of AI models trained on the publishers’ content. It underscores a deepening rift between the technology industry and the news business, which has seen a cascade of similar lawsuits since the debut of ChatGPT in late 2022.

A Growing Legal Storm

This action arrives as courts in the United States and Europe grapple with the copyright implications of generative AI. In late 2023, The New York Times sued OpenAI and Microsoft, alleging millions of its articles were used in training. Other rights holders, including Getty Images and a group of novelists, have also filed claims against AI developers. But this new complaint represents a significant escalation: it concentrates the grievances of hundreds of smaller publications that often lack the legal resources to fight alone.

“Consolidation among local newspapers has created a few parent companies, and now they’re acting as a unified front,” explained Sarah Kendzior, a media law analyst at the University of Missouri. “This isn’t just about big-city dailies. It’s about the mom-and-pop papers that are already teetering on the edge.”

The complaint meticulously lists each publication, with many titles dating back a century or more. It argues that the defendants’ AI chatbots have been shown to regurgitate near-verbatim excerpts from copyrighted stories, potentially siphoning readers away from the original sources. In one cited example, a query about a local school board decision returned an answer that closely tracked the wording of a subscriber-only article from a Pennsylvania weekly.

The Core Legal Arguments

The publishers’ case rests on straightforward copyright claims. They assert that Microsoft and OpenAI infringed their exclusive rights to reproduce, distribute, and display their works. The complaint also raises concerns under the Digital Millennium Copyright Act (DMCA) for allegedly removing copyright management information from articles during the scraping process.

Crucially, the plaintiffs challenge the notion that AI training falls under “fair use.” That doctrine allows limited copying for purposes like criticism, news reporting, or research, with courts weighing four factors: the purpose and character of the use, the nature of the copyrighted work, the amount used, and the effect on the market for the original. The publishers contend that all four tilt in their favor.

“Commercial exploitation weighs heavily against fair use,” the complaint states. “The defendants are not just researching—they are building multibillion-dollar products that undercut the market for the very news they copy.” They note that Microsoft’s Copilot and OpenAI’s ChatGPT are now integrated into operating systems and productivity tools, making them direct substitutes for reading the original stories.

Microsoft and OpenAI’s Likely Defense

Neither company has yet filed a response, but their legal playbook in ongoing cases offers clues. Both have insisted that training AI on publicly available internet data constitutes a transformative fair use because the model learns patterns, not specific expression, and the output is generally not a direct copy. OpenAI has previously argued that its models do not store articles but rather generate novel text based on probabilistic predictions.

Microsoft, which has invested billions in OpenAI and uses its technology in Copilot, has taken a more aggressive stance. In a motion to dismiss the New York Times lawsuit, Microsoft likened AI training to “a student learning from a textbook” and warned that a ruling for plaintiffs could stifle innovation. The company also emphasized that users, not the AI itself, are responsible for any copied output.

Yet the local newspaper case may be harder to dismiss on fair use grounds. Unlike the vast corpus of internet text, the works here are clearly defined, professionally produced, and often behind paywalls. The plaintiffs allege that the defendants “bypassed technical protections” on subscription-only content, which could implicate anti-circumvention provisions of the DMCA. Moreover, the complaint details specific instances where Copilot and ChatGPT outputs closely resembled the plaintiffs’ articles, undermining the claim that the models only generate original text.

“If the evidence shows literal or near-literal copying of protected expression, that’s a problem for the defense,” said James Grimmelmann, a professor of internet law at Cornell University. “Fair use typically doesn’t protect verbatim reproduction for commercial purposes.”

The Plight of Local Journalism

Behind the legal jargon is a stark economic reality. Local newspapers have been hemorrhaging jobs and closing at an alarming rate. Between 2005 and 2025, the United States lost more than a quarter of its newspapers, with many communities now classified as “news deserts.” Advertising revenue, once the backbone of the industry, has migrated to tech platforms.

The rise of AI has compounded the crisis. When search engines and chatbots answer queries by summarizing news, they reduce traffic to original sites. A 2025 study by the News Media Alliance estimated that AI-generated news summaries cost publishers up to $2 billion in annual ad revenue.

“We’re not Luddites,” said a publisher involved in the suit. “We use technology, too. But there has to be a line. If you train your machine on our sweat and then sell it back to consumers as a replacement for our product, that’s not innovation—it’s theft.”

Some news organizations have chosen to negotiate licensing deals rather than sue. The Associated Press, Axel Springer, and a handful of large newspaper chains have struck agreements with OpenAI. But for hundreds of smaller outlets, such deals remain out of reach. The lawsuit could serve as a pressure tactic to force Microsoft and OpenAI to the bargaining table.

Broader Implications for the AI Industry

The case, if it proceeds to a verdict, could redefine the boundaries of AI training. A win for the publishers might compel AI developers to license data more broadly, mirroring the music industry’s transition to streaming models. It could also spur Congress to step in with clearer rules.

Conversely, a victory for Microsoft and OpenAI would embolden the industry to continue scraping public web data with minimal restrictions. The U.S. Supreme Court’s 2023 ruling in Andy Warhol Foundation v. Goldsmith, which narrowed how much weight a “transformative” purpose carries in fair use analysis, might not directly apply, but it signals judicial skepticism toward overly broad fair use claims.

International developments add pressure. The European Union’s AI Act requires transparency about training data, and some member states are exploring copyright levies. In Japan and Israel, governments have taken a more permissive stance, characterizing AI training as permissible under existing law. A fragmented global landscape complicates compliance for companies like Microsoft and OpenAI.

Investors are watching closely. Any ruling that imposes heavy licensing costs could dent the profitability of generative AI. Meanwhile, news executives hope the lawsuit will provide a lifeline.

What’s Next

The case is still in its earliest stages. The defendants have not yet been served, and a schedule for motions has not been set. Given the complexity and the stakes, litigation could stretch for years. In the interim, other publishers may join the suit or file their own.

Legal observers expect the defense to file a motion to dismiss, arguing that the complaint fails to state a claim under copyright law. The court’s ruling on that motion will be a critical early test. If the case survives, discovery could unearth internal documents about how Microsoft and OpenAI collected and processed training data, potentially reshaping public understanding of AI development.

For now, the message from Main Street newsrooms is clear: the future of local journalism may depend on whether the law can catch up with Silicon Valley’s breakneck pace. “We’re not asking for a bailout,” said one editor. “We’re asking for basic respect for the law that’s supposed to protect our work.”