Local Journalism Outlets Hit OpenAI, Microsoft with Massive DMCA Lawsuit Over AI Training

A coalition representing nearly 400 local and regional newspapers filed a federal lawsuit on June 24, 2026, in the U.S. District Court for the Southern District of New York, accusing OpenAI and Microsoft of using copyrighted reporting to train their AI models without permission and of stripping out copyright management information in violation of the Digital Millennium Copyright Act. The plaintiffs, including daily papers from every corner of the country, allege that ChatGPT, Microsoft Copilot, and other AI products have ingested decades of original journalism, effectively republishing it without compensation or attribution and eroding the economic foundation of community newsrooms.

The suit names specific training datasets and model outputs that it claims reproduce verbatim or near-verbatim excerpts from stories published on websites owned by the newspapers. Attorneys for the publishers argue that by copying and processing these works into their large language models, OpenAI and Microsoft have engaged in mass infringement on a scale that dwarfs earlier digital piracy disputes. The filing also accuses the companies of deliberately removing bylines, publication dates, and other copyright management information (CMI) to mask the origin of the content, a move the plaintiffs say amounts to a knowing violation of the DMCA.

The Scope of the Alleged Infringement

Statutory damages for each DMCA violation can reach $25,000, and the complaint reserves the right to seek enhanced damages for willful infringement, potentially pushing the liability into the billions. The plaintiffs cite thousands of examples in which ChatGPT or Copilot allegedly rehashed investigative features, editorial analysis, and breaking-news reports that originally appeared on their sites. One exhibit highlights a three-paragraph summary of a city council corruption investigation that lifted entire sentences from a small-town Illinois paper, down to the reporter’s distinctive phrasing.
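To put the per-violation figure above in perspective, here is a rough back-of-the-envelope sketch in Python. The $2,500 to $25,000 range is the statutory range for CMI violations under 17 U.S.C. § 1203(c)(3)(B). The violation counts are hypothetical and chosen only for illustration, since the complaint's own totals are not given here.

```python
# Statutory damages range per Section 1202 violation (17 U.S.C. 1203(c)(3)(B)).
MIN_PER_VIOLATION = 2_500
MAX_PER_VIOLATION = 25_000


def statutory_range(violations: int) -> tuple[int, int]:
    """Return (minimum, maximum) statutory damages in dollars for a violation count."""
    if violations < 0:
        raise ValueError("violations must be non-negative")
    return violations * MIN_PER_VIOLATION, violations * MAX_PER_VIOLATION


def violations_needed(target: int, per_violation: int = MAX_PER_VIOLATION) -> int:
    """Smallest violation count whose damages at `per_violation` reach `target`."""
    # Ceiling division without floating point.
    return -(-target // per_violation)


if __name__ == "__main__":
    # Hypothetical violation counts, not figures from the complaint.
    for count in (1_000, 10_000, 100_000):
        low, high = statutory_range(count)
        print(f"{count:>7,} violations: ${low:,} to ${high:,}")

    # How many violations at the statutory maximum would reach $1 billion?
    print(f"{violations_needed(1_000_000_000):,} violations at the maximum")
```

The point of the sketch is only that liability scales linearly with the number of violations a court counts, which is why the definition of a single "violation" is likely to matter as much as the dollar figure.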

OpenAI and Microsoft have long maintained that training on publicly available internet data constitutes fair use, a defense they are expected to raise forcefully in this case. However, legal experts note that the DMCA claim—focused on the removal of CMI—bypasses some of the fair-use analysis, creating a separate liability pathway that federal courts have yet to fully test against generative AI companies. “If the court finds that the models were designed to strip author identities and copyright notices, the DMCA claim could survive even if the underlying copying is eventually deemed fair use,” said Professor Elena Vasquez of Columbia Law School, who has tracked AI litigation since 2024.

How the AI Models Benefit from News Content

The complaint details how copyrighted news content enhances the accuracy, timeliness, and trustworthiness of AI outputs, turning local journalism into a free raw material for commercial products. By incorporating fact-checked reporting into training data, the models gain the ability to answer current-events questions, summarize local politics, and even mimic the narrative style of veteran correspondents—all while diverting traffic away from the original sources.

For Microsoft, the integration of Copilot into Windows 11 and the Edge browser has made news summarization a daily habit for millions of users. When a Windows user asks Copilot about a school-board election, the assistant can synthesize information that originally required a reporter’s legwork, often without ever directing the user to the newspaper’s site. Internal Microsoft documentation cited in the suit allegedly acknowledges that news content is a “high-value signal” for training conversational AI, and the company reportedly budgeted for licensing deals with major publishers, but the local papers say they were never approached.

The DMCA Angle: Stripping Copyright Management Information

Section 1202 of the DMCA prohibits the intentional removal or alteration of CMI, including author names, terms of use, and copyright notices, when the party doing so knows, or has reasonable grounds to know, that it will induce, enable, facilitate, or conceal infringement. According to the lawsuit, the AI training pipeline scrubs these identifiers from articles before feeding them into neural networks. The plaintiffs point to a technical analysis showing that when a model generates a passage nearly identical to an original story, the output lacks any attribution line, effectively presenting the allegedly copied text as fresh, AI-generated insight.
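The distinction at issue can be shown with a toy Python sketch. It is purely hypothetical: the `Article` record, the function names, and the sample data are invented for illustration and do not describe how OpenAI or Microsoft actually process text. It only contrasts a preprocessing step that drops CMI with one that carries it along.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    """Hypothetical article record; the first three fields are CMI."""
    byline: str
    published: str
    copyright_notice: str
    body: str


def strip_cmi(article: Article) -> str:
    """Keep only the body text, discarding byline, date, and notice."""
    return article.body


def keep_cmi(article: Article) -> str:
    """Prefix the body with a header line that preserves the CMI fields."""
    header = f"{article.byline} | {article.published} | {article.copyright_notice}"
    return f"{header}\n{article.body}"


def has_cmi(text: str, article: Article) -> bool:
    """Check whether every CMI field of the article still appears in the text."""
    fields = (article.byline, article.published, article.copyright_notice)
    return all(field in text for field in fields)


# Invented sample data, not taken from any exhibit in the lawsuit.
SAMPLE = Article(
    byline="By Jane Doe",
    published="2024-01-15",
    copyright_notice="(c) 2024 Example Gazette",
    body="The city council voted on Tuesday to approve the budget.",
)

if __name__ == "__main__":
    print("CMI after strip_cmi:", has_cmi(strip_cmi(SAMPLE), SAMPLE))
    print("CMI after keep_cmi:", has_cmi(keep_cmi(SAMPLE), SAMPLE))
```

Whether a real pipeline resembles the first function or the second is exactly the kind of factual question discovery would have to answer.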

The complaint also alleges that Microsoft’s Bing indexing system, which underpins Copilot’s retrieval-augmented generation, routinely strips CMI from cached copies. Because Copilot can surface real-time snippets through this index, the newspapers argue that the removal is ongoing and damages continue to accrue daily. The lawsuit seeks an injunction that would force the companies to either retrain models without the newspapers’ content or implement a consent-and-compensation framework before further use.

Microsoft Copilot’s Deep Integration in Windows Raises the Stakes

Windows users will feel the effects of this case directly. Since the launch of Copilot as a native system assistant in Windows 11, the feature has become a default search and summarization tool for millions of desktop users. If a court orders changes to how Copilot handles news data, Microsoft could be forced to rework its indexing, prompting, and retrieval systems, a costly and time-consuming process that might degrade the assistant’s real-time information capabilities.

Beyond the technical disruption, the litigation spotlights Windows’ role as a gateway for AI-powered content consumption. Local newspapers contend that Copilot has transformed the OS into a “loss leader for journalism,” turning every PC into a newspaper substitute that profits Microsoft through increased user engagement and advertising while strangling the very newsrooms that inform those queries. For Windows enthusiasts, the case raises uncomfortable questions about what their favorite operating system’s AI backbone is actually drawing from and whether their reliance on Copilot inadvertently contributes to the hollowing out of community media.

Parallels to Other High-Profile AI Copyright Cases

The lawsuit builds on a growing wave of legal challenges against AI companies. In late 2023, The New York Times sued OpenAI and Microsoft on similar grounds, a case that is still winding through discovery. What distinguishes the new filing is its sheer number of plaintiffs and its focus on DMCA claims rather than pure copyright infringement. Smaller publishers have banded together, mirroring the class-action strategy seen in other tech disputes, to pool resources and present a unified narrative of systemic harm.

Earlier this year, a federal judge in California allowed a DMCA-based claim to proceed against Stability AI, ruling that the removal of CMI from training images met the threshold for a cause of action. Legal observers say that ruling may have encouraged the newspaper coalition to pursue the same path. If the New York court similarly allows the DMCA counts to move forward, a sweeping discovery process could expose internal emails, training documentation, and engineering decisions that reveal exactly how OpenAI and Microsoft treat copyrighted text.

Local Journalism’s Struggle for Survival in the AI Era

The newspapers’ argument goes beyond legal technicalities to paint a stark picture of an industry on life support. Since 2005, more than 2,500 local news outlets have closed in the United States, and the proliferation of AI tools that summarize and remix original reporting threatens to accelerate that decline. “Every time Copilot answers a local news query with our work, we lose a website visit, a subscription, an ad impression—and when we close, there’s no one left to cover the school board or the zoning commission,” said Debra Callahan, publisher of the Daily Quill in rural Ohio, in a statement attached to the lawsuit.

The coalition includes papers from the Midwest, the coastal media hubs, and the growing news deserts of Appalachia and the Plains. Some are family-owned operations with fewer than ten employees. Their combined reach, however, is significant: collectively, they serve over 40 million readers in print and online. The suit asks the court to recognize that AI training without compensation constitutes not just a copyright violation but an existential threat to democracy’s information infrastructure.

Potential Outcomes and What This Means for Windows Users

If the plaintiffs prevail, the most immediate impact for Windows users could be a weakening of Copilot’s local-news summarization. Microsoft might be forced to filter out content from hundreds of outlets, leaving Copilot unable to answer hyper-local questions or reliant on licensed, curated sources that could lag behind breaking news. An injunction might also require Microsoft to offer users a way to see which sources Copilot pulled from, adding a layer of transparency but also complexity to the clean, conversational interface Windows users enjoy today.

A settlement is another possibility, and many industry watchers believe that both OpenAI and Microsoft will ultimately seek licensing agreements rather than risk a damaging court precedent. If such deals become the norm, the technology giants could pass costs on to enterprise customers, or Windows could see new subscription tiers that bundle premium Copilot features with licensed news access. Either way, the era of free-and-clear AI scraping of news content appears to be drawing to a close.

Industry Reaction and the Road Ahead

Reaction from the tech and media sectors has split along predictable lines. Open-source AI advocates argue that requiring data licensing will entrench Big Tech dominance, as only the wealthiest companies can afford to pay for training material. Journalism nonprofits, meanwhile, have hailed the lawsuit as a necessary correction. The Electronic Frontier Foundation, often skeptical of copyright overreach, has not yet taken a formal position but acknowledged that the DMCA claim raises novel questions about how AI models handle metadata.

For Windows users and the broader public, the case highlights a tension that will define the next decade of computing: as AI becomes embedded in operating systems and browsers, the line between a helpful assistant and a content parasite blurs. The June 24 filing makes clear that hundreds of local publishers believe that line has been crossed, and they are staking their future on a federal judge agreeing with them. The outcome will ripple far beyond courtrooms, reshaping how Microsoft builds Copilot, how OpenAI trains its next model, and ultimately how we all access the news on our screens.