New York Times v. OpenAI
Last reviewed
May 19, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 4,030 words
The New York Times Company v. Microsoft Corporation, OpenAI Inc., et al. (case number 1:23-cv-11195) is a federal copyright lawsuit filed by The New York Times Company against Microsoft and OpenAI in the U.S. District Court for the Southern District of New York on December 27, 2023. The complaint alleges that OpenAI and Microsoft copied "millions" of Times articles to train large language models that power ChatGPT, Microsoft Copilot, and related products, and that those models reproduce Times content verbatim or near-verbatim and "hallucinate" fabricated statements that they attribute to the Times.[^1][^2] The Times seeks billions of dollars in statutory and actual damages, an injunction, and the destruction of any GPT or other language model and training set that incorporates Times works.[^1]
The case is widely considered the most consequential test of how United States copyright law applies to the training and outputs of generative-AI systems. It is being heard by U.S. District Judge Sidney H. Stein, who in April 2025 denied most of OpenAI's and Microsoft's motions to dismiss and allowed direct copyright infringement, contributory infringement, certain Digital Millennium Copyright Act (DMCA) claims, and trademark dilution claims to proceed.[^3] U.S. Magistrate Judge Ona T. Wang has presided over discovery, including a high-profile May 13, 2025 order requiring OpenAI to preserve all ChatGPT and API output logs that would otherwise have been deleted, an order Judge Stein later upheld.[^4][^5] The Times' case has been consolidated for pretrial purposes with parallel suits by a group of regional newspapers led by the New York Daily News and by the Center for Investigative Reporting (CIR).[^3]
By late 2023, the use of copyrighted text to train generative AI systems had become one of the most contested legal questions in the technology industry. OpenAI's GPT-3 (2020), GPT-3.5 (2022) and GPT-4 (2023) had been trained on enormous text corpora that included scraped web content from datasets such as Common Crawl, WebText, and Books2. OpenAI did not disclose the precise composition of its training data after GPT-2, but court filings later confirmed that the Times' content was present in the training corpora and that material from Common Crawl in particular contained a substantial volume of Times articles.[^6]
The November 2022 launch of ChatGPT, which reached an estimated 100 million users within three months of release, intensified scrutiny of how those models had been built.[^6] Authors, news organizations, image platforms and individual artists began filing copyright suits against AI developers throughout 2023, including Sarah Silverman's class action against OpenAI and Meta, the Authors Guild action, and Getty Images' suit against Stability AI. The Times, which had separately objected to being scraped by web crawlers, blocked OpenAI's GPTBot from its websites in August 2023, shortly after OpenAI publicly documented the crawler.[^7]
The Times had also engaged OpenAI directly. According to OpenAI's January 8, 2024 blog post "OpenAI and journalism," the two companies began discussions in April 2023 about a "high-value partnership" involving real-time display of Times content in ChatGPT with attribution, and "those negotiations appeared to be progressing constructively through our last communication on December 19."[^8] The Times' complaint, filed eight days later, characterized the talks differently, saying OpenAI and Microsoft had refused to engage in a meaningful licensing dialogue.[^1]
The Times filed its 69-page complaint on December 27, 2023, captioned The New York Times Company v. Microsoft Corporation, OpenAI Inc., OpenAI LP, OpenAI GP LLC, OpenAI LLC, OpenAI OpCo LLC, OpenAI Global LLC, OAI Corporation LLC, and OpenAI Holdings LLC, case number 1:23-cv-11195 in the U.S. District Court for the Southern District of New York.[^1] Lead trial counsel for the Times is Ian B. Crosby of Susman Godfrey, joined by lawyers from Rothwell, Figg, Ernst & Manbeck.[^9]
The complaint asserted six causes of action: (1) direct copyright infringement under 17 U.S.C. § 501; (2) vicarious copyright infringement; (3) contributory copyright infringement; (4) violations of the DMCA, 17 U.S.C. § 1202, for the removal or alteration of copyright management information (CMI); (5) common-law unfair competition by misappropriation ("hot news"); and (6) federal trademark dilution under 15 U.S.C. § 1125(c).[^3][^1] An August 2024 First Amended Complaint added further Times works as exhibits but did not assert any new legal theories.[^3]
The complaint frames the dispute in three principal stages.
First, the Times alleges that OpenAI and Microsoft made wholesale unauthorized copies of millions of Times articles when assembling training corpora and when training their GPT models. The complaint identifies Common Crawl, WebText, and WebText2 as datasets that, according to the Times, contained a "staggering" volume of scraped Times material, and emphasizes that the Times' domains were among the most heavily weighted sources in some of these corpora.[^6][^1]
Second, the Times alleges that the GPT models themselves are "infringing derivative works" because their parameters encode copies of protected Times expression, and that those parameters allow the models to produce Times text on demand.[^1]
Third, the Times alleges that ChatGPT and related products generate outputs that copy Times articles verbatim or near-verbatim. Exhibit J to the complaint presents one hundred examples in which a short prefix from a published Times article, supplied as a prompt to GPT-4, was followed by an output reproducing hundreds of additional words from the same article with only minor variation.[^10] The complaint pairs these "memorization" examples with allegations that Bing search results powered by GPT-4 and the "Browse with Bing" plugin to ChatGPT generated "extensive paraphrases and direct quotes" of Times content without referring users back to the Times.[^6]
A fourth allegation, given prominent treatment in the complaint, concerns hallucinations that misattribute fabricated content to the Times, including invented health and consumer-product recommendations purportedly drawn from Wirecutter, the Times' product-review site. The Times alleges that these fabrications dilute its trademarks and harm its journalistic reputation, citing examples in which ChatGPT generated false Wirecutter-style "best of" lists for products and services that Wirecutter never reviewed, complete with fabricated commentary attributed to the Times' editorial voice.[^1] The Times argues that this conduct constitutes both false designation of origin under the Lanham Act and a separate species of trademark dilution by tarnishment, because consumers who rely on these fabricated recommendations may form negative associations with the Times' brand when the underlying claims turn out to be inaccurate.[^1]
The Times asks for an order requiring OpenAI and Microsoft to "destroy, pursuant to 17 U.S.C. § 503(b), all GPT or other LLM models and training sets that incorporate Times Works," together with billions of dollars in statutory and actual damages.[^1] Statutory damages under 17 U.S.C. § 504 can reach $150,000 per work for willful infringement, and the Times pleads that its registered, copyrighted works number in the millions.[^6]
On January 8, 2024, OpenAI published a blog post titled "OpenAI and journalism," signed by the OpenAI leadership team, that pushed back against the lawsuit on four fronts.[^8] OpenAI stated that (1) it collaborates with news organizations and is "creating new opportunities" through partnerships and product features; (2) training generative-AI models on publicly available internet materials is fair use; (3) "regurgitation" of training data is a "rare bug" the company is "working to drive to zero"; and (4) the Times "is not telling the full story." OpenAI asserted that "we had explained to The New York Times that, like any single source, their content didn't meaningfully contribute to the training of our existing models." Regarding the regurgitation examples, OpenAI suggested the Times "intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate."[^8]
Microsoft's formal answer and motion practice followed the same general themes, emphasizing fair use, the transformative nature of LLM training, and the limited frequency of verbatim outputs in ordinary use. In its January 2024 filings and public statements, Microsoft also stressed that Common Crawl, the principal source of scraped Times content alleged in the complaint, is a long-established public dataset used by researchers worldwide.[^7]
OpenAI moved to dismiss several counts in February 2024, arguing that direct-infringement claims based on the creation and use of the GPT-2 and GPT-3 training datasets in 2019 and 2020 were time-barred under the Copyright Act's three-year statute of limitations in 17 U.S.C. § 507(b), and that the Times' DMCA, contributory infringement, and "hot news" misappropriation claims failed as a matter of law.[^3] Microsoft moved separately on overlapping grounds and added a challenge to the federal trademark dilution claim, a challenge that it later broadened in the Daily News action.[^3]
The Times' case was not the only newspaper suit against OpenAI and Microsoft in the Southern District of New York. On April 30, 2024, the Daily News LP and seven other regional newspapers, including the Chicago Tribune, the Orlando Sentinel, the Sun-Sentinel, the San Jose Mercury News, the Denver Post, the Orange County Register, and the Pioneer Press, filed a parallel complaint asserting essentially the same claims as the Times, with the addition of a New York General Business Law § 360-l trademark dilution claim. The case was docketed as 24-cv-3285.[^3] In June 2024, the Center for Investigative Reporting, publisher of Mother Jones and Reveal, filed its own action against OpenAI and Microsoft, docketed as 24-cv-4872, with an amended complaint following on September 24, 2024.[^3]
The three actions were consolidated for pretrial purposes before Judge Stein. Alongside the consolidation, a separate group of cases brought by book authors, including the Authors Guild action, was also assigned to Judge Stein. Throughout 2024 and 2025, Judge Stein's court became the principal forum for U.S. copyright disputes against OpenAI, while parallel disputes against Meta, Anthropic, and other developers were pending in California.[^11]
On March 26, 2025, Judge Stein issued a brief order indicating how he would rule on the defendants' motions to dismiss, telling the parties an opinion would follow "expeditiously."[^12] His full opinion was filed on April 4, 2025 and runs 42 pages. The opinion is a single ruling that decides motions in the Times, Daily News, and CIR actions together.[^6]
The Court denied OpenAI's and Microsoft's motions to dismiss the direct copyright infringement claims to the extent the defendants argued the claims were time-barred. Applying the Second Circuit's "discovery rule," Judge Stein held that the plaintiffs could pursue direct-infringement claims based on training conduct in 2019 and 2020 because they had plausibly alleged that they neither discovered nor should have discovered the alleged infringements before December 27, 2020 (for the Times) or April 30, 2021 (for the Daily News).[^6] The opinion observed that OpenAI had publicly disclosed very little about the contents of its training corpora during this period and that the existence of the alleged infringement was not knowable from outside the company.[^6]
The Court also denied the defendants' motions to dismiss the contributory copyright infringement claims. Judge Stein concluded that the plaintiffs had plausibly alleged both third-party infringement by ChatGPT users and defendants' knowledge of such infringement (drawing on widely publicized regurgitation incidents and the one hundred examples included in Exhibit J), and that the "substantial noninfringing uses" doctrine did not shield defendants' tools at the pleading stage.[^6]
In the Daily News action, the Court denied the defendants' motions to dismiss both the federal trademark dilution claim under 15 U.S.C. § 1125(c) and the New York General Business Law § 360-l trademark dilution claim, finding that the plaintiffs had adequately pleaded that their marks were "famous" and rejecting Microsoft's argument that the New York statute violates the Dormant Commerce Clause.[^6]
The Court also denied OpenAI's motion to dismiss certain DMCA § 1202(b)(1) claims, namely those against OpenAI in the Daily News and CIR actions, allowing the plaintiffs to proceed on the theory that OpenAI intentionally removed copyright management information from copies of their works during training.[^6]
Judge Stein granted the defendants' motions to dismiss the common-law unfair competition by misappropriation ("hot news") claims in all three actions, holding that the plaintiffs failed to allege the elements of a hot-news misappropriation tort under International News Service v. Associated Press (1918). These claims were dismissed with prejudice.[^6]
The Court also granted OpenAI's motion to dismiss the "abridgment" claims in the CIR action with prejudice, on the ground that the abridgments alleged in the complaint were not substantially similar to CIR's copyrighted works as a matter of law.[^6]
Several DMCA claims were dismissed without prejudice: the § 1202(b)(1) claims against Microsoft in all three actions, the § 1202(b)(1) claim against OpenAI in the Times' action specifically, and the § 1202(b)(3) claims against both defendants in all three actions.[^6] The dismissal of the Times' § 1202(b)(1) claim against OpenAI was a notable setback for the Times in particular; Judge Stein concluded that the Times had not adequately alleged intentional removal of CMI under that subsection, although the Daily News plaintiffs and CIR had.[^6]
The April 2025 opinion did not decide any of the merits questions on which the case ultimately turns, in particular whether OpenAI's training is protected by fair use. But by allowing the direct infringement, contributory infringement, federal and state trademark dilution, and certain DMCA claims to go forward, the ruling ensured that the case would proceed to discovery and likely to a jury trial. Commentators noted that the discovery-rule ruling was particularly important because it preserved claims tied to the 2019 and 2020 training conduct that the defendants had challenged as time-barred, including the training of GPT-3, a model on which much of the Times' theory of liability depends.[^13]
The most contested feature of the case in 2025 was a series of discovery disputes over the preservation and production of ChatGPT and API output logs. The Times argued that it needed access to actual ChatGPT outputs in order to test the prevalence of memorization and verbatim reproduction of its articles, and pressed for orders requiring OpenAI to preserve logs that the company would otherwise have deleted under its retention policies.
On May 13, 2025, Magistrate Judge Ona T. Wang ordered OpenAI to "preserve and segregate all output log data that would otherwise be deleted on a going forward basis until further order of the Court."[^4] The order applied to consumer ChatGPT logs (including for Free, Plus, Pro, and Team users) and to API outputs that OpenAI was not contractually required to retain. It overrode OpenAI's default 30-day retention window for deleted chats, the immediate deletion of "Temporary Chats," and the zero-retention guarantees the company had offered to certain API customers.
Magistrate Judge Wang denied OpenAI's motion for reconsideration on May 16, 2025, holding that OpenAI had not put forth any facts or law the Court had not already considered.[^4] OpenAI also criticized the order publicly. In a June 2025 statement, the company wrote that the order forced it to act "against our longstanding privacy norms and weaken[ed] the privacy protections we've promised our users" and described the demand as an "unfounded request" the company would "fight."[^14]
OpenAI appealed Magistrate Judge Wang's preservation order to Judge Stein. After oral argument, Judge Stein upheld the preservation order in late June 2025, prioritizing the litigation hold over OpenAI's user-privacy arguments.[^15] In July 2025, the Times moved to compel production of a sample of 120 million chat logs from December 2022 through November 2024 in order to study the frequency of Times-related outputs.
After months of negotiation, OpenAI offered the Times a 20-million-log sample, which the plaintiffs accepted. In October 2025, however, OpenAI changed position, proposing instead to run keyword searches over the preserved logs and produce only conversations that implicated specific plaintiffs' works. On November 7, 2025, Magistrate Judge Wang rejected that proposal and ordered OpenAI to produce the full 20-million-log de-identified sample.[^16]
OpenAI again appealed to Judge Stein. On January 8, 2026, Judge Stein affirmed Magistrate Judge Wang's order, holding that "Judge Wang's rulings were neither clearly erroneous nor contrary to law" and that user privacy interests were sufficiently protected by the de-identification process and the protective order already in place.[^16][^15] Judge Stein distinguished a securities-law precedent on which OpenAI had relied involving SEC wiretap recordings, noting that ChatGPT users had voluntarily submitted their communications and that OpenAI's ownership of the logs was uncontested. The opinion rejected OpenAI's argument that the law required the court to "order the least burdensome discovery possible."[^15]
The preservation aspect of the order ended on September 26, 2025, allowing OpenAI to resume normal data-retention practices for newly generated content; logs from the April-September 2025 window remained segregated for the litigation.[^16] The dispute spurred substantial public debate over the privacy implications of broad discovery in AI litigation, with civil-liberties groups and OpenAI users both filing third-party objections.
While the Times case has been the most visible, by 2026 it sat within a broader ecosystem of AI-training copyright disputes that had produced mixed signals on the central question of fair use. In June 2025, in two separate cases against Anthropic and Meta involving books, two California federal judges held that training large language models on copyrighted books was sufficiently transformative to constitute fair use, while reaching different conclusions on liability for the way training corpora were assembled.[^17] In Bartz v. Anthropic, Judge William Alsup ruled that Anthropic's downloading and retention of pirated books to assemble a permanent training library was not fair use, even though training itself was; the case settled for approximately $1.5 billion in September 2025.[^17]
OpenAI has cited those fair-use rulings in its filings against the Times, arguing that the same reasoning should govern. The Times has responded that its case differs in several respects, most notably the extensive evidence of verbatim memorization of its articles and the use of GPT-powered tools (Bing, Microsoft Copilot, Browse with Bing) as competitive substitutes for the Times itself.[^14]
The Times' case is widely seen as the highest-stakes copyright dispute of the generative-AI era because of the number of works at issue (over ten million registered Times articles by the time of the First Amended Complaint, according to the Times),[^6] the directly competitive relationship between OpenAI's products and the Times' subscription journalism, and the scope of relief the Times has demanded, including the destruction of model weights.
As of mid-May 2026, the consolidated cases remain in active discovery before Judge Stein and Magistrate Judge Wang. According to public schedules and litigation-tracking reports from late 2025, plaintiffs' expert reports were due in mid-November 2025, with summary-judgment briefing scheduled to conclude in early April 2026.[^18][^14] No trial date has been publicly set as of the most recent reporting, and no settlement has been reached. Throughout, OpenAI has maintained that LLM training is protected fair use, and the Times has maintained that the case will go to a jury.[^14][^15]
In May 2026, attention in the AI industry was largely focused on the Northern District of California trial in Musk v. Altman, which began jury selection on April 27, 2026 and concluded closing arguments on May 14, 2026.[^19] That case, while addressing different legal questions (governance and contract claims rather than copyright), reinforced the view that 2026 would be a pivotal year for the courts' treatment of OpenAI and the artificial-intelligence sector as a whole.
The New York Times v. OpenAI case has already had practical consequences regardless of its ultimate outcome.
First, the case produced one of the first major decisions allowing newspaper copyright claims against an AI developer to proceed past the pleading stage. The April 2025 ruling on contributory infringement is widely cited by other plaintiffs to argue that the existence of "substantial noninfringing uses" cannot defeat infringement claims at the motion-to-dismiss stage.[^13][^11]
Second, the preservation and production orders concerning ChatGPT logs raised novel issues at the intersection of e-discovery, user privacy, and AI product design. Companies that operate consumer LLM products have begun reassessing their data-retention policies in light of the possibility of similar orders in other litigation. The Times' case is the first in which a court has ordered indefinite preservation of consumer LLM conversation logs and ultimately the production of a large sample.[^16]
Third, the case has spurred parallel licensing deals between AI developers and news publishers. OpenAI had already signed content-licensing agreements with the Associated Press and Axel Springer (publisher of Politico and Business Insider) in 2023, before the Times filed suit. After the filing, it signed agreements with the Financial Times, News Corp (publisher of The Wall Street Journal), Vox Media, Dotdash Meredith, and others. Many AI legal observers have suggested that these agreements likely reflect a "litigation discount" priced into the deals as developers seek to limit their exposure to copyright suits.[^11]
Fourth, the case has reshaped the public debate on whether the training of generative-AI systems on copyrighted works is permissible under existing fair-use doctrine, on whether trained model weights can themselves be infringing derivative works, and on the appropriate remedy if liability is found. The remedy question is particularly stark in the Times' case because the company has demanded the destruction of GPT models that incorporate Times works, a remedy that would be unprecedented in scope if granted.[^1]
Whatever its ultimate disposition, the Times' case is likely to remain a benchmark for U.S. copyright law's response to generative AI for years to come.