Kadrey v. Meta
Last reviewed
May 19, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 4,076 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
May 19, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 4,076 words
Add missing citations, update stale details, or suggest a clearer explanation.
Kadrey v. Meta Platforms, Inc. is a putative class action lawsuit filed in 2023 by a group of book authors against Meta Platforms, Inc., alleging that Meta infringed their copyrights by training its LLaMA family of large language models on books downloaded from shadow libraries such as LibGen, Anna's Archive, Z-Library, and Bibliotik. The case is captioned Kadrey et al. v. Meta Platforms, Inc., Case No. 3:23-cv-03417-VC, and is pending in the U.S. District Court for the Northern District of California before Judge Vince Chhabria.[^1][^2]
The lawsuit became one of the most closely watched copyright cases of the generative AI era. On June 25, 2025, Judge Chhabria granted Meta partial summary judgment, holding that Meta's use of the named plaintiffs' books to train Llama qualified as fair use under the specific record presented. The ruling, however, was unusually qualified: the judge wrote that his decision did "not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful," but only that "these plaintiffs made the wrong arguments and failed to develop a record in support of the right one."[^3][^4]
The case continues in 2026 on separate distribution-rights claims related to Meta's torrenting of pirated book files. It is regularly discussed alongside Bartz v. Anthropic, Concord Music Group v. Anthropic, and New York Times v. OpenAI as part of the first wave of generative-AI copyright litigation in the United States.[^5][^6]
The action was initiated by three named plaintiffs:
After the action was consolidated with a related case (Chabon v. Meta Platforms) and amended several times, the plaintiff group grew to thirteen authors. In addition to the three original plaintiffs, the consolidated complaint named Ta-Nehisi Coates, Junot Diaz, Andrew Sean Greer, David Henry Hwang, Matthew Klam, Laura Lippman, Rachel Louise Snyder, Lysa TerKeurst, Jacqueline Woodson, and Christopher Farnsworth.[^8][^9] Their works span genres and include Pulitzer Prize winning novels (Diaz's The Brief Wondrous Life of Oscar Wao and Greer's Less), a Broadway play (Hwang), and nonfiction (Snyder's No Visible Bruises).[^8]
The plaintiffs were initially represented by the Joseph Saveri Law Firm, the same firm that filed numerous early AI copyright cases. In September 2024, after Judge Chhabria criticized the conduct of the litigation, the prominent firm Boies Schiller Flexner joined as co-counsel, with David Boies among the new attorneys of record.[^10]
The sole defendant is Meta Platforms, Inc., the parent company of Facebook, Instagram, and WhatsApp, and the developer of the open-weight Llama 2, Llama 3, and Llama 4 families of large language models. Meta has been represented by Cooley LLP.[^11]
The case is being heard in the U.S. District Court for the Northern District of California, sitting in San Francisco. Judge Vince Chhabria, a 2014 Obama appointee, presides. Chhabria has handled several other high-profile cases (including litigation over Roundup weed killer) and is widely regarded as an unusually pointed and skeptical jurist.[^1][^12]
The original class action complaint was filed on July 7, 2023, in the Northern District of California by Kadrey, Silverman, and Golden on behalf of themselves and a putative class of similarly situated authors.[^7][^13] It alleged six causes of action:
The complaint's core factual allegation was that Meta had copied many of the plaintiffs' copyrighted books, without permission or compensation, and used them as training data for Llama. The plaintiffs identified the Books3 dataset, a corpus of nearly 200,000 books compiled by AI researcher Shawn Presser and distributed as part of EleutherAI's Pile dataset, as a likely source. Books3 itself is widely reported to have been derived from the shadow library Bibliotik, a torrent-based site hosting copyrighted e-books.[^14][^15]
A parallel lawsuit, Silverman v. OpenAI, was filed against OpenAI on the same day. The two suits raised closely related theories but proceeded on separate tracks.[^16]
Meta moved to dismiss the complaint. On November 20, 2023, Judge Chhabria granted the motion in substantial part. The court allowed the plaintiffs' core claim of direct copyright infringement (i.e., the unauthorized copying of their books into Meta's training corpus) to proceed but dismissed most ancillary theories.[^17][^18]
Chhabria characterized one of the plaintiffs' more aggressive theories, that the Llama models themselves are infringing derivative works of every book they were trained on, as "nonsensical." The court reasoned that an LLM consists of weights and parameters expressing abstract mathematical relationships and is not a recasting or adaptation of any particular author's expression. The court also rejected the theory that every model output is an infringing derivative without specific allegations of substantially similar outputs.[^17][^18]
Most ancillary claims were dismissed with leave to amend; the negligence claim was dismissed with prejudice. The plaintiffs subsequently filed amended and consolidated complaints, ultimately culminating in a Third Amended Consolidated Complaint filed January 21, 2025, after the case had been consolidated with the related Chabon action.[^8][^19]
Discovery in Kadrey produced an unusually rich body of internal Meta documents bearing on the company's knowledge that material in its training data had been obtained from pirate sources. The revelations attracted extensive press coverage and are widely regarded as among the most consequential factual disclosures in any pending AI copyright case.
Documents unsealed during the proceedings indicated that Meta downloaded books in bulk from multiple shadow libraries, principally LibGen (Library Genesis), Anna's Archive, and Z-Library. Plaintiffs alleged, based on internal Meta records, that Meta torrented approximately 81.7 terabytes of data from these libraries to use as training material for Llama.[^20][^21]
In February 2025, Ashley Belanger of Ars Technica reported on unsealed exhibits showing the scale of Meta's downloads and quoting internal messages in which Meta engineers acknowledged the legal sensitivity of the activity.[^20] The Register and Rolling Stone also covered the disclosures.[^22][^23]
Court filings cited a series of internal Meta communications expressing concern over the legality and ethics of using pirated material. According to the filings, one engineer wrote that "torrenting from a corporate laptop doesn't feel right." Another internal exchange flagged that "using pirated material should be beyond our ethical threshold." Meta employees also discussed avoiding use of corporate IP addresses while accessing the shadow libraries.[^21][^24]
Plaintiffs further alleged, again citing internal documents, that a Meta engineer (identified in pleadings as Nikolay Bashlykov) acknowledged stripping copyright management information from LibGen-sourced files before they were used in training. This evidence underlay the plaintiffs' DMCA § 1202(b) claim.[^21][^25]
Perhaps the most widely quoted internal document was an email cited in court filings indicating that Meta's decision to proceed with LibGen as a training source was "escalated to MZ," meaning Meta CEO Mark Zuckerberg. According to the filings, an executive (Sony Theakanath, director of product management) wrote that after escalation to MZ, the generative-AI team had "been approved to use LibGen" for Llama, with the further caveat that Meta would not publicly disclose its use of the dataset.[^21][^24]
Mark Zuckerberg was deposed in the case. According to filings and news reports describing the testimony, Zuckerberg stated when asked about the activity that the type of conduct described in the complaint would raise "lots of red flags" and "seems like a bad thing." Asked specifically about LibGen, Zuckerberg testified that he had "not really heard of" the dataset.[^23][^24] In a related discussion of training data sources, he referenced Meta's own platforms and YouTube as alternative content reservoirs.[^26]
The discovery record also reflected a January 2025 dispute in which the plaintiffs alleged that Meta produced unusually incriminating internal documents only hours before the close of fact discovery, prompting motion practice over additional disclosures.[^21][^24]
Meta's principal defense was that its copying of copyrighted books for the purpose of training Llama constituted fair use under 17 U.S.C. § 107. Meta argued that:
Meta also argued that its alleged downloading from shadow libraries should not change the fair-use analysis because the relevant "use" was the training itself, not the antecedent acquisition. The plaintiffs vigorously contested this framing, arguing that knowing acquisition from pirate sources should weigh against fair use under the "bad faith" prong of the first fair-use factor.[^4]
On June 25, 2025, Judge Chhabria issued a 40-page order denying the plaintiffs' motion for partial summary judgment and granting Meta's cross-motion for partial summary judgment on the reproduction-based copyright claim.[^3][^4] The ruling addressed the cross-motions on the central training-use issue.
Applying the four-factor fair-use test, Chhabria held:
Chhabria's opinion was striking for its candor about the limits of the ruling. In a passage that became one of the most quoted statements in the early AI copyright case law, the court wrote:
"This ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful. It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one."[^3][^28]
The court explained that the plaintiffs had not adequately pursued or supported a market dilution theory. Under that theory, an AI model that produces large quantities of content similar in style or genre to a given author's work could indirectly substitute for and dilute the market for the original works, even without copying any particular passage verbatim. Chhabria suggested that plaintiffs in future cases could very well prevail under such a theory if they presented adequate evidence.[^3][^28]
The opinion also contained pointed observations about Meta. Chhabria noted that AI companies generating "billions, even trillions of dollars" should generally be able to compensate copyright holders if their training requires the holders' works. He cautioned that in many circumstances "it will be illegal to copy copyright-protected works to train generative AI without permission" and that companies "will generally need to pay copyright holders."[^3][^4]
In a separate order issued shortly after the principal opinion, the court granted summary judgment for Meta on the plaintiffs' DMCA § 1202(b) claim concerning the alleged removal of copyright management information. The court reasoned that because the underlying training copying was fair use and therefore not infringement, there was no underlying infringement that the alleged removal of CMI could have facilitated or concealed, leaving the statutory predicate for the DMCA claim unmet.[^25][^29]
The court emphasized that it was deciding only the reproduction-based copyright claim arising from Meta's training use. Neither side had moved for summary judgment on the plaintiffs' separate theory that Meta had violated the authors' distribution right by uploading book files (leeching or seeding) during the torrenting process. That distribution claim remained a live issue in the case.[^4][^30]
The Authors Guild characterized the ruling as "a technical win" for Meta on procedural and evidentiary grounds rather than a substantive endorsement of AI training on copyrighted works. CEO Mary Rasenberger emphasized in interviews and a Guild statement that the opinion in fact endorsed the legal premise that unauthorized AI training can be infringing in many circumstances, and that the judge had expressed dim views of Meta's piracy.[^28]
The Copyright Alliance, an industry group representing rights holders, was similarly critical of how the ruling had been characterized in popular media, arguing that Chhabria's opinion contained important caveats that had been underemphasized.[^31]
Meta hailed the ruling as a vindication of fair use as applied to AI training. Law firms representing technology companies, including Goodwin Procter, Skadden, and Perkins Coie, published client alerts portraying the ruling as a significant data point favoring AI developers, while noting Chhabria's market-dilution caveats and warning clients that future cases on better records could come out differently.[^4][^27][^29]
In the months following the ruling, the parties continued to litigate the remaining distribution-rights claim. On July 9, 2025, the parties filed a Joint Case Management Statement addressing how to proceed on the distribution issue and on a number of unresolved scheduling and amendment questions.[^30] Plaintiffs moved to amend their complaint to add contributory infringement and additional uploading-based claims; Judge Chhabria characterized the delay in seeking amendment as "inexcusable" but invited supplemental briefing.[^32]
By early 2026, the court had issued a new scheduling order setting an extended timeline for the distribution-claim phase. Under that schedule, expert reports are due in autumn 2026, summary judgment briefing is set for late 2026 through early 2027, and the hearing on summary judgment on the distribution claim is scheduled for February 25, 2027 in San Francisco.[^33] In a March 25, 2026 order, the court granted plaintiffs leave to amend their complaint on the contributory and distribution-related theories.[^33]
As of May 2026, the June 25, 2025 ruling on the training/reproduction claim has not been finally appealed to the U.S. Court of Appeals for the Ninth Circuit, because the district court has not entered a final judgment. With the distribution claim still pending in the trial court, the reproduction ruling remains an interlocutory disposition that is not yet ripe for an appeal as of right. Plaintiffs and commentators have publicly indicated an intention to appeal the fair-use ruling once the case becomes final, but no Ninth Circuit briefing schedule has been set in connection with the Kadrey training ruling as of May 2026.[^33]
Two days before the Kadrey ruling, on June 23, 2025, Judge William Alsup (also of the Northern District of California) issued a summary judgment ruling in Bartz v. Anthropic. Alsup found that Anthropic's training of its Claude models on legally acquired books was "exceedingly transformative" fair use, but held separately that Anthropic's earlier downloading and retention of a "central library" of millions of pirated books was not fair use because it was a non-transformative use untethered to the actual training.[^5][^34]
Chhabria and Alsup thus reached opposite conclusions on the legality of acquiring training materials from pirate sources. Alsup treated the act of pirate acquisition as a discrete and non-transformative use. Chhabria, by contrast, viewed the acquisition as part of an integrated transformative chain leading to model training. Chhabria also expressly criticized Alsup's emphasis on transformativeness, calling for closer attention to market harm as the controlling factor in AI fair-use analysis.[^5][^34]
Bartz was subsequently settled by Anthropic in 2025, with reports describing a class-wide settlement following class certification on the piracy claims; the procedural posture of Bartz therefore differs materially from Kadrey.[^35]
Concord Music Group v. Anthropic, filed by major music publishers, raised fair-use and copyright issues in the context of song lyrics rather than books. Like Kadrey, it raised questions about whether AI training on copyrighted text constitutes fair use, but on a different factual record and against a different defendant.[^36]
In New York Times v. OpenAI, filed in late 2023 in the Southern District of New York, The New York Times alleged that OpenAI and Microsoft trained ChatGPT on Times articles and that the model could be induced to reproduce substantial portions of those articles. That case differs from Kadrey in two important respects: it concerns periodical journalism rather than books, and it includes detailed evidence of alleged verbatim outputs (regurgitation) directly tied to specific copyrighted works, an evidentiary record that Chhabria specifically noted the Kadrey plaintiffs had failed to develop.[^37]
Taken together, the Kadrey, Bartz, Concord, and New York Times cases represent the first generation of judicial reasoning about the application of U.S. fair-use doctrine to generative AI training. Industry commentators and academic observers have widely characterized Kadrey as the most cautious of the early "AI-favorable" rulings: while Meta won on the existing record, the opinion expressly invited plaintiffs in future cases to mount stronger market-dilution and pirate-acquisition challenges.[^5][^6][^27]
As of May 19, 2026:
The case continues to be closely followed because of its precedential weight on the fair-use question, its unusually detailed record concerning Meta's use of pirate sources, and Judge Chhabria's explicit invitation to future plaintiffs to develop market-dilution theories.