Bartz v. Anthropic
Last reviewed
May 19, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 4,305 words
Bartz et al. v. Anthropic PBC, case number 3:24-cv-05417, is a landmark copyright class action filed in the U.S. District Court for the Northern District of California against [[anthropic|Anthropic]], the developer of the [[claude|Claude]] family of large language models.[^1] Initiated on August 19, 2024, by three professional authors, the case became the first major copyright lawsuit involving generative artificial intelligence to produce both a substantive fair use ruling and a class-wide monetary settlement.[^1][^2] In a bifurcated summary judgment order issued on June 23, 2025, U.S. District Judge William Alsup held that training a large language model on lawfully acquired books constitutes transformative fair use, but that Anthropic's downloading and retention of pirated copies obtained from "shadow libraries" was not protected by fair use.[^3][^4] After the court certified a class of rightsholders on July 17, 2025, the parties announced a proposed settlement of at least $1.5 billion on September 5, 2025, the largest publicly reported copyright recovery in U.S. history.[^5][^6] The settlement received preliminary approval on September 25, 2025, and after Judge Alsup's retirement at the end of 2025, the case was reassigned to U.S. District Judge Araceli Martinez-Olguin, who held the fairness hearing on May 14, 2026.[^7][^8][^9]
Bartz v. Anthropic occupies a pivotal place in the early jurisprudence of generative artificial intelligence and copyright law because it produced the first substantive judicial ruling on whether training a [[large_language_model|large language model]] on copyrighted books qualifies as fair use under Section 107 of the U.S. Copyright Act.[^4][^10] The case is also significant because it bifurcated the fair use inquiry: the court drew a sharp legal line between the act of using copyrighted works as machine-learning inputs, which it found transformative, and the upstream conduct of acquiring those works from unauthorized sources, which it found infringing.[^3][^11] The eventual settlement of at least $1.5 billion plus interest, covering roughly 500,000 pirated books at approximately $3,000 per work, established a benchmark figure for damages negotiations in pending copyright suits against other AI developers and demonstrated that legitimate provenance of [[training_data|training data]] is a material legal risk for the AI industry.[^5][^12]
The case was assigned to Judge William Alsup, a senior district judge in the Northern District of California with a long record of presiding over high-profile technology disputes, including Oracle v. Google.[^13] Judge Alsup's June 23, 2025 summary judgment order and his July 17, 2025 class certification order left Anthropic facing extraordinary potential statutory damages exposure, calculated by some commentators at hundreds of billions of dollars, and prompted the company to settle ahead of a trial scheduled for December 2025.[^14][^11] After Judge Alsup retired at the end of 2025, the case was reassigned to Judge Araceli Martinez-Olguin for final approval proceedings.[^8]
The three named plaintiffs are working authors whose books were among the copyrighted works that Anthropic allegedly downloaded from pirated repositories to train Claude.[^15][^2]
Andrea Bartz is a New York-based novelist and journalist, the author of thrillers including The Lost Night, The Herd, We Were Never Here, and The Spare Room.[^15] Her novel The Lost Night was identified by plaintiffs as one of the works included in the Books3 dataset that Anthropic downloaded.[^16]
Charles Graeber is a New York-based nonfiction author and journalist whose books include The Good Nurse: A True Story of Medicine, Madness, and Murder and The Breakthrough: Immunotherapy and the Race to Cure Cancer.[^15] Graeber has contributed essays to The New Yorker, The New York Times, and GQ.[^15]
Kirk Wallace Johnson is a Los Angeles-based nonfiction author whose works include The Feather Thief: Beauty, Obsession, and the Natural History Heist of the Century, The Fisherman and the Dragon, and To Be a Friend Is Fatal: The Fight to Save the Iraqis America Left Behind.[^15]
The plaintiffs filed their initial complaint in the U.S. District Court for the Northern District of California on August 19, 2024.[^1][^17] An amended complaint was filed on December 4, 2024 with the defendant's consent.[^18] The plaintiffs were represented by Susman Godfrey LLP (with Justin Nelson serving as co-lead counsel) and Lieff Cabraser Heimann & Bernstein LLP (with Rachel Geman serving as co-lead counsel).[^6] The sole defendant was Anthropic PBC, the public-benefit corporation that develops and operates the Claude family of large language models.[^17]
The amended complaint alleged that Anthropic built and improved Claude using copyrighted books obtained without authorization from pirate datasets, including Books3, Library Genesis (commonly known as "LibGen"), and the Pirate Library Mirror (commonly known as "PiLiMi").[^15][^16] Specifically, plaintiffs alleged that Anthropic:[^16]
The complaint asserted causes of action for direct copyright infringement under the Copyright Act and sought statutory damages, injunctive relief, and other remedies.[^18] Because statutory damages for willful infringement can reach $150,000 per work under 17 U.S.C. Section 504(c)(2), commentators noted that aggregate exposure could theoretically extend into the tens or even hundreds of billions of dollars once a class of rightsholders was certified.[^14][^11]
The complaint also discussed Anthropic's later effort to build what the company internally described as a "central library" of texts, including a project to acquire and digitize lawfully purchased print books in addition to the pirated downloads.[^3][^4] The court's later fair use ruling drew an important distinction between those two acquisition channels.
Anthropic's principal defense was that its copying of copyrighted books, both to train Claude and to build an internal research library, constituted fair use under 17 U.S.C. Section 107.[^4][^10] Anthropic argued that the use of books as inputs to train a large language model is "spectacularly transformative" because the resulting model does not store or reproduce the source works in human-readable form and is not a market substitute for the underlying books.[^4][^19] Anthropic also argued that converting lawfully purchased print copies into digital form for internal use was a fair-use space-saving practice analogous to format-shifting.[^10][^19]
Anthropic did not dispute that it had downloaded books from LibGen and PiLiMi, but argued that these downloads were preparatory to a transformative training use and that the fair use doctrine should apply at the level of the ultimate purpose rather than the source from which copies were acquired.[^4][^10] Anthropic also maintained that its publicly released Claude products did not output verbatim copies of the plaintiffs' works.[^20]
After the September 2025 settlement was announced, Anthropic Deputy General Counsel Aparna Sridhar stated that the agreement "simply resolves narrow claims about how certain materials were obtained" and noted that "the court's landmark June ruling that AI training constitutes transformative fair use remains intact."[^20] Sridhar added that the company remained "committed to developing safe AI systems that help people and organizations extend their capabilities, advance scientific discovery and solve complex problems."[^20] Anthropic did not admit liability as part of the settlement.[^5][^21]
On June 23, 2025, Judge Alsup issued a summary judgment order that became the first substantive judicial decision on whether training a generative AI model on copyrighted books is fair use.[^3][^4] The order analyzed three distinct categories of Anthropic's conduct and reached different conclusions for each, producing a bifurcated outcome.[^3][^22]
Judge Alsup held that using copyrighted books, regardless of acquisition source, to train Anthropic's large language models was fair use as a matter of law.[^4][^22] The court characterized this use as "exceedingly transformative" and described the act of training as analogous to the transformative learning processes by which human authors read and absorb prior works in order to write new ones.[^4][^10] Applying the four statutory fair use factors, the court found that:[^4][^11]
Judge Alsup concluded that the training of Claude on copyrighted books was "quintessentially transformative" and was protected by fair use.[^4][^22]
Judge Alsup separately held that Anthropic's conversion of lawfully purchased print books into digital form for inclusion in its internal research library was also fair use.[^4][^10] The court reasoned that the new digital copies were not redistributed and instead operated as space-saving format conversions analogous to other recognized fair uses, with the print originals being discarded after digitization.[^10][^19]
Judge Alsup denied summary judgment for Anthropic on the claims arising from downloading and retaining pirated copies obtained from LibGen and PiLiMi.[^3][^11] The court reasoned that even where an ultimate use of copies (training a model) is transformative, the predicate act of pirating copies to build a permanent library cannot itself be defended as fair use when the works were available through lawful channels.[^3][^4] The court characterized this conduct as a separate, non-transformative use, observing that Anthropic had built and retained a permanent library of pirated material that went beyond what training required.[^11][^22] As a result, the issue of liability for the pirated downloads was set to proceed to trial.[^3][^11]
The order was widely described in the legal and technology press as the first major substantive ruling on fair use for AI training, and it was closely studied alongside the contemporaneous decision in Kadrey v. Meta in the same district.[^22][^10] In Kadrey, U.S. District Judge Vince Chhabria likewise concluded that training a [[generative_ai|generative AI]] model on copyrighted books could be fair use, though on different reasoning, producing parallel but not identical doctrinal frameworks for AI training in the Northern District of California.[^22] Commentators noted that the order created a strong incentive for AI developers to use lawfully licensed or purchased training data and to avoid pirate sources, even when the downstream use might be defensible as transformative.[^11][^19] The Authors Guild characterized the decision as a "mixed" result, welcoming the finding of infringement for pirated copies while expressing concern about the fair use holding on training.[^23]
Several legal commentators highlighted three doctrinal features of the order with broad practical relevance.[^10][^19][^31] First, the court endorsed the analogy between machine learning and human reading as a foundation for finding the training process transformative, a framework other courts may either adopt or distinguish. Second, the court rejected the argument that the entire chain of conduct, from acquisition through training, should be evaluated as a single integrated transformative use, instead holding that intermediate steps must independently satisfy fair use. Third, the court's treatment of factor four (market harm) accepted that an emerging market for AI training licenses is not, by itself, a market that copyright owners can claim a right to control, although the court left room for future developments in this area.[^4][^10]
On July 17, 2025, Judge Alsup granted plaintiffs' motion for class certification with respect to the pirated-download claims.[^24][^25] The certified class encompassed beneficial or legal copyright owners of the exclusive reproduction right for books that had been downloaded by Anthropic from LibGen or PiLiMi, were associated with an ISBN or ASIN, and had been registered with the U.S. Copyright Office within five years of publication and either before the August 10, 2022 download date or within three months of publication.[^26][^25]
The court excluded the Books3 dataset from class certification because the metadata associated with the Books3 corpus was, in the court's view, not reliable enough to permit fair identification of class members.[^27] As a practical result, authors whose works appeared in Books3 but not also in LibGen or PiLiMi were not included in the certified class.[^27]
Coverage of the certification order emphasized that the certified class could include hundreds of thousands of rightsholders, with some estimates of the underlying LibGen and PiLiMi corpora reaching approximately seven million unique book titles.[^24][^25] Because statutory damages for willful copyright infringement can reach $150,000 per work, several legal commentators observed that the class certification dramatically increased Anthropic's potential exposure and significantly raised the pressure to settle.[^14][^11]
On August 26, 2025, the parties filed a notice of a proposed class-wide settlement, indicating that they had executed a binding term sheet and were finalizing a definitive settlement agreement.[^18][^28] The full proposed settlement was filed in early September 2025, and on September 5, 2025, Anthropic publicly announced that it had agreed to pay at least $1.5 billion plus interest to resolve the class action.[^21][^5]
Under the proposed settlement, Anthropic agreed to pay a Settlement Fund of at least $1.5 billion plus accrued interest, working out to approximately $3,000 per copyrighted work in the certified class of roughly 500,000 books from LibGen and PiLiMi.[^5][^6] Settlement counsel, Anthropic, and Judge Alsup all characterized the agreement as the largest publicly reported copyright recovery in history.[^6][^11] Justin Nelson, co-lead counsel for the class, described the agreement as "the first of its kind in the AI era," stating that it "far surpasses any other known copyright recovery."[^6]
In addition to monetary relief, Anthropic agreed to destroy the original LibGen and PiLiMi files in its possession and any derivative copies, and to confirm the destruction in writing.[^12][^29] The settlement releases claims against Anthropic for past acts through August 25, 2025 but does not grant the company any prospective license to use class members' works.[^6] Anthropic did not admit liability and continued to characterize the agreement as resolving "narrow claims about how certain materials were obtained."[^20]
The settlement class is consistent with the certification order: it covers beneficial or legal copyright owners of the exclusive reproduction right for books in LibGen or PiLiMi that bear an ISBN or ASIN and meet certain registration timing requirements with the U.S. Copyright Office.[^26] The class was understood at preliminary approval to cover approximately 465,000 to 500,000 books, with later filings reporting a final Works List of approximately 465,000 titles.[^21][^29] Trade and university press titles followed a default 50/50 split between authors and publishers, while textbooks and works with a sole rightsholder were subject to different allocation rules.[^26]
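The per-work figure and the default allocation can be checked with a few lines of arithmetic. The following Python sketch is a simplification under stated assumptions: it uses the minimum fund and the roughly 500,000-work estimate given in the text, ignores interest, attorneys' fees, costs, and the separate rules for textbooks and sole rightsholders, and its function names are hypothetical labels introduced here.

```python
from fractions import Fraction

MIN_FUND = 1_500_000_000      # minimum Settlement Fund in dollars, per the text
ESTIMATED_WORKS = 500_000     # approximate number of works at announcement, per the text


def per_work_average(fund: int, works: int) -> Fraction:
    """Gross average per work, before any deductions (none are modeled here)."""
    return Fraction(fund, works)


def default_split(gross: Fraction, author_share: Fraction = Fraction(1, 2)):
    """Split a gross per-work amount between author and publisher.

    The 50/50 default is the one the text describes for trade and university
    press titles; other title types follow rules this sketch does not model.
    """
    author = gross * author_share
    publisher = gross - author
    return author, publisher


if __name__ == "__main__":
    gross = per_work_average(MIN_FUND, ESTIMATED_WORKS)
    author, publisher = default_split(gross)
    print(f"gross per work: ${float(gross):,.2f}")
    print(f"author: ${float(author):,.2f}, publisher: ${float(publisher):,.2f}")
```

These are gross figures only; the amount any rightsholder actually receives depends on deductions and allocation rules that are outside this sketch.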
Judge Alsup initially delayed preliminary approval at a September 8, 2025 hearing, expressing skepticism about the proposed claims process and the workability of distributions to a class of unknown rightsholders.[^29] He instructed counsel to revise the proposal and convened a further hearing on September 25, 2025.[^29] On that date, Judge Alsup granted preliminary approval of the revised settlement, describing it as "fair, reasonable, and adequate" and remarking, "We have some of the best lawyers in America in this courtroom and if anyone can do it, you can."[^21][^7]
The Authors Guild stated that the settlement "sends a clear signal to AI companies that infringement of authors' rights comes at a steep price."[^21] The Association of American Publishers and its president, Maria Pallante, endorsed the settlement and observed that "Anthropic is hardly a special case when it comes to infringement."[^21] Some authors and rights organizations, however, raised concerns: a group of 28 authors and rightsholders, including Angie Cruz and Dave Eggers, opted out of the settlement and filed their own individual action against Anthropic.[^30]
Several authors' advocacy organizations published guides interpreting the settlement for their members. The Authors Guild, the Association of American Publishers, the Authors Alliance, and the Copyright Alliance each released detailed FAQ and explainer materials describing eligibility, claims procedures, and allocation rules.[^26][^36][^37][^38] University presses, including Princeton University Press and Yale University Press, published guidance on how their authors and the press's own claims would be handled under the settlement's default allocation rules.[^39][^40] These resources collectively reflected widespread interest in the settlement among professional and academic authors and the perceived complexity of its claims process.
A key driver of the settlement's size was the statutory damages framework of the U.S. Copyright Act. Under 17 U.S.C. Section 504(c), a copyright owner may elect, instead of actual damages, an award of statutory damages of between $750 and $30,000 per work infringed, with the upper limit rising to $150,000 per work for willful infringement.[^14][^11] Because the certified class included roughly 465,000 to 500,000 works, the $750 statutory minimum alone implied exposure of roughly $350 million to $375 million, the $30,000 ordinary maximum implied roughly $14 billion to $15 billion, and the $150,000 willful infringement ceiling implied roughly $70 billion to $75 billion; worst-case figures above $1 trillion are reachable only at a work count near the roughly seven million titles estimated for the underlying LibGen and PiLiMi corpora.[^14][^11] Anthropic itself was reported to have characterized the prospect of facing trial as creating "inordinate pressure" to settle in light of these damages dynamics.[^14] The $1.5 billion settlement therefore represents an outcome that, while substantial in absolute terms, is far below the theoretical statutory ceiling.[^11][^31]
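The exposure figures discussed here follow from simple multiplication. The following Python sketch is illustrative only: it assumes, unrealistically, that every work would receive the same per-work award, and the names `TIERS`, `WORK_COUNTS`, and `exposure` are hypothetical labels introduced here. The dollar tiers and work counts are the ones stated in this article.

```python
# Per-work statutory damages tiers under 17 U.S.C. Section 504(c), as stated in the text.
TIERS = {
    "statutory_minimum": 750,
    "ordinary_maximum": 30_000,
    "willful_maximum": 150_000,
}

# Work counts taken from the article. The seven-million figure is an estimate of
# the underlying LibGen and PiLiMi corpora, not the size of the certified class.
WORK_COUNTS = {
    "class_low": 465_000,
    "class_high": 500_000,
    "corpora_estimate": 7_000_000,
}


def exposure(works: int, per_work: int) -> int:
    """Aggregate exposure in dollars, assuming a uniform award for every work."""
    return works * per_work


if __name__ == "__main__":
    for count_label, works in WORK_COUNTS.items():
        for tier_label, per_work in TIERS.items():
            total = exposure(works, per_work)
            print(f"{count_label:>16} x {tier_label:<17} = ${total:,}")
```

A real award would be set within the statutory range and need not be uniform across works, so these products are bounds rather than predictions.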
The combined effect of the June 23, 2025 fair use ruling and the September 2025 settlement has been to crystallize several practical lessons for the broader AI industry.
The bifurcated fair use ruling created a strong incentive for AI developers, including [[anthropic|Anthropic]], [[openai|OpenAI]], [[meta|Meta]], and other large language model labs, to prefer licensed or lawfully acquired training data over corpora derived from pirate libraries.[^11][^19][^12] Commentators noted that even a defensible downstream use does not necessarily insulate an AI developer from liability for infringing acquisition.[^19][^31] Several major AI developers have announced licensing arrangements with publishers and rightsholders in the period after the Bartz ruling.[^12]
The $3,000 per-work figure has been widely interpreted as a starting point in damages discussions for similar pending suits against other AI developers.[^11][^12] Several copyright actions against other AI developers, including ongoing litigation involving OpenAI and Meta, have used the Bartz settlement as a comparator in negotiation and pleading.[^12][^32]
Because the Bartz settlement releases only past claims and does not vacate or alter the June 23, 2025 ruling, the holding that AI training on lawfully acquired works is transformative fair use remains intact within the Northern District of California.[^20][^11] The settlement therefore preserved a doctrinal victory for AI developers on the threshold question of training-stage fair use while imposing substantial monetary consequences for the use of pirated sources.[^11][^19] The contemporaneous Kadrey v. Meta ruling reached a generally similar conclusion on training-stage fair use in the same district.[^22]
Legal analyses of the case have emphasized that AI developers should maintain detailed records of [[training_data|training data]] provenance, perform due diligence on dataset sources, and remove or destroy unlawfully acquired data when required.[^31][^19] The Bartz settlement's non-monetary requirement that Anthropic destroy its LibGen and PiLiMi files provides a concrete template for similar relief in other cases.[^12][^29]
Following Judge Alsup's retirement at the end of 2025, the Bartz case was reassigned to U.S. District Judge Araceli Martinez-Olguin, who had previously presided over a similar copyright action against OpenAI.[^8][^33] On April 8, 2026, Judge Martinez-Olguin issued an order moving the final fairness hearing from April 23, 2026 to May 14, 2026.[^9]
The fairness hearing took place on Thursday, May 14, 2026, before Judge Martinez-Olguin in the San Francisco Federal Courthouse.[^9][^34] At the hearing, class counsel reported a claims rate of 92.77%, with 447,576 of the eligible works claimed by rightsholders, up from 91.3% at the end of April 2026.[^34] Approximately 350 valid opt-outs, covering 1,802 works, were filed.[^34]
Objections raised at the hearing included concerns about group copyright registrations, treatment of pseudonymous and self-published authors, the adequacy of the $3,000 per-work payment, and the exclusion of foreign works without U.S. copyright registrations.[^34] Judge Martinez-Olguin did not rule from the bench, instead taking the matter under submission and indicating that orders could issue in the near term.[^34][^35] As of May 19, 2026, the court had not yet issued a final approval order. Disbursement of settlement funds was scheduled to begin after final approval, with distribution calculations targeted for June 11, 2026 if the settlement was approved as proposed.[^34][^35]
The Bartz case has been widely described by legal commentators as the most consequential single development in the early history of copyright and generative AI, both for its substantive fair use analysis and for its establishment of a multi-billion-dollar damages benchmark.[^11][^12][^31]