AI and Copyright
Last reviewed
Sources
28 citations
Review status
Source-backed
Revision
v2 · 3,140 words
AI and copyright is the body of litigation, regulation, and licensing practice that decides how copyright law applies to artificial intelligence, organized around two core questions: whether copying protected works to train a model is lawful, and whether the works a model produces can infringe existing copyrights or earn copyright protection of their own. The field began with a 2023 wave of lawsuits against OpenAI, Stability AI, and Meta and has since produced the largest copyright settlement in United States history (Anthropic's $1.5 billion deal with authors), a set of partially conflicting United States fair use rulings, the first European merits judgments on model training, and new statutory obligations under the EU AI Act. As of June 2026, no United States appellate court has ruled on whether training generative AI on copyrighted works is fair use, and litigation trackers count well over one hundred pending AI copyright cases worldwide [1].
The disputes divide cleanly into an input side and an output side. On the input side, AI developers copied books, news articles, images, code, and music at scale to train large language models and diffusion models, frequently from scraped web data and pirate "shadow libraries" such as LibGen. Rights holders argue this is mass infringement; developers argue the copying is transformative fair use in the United States and is covered by text and data mining (TDM) exceptions in Europe and elsewhere. On the output side, disputes concern models that reproduce memorized text or images, generate famous characters, or emit content that competes with the works they were trained on, plus the separate question of whether a purely machine-generated work can be registered at all.
The first substantive United States rulings arrived in 2025, and they split along factual lines: training on lawfully obtained material has fared well under fair use, while reliance on pirated source libraries and demonstrable output reproduction have created liability exposure. In parallel, a licensing economy has grown up around the litigation, ranging from bilateral publisher deals to marketplace infrastructure operated by Cloudflare and Microsoft [2][3].
| Case | Court | Filed | Status as of June 2026 |
|---|---|---|---|
| Bartz v. Anthropic | N.D. Cal. | Aug 2024 | $1.5 billion class settlement; final approval pending after May 14, 2026 fairness hearing [4][5] |
| NYT v. OpenAI (now In re OpenAI MDL) | S.D.N.Y. | Dec 27, 2023 | Discovery; OpenAI ordered to produce 20 million chat logs (affirmed Jan 5, 2026) [6] |
| Kadrey v. Meta | N.D. Cal. | Jul 2023 | Meta won summary judgment on training (Jun 25, 2025); amended piracy and distribution claims continue [7] |
| Andersen v. Stability AI | N.D. Cal. | Jan 13, 2023 | Core infringement claims survived; trial set for Sep 8, 2026 [28] |
| Getty Images v. Stability AI (UK) | High Court of England and Wales | Jan 2023 | Nov 4, 2025 judgment rejected secondary copyright claims; narrow trademark win for Getty [8] |
| Getty Images v. Stability AI (US) | N.D. Cal. (refiled from D. Del.) | Feb 2023 | Motion to dismiss largely denied Apr 23, 2026; in discovery [9] |
| Disney and Universal v. Midjourney | C.D. Cal. | Jun 11, 2025 | In discovery; Warner Bros. Discovery filed a parallel suit Sep 4, 2025 [10] |
| Concord Music Group v. Anthropic | N.D. Cal. | Oct 2023 | Expanded complaint Jan 2026; Anthropic seeking summary judgment [11] |
| UMG, Sony, Warner v. Suno and Udio | D. Mass. / S.D.N.Y. | Jun 2024 | UMG and Warner settled with licensing deals (Oct to Nov 2025); Sony's claims continue [12] |
| Thomson Reuters v. Ross Intelligence | D. Del. | May 2020 | Feb 11, 2025 ruling rejected fair use; interlocutory appeal before the Third Circuit [13] |
The highest-stakes consolidated proceeding is In re OpenAI, Inc. Copyright Infringement Litigation (MDL No. 3143), a multidistrict docket before Judge Sidney Stein in the Southern District of New York. It gathers more than a dozen suits, including the Authors Guild class action and The New York Times case against OpenAI and Microsoft, filed on December 27, 2023. Stein largely denied OpenAI's motion to dismiss in March 2025, letting the core infringement claims proceed, and on October 27, 2025 he let the consolidated plaintiffs' output-based infringement theory move past the pleading stage [14]. Discovery has been contentious: in November 2025 a magistrate judge ordered OpenAI to hand over a sample of 20 million de-identified ChatGPT conversations, and Stein affirmed the order in full on January 5, 2026, rejecting OpenAI's privacy objections. No trial date had been set as of June 2026 [6].
In Bartz v. Anthropic, authors sued Anthropic over the books it copied while developing Claude, including more than seven million titles downloaded from the pirate libraries LibGen and PiLiMi to build an internal "central library." After Judge William Alsup's split summary judgment ruling (see below) exposed Anthropic to a December 2025 trial on the pirated-library claims, the company agreed in September 2025 to pay at least $1.5 billion, about $3,000 per work for roughly 500,000 books, and to destroy the pirated copies, without admitting liability. Alsup granted preliminary approval on September 25, 2025. Following his retirement the case passed to Judge Araceli Martinez-Olguin, who held the final fairness hearing on May 14, 2026; claims had been filed for 92.77 percent of eligible works, and final approval remained pending as of June 10, 2026 [4][5].
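The per-work figure quoted above follows from simple division, which the short Python sketch below reproduces. It uses only the rounded numbers given in the text and treats them as illustrative inputs. The function names are hypothetical, and actual payouts would depend on fees, costs, and the final list of works, none of which are modeled here.

```python
# Rounded figures quoted in the text; these are illustrative inputs only.
SETTLEMENT_FUND_USD = 1_500_000_000  # "at least $1.5 billion"
ELIGIBLE_WORKS = 500_000             # "roughly 500,000 books"
CLAIM_RATE = 0.9277                  # "92.77 percent of eligible works"


def gross_per_work(fund_usd: int, works: int) -> float:
    """Gross amount per work, before any fees or costs are deducted."""
    return fund_usd / works


def claimed_works(works: int, claim_rate: float) -> int:
    """Approximate number of works with a claim on file."""
    return round(works * claim_rate)


if __name__ == "__main__":
    per_work = gross_per_work(SETTLEMENT_FUND_USD, ELIGIBLE_WORKS)
    print(f"Gross per work: ${per_work:,.0f}")
    print(f"Works claimed:  {claimed_works(ELIGIBLE_WORKS, CLAIM_RATE):,}")
```

With these rounded inputs the gross figure comes to $3,000 per work, matching the approximate amount reported for the settlement.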
The earliest image case, Andersen v. Stability AI, was filed on January 13, 2023, when cartoonist Sarah Andersen and fellow artists Kelly McKernan and Karla Ortiz sued Stability AI, Midjourney, and DeviantArt over Stable Diffusion, which was trained on the LAION dataset of about 5 billion scraped image-text pairs. Judge William Orrick dismissed most claims initially but allowed the direct infringement theory based on training to proceed, and in his August 2024 order let an expanded complaint move into discovery, with trial scheduled for September 8, 2026 [28].
Getty Images pursued Stability AI on both sides of the Atlantic over Stable Diffusion. In the United Kingdom, Getty dropped its primary training and output infringement claims mid-trial in June 2025 after evidence showed the training occurred outside the UK, and on November 4, 2025 the High Court rejected its remaining secondary infringement theory, holding that model weights do not store copies of training images and so are not "infringing copies," while finding limited trademark infringement where outputs reproduced Getty watermarks [8]. Getty refiled its American case in the Northern District of California in August 2025, and in April 2026 the court allowed nearly all claims to proceed [9].
Hollywood entered the fight on June 11, 2025, when Disney and Universal sued Midjourney in the Central District of California over outputs depicting characters such as Darth Vader and the Minions, seeking statutory damages of up to $150,000 per work; Warner Bros. Discovery filed a similar suit that September [10]. In music, the major labels sued Suno and Udio in June 2024 over training on sound recordings. Universal Music Group settled with Udio in late October 2025, and Warner Music Group settled with both services in November 2025, converting the disputes into licensing partnerships for authorized AI music platforms; Sony Music has not settled, and in early June 2026 the American Federation of Musicians sued UMG and Warner over the deals, claiming members were never compensated [12][15].
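Three 2025 district court decisions frame the American fair use debate: the February 11, 2025 ruling in Thomson Reuters v. Ross Intelligence, which rejected fair use; Judge William Alsup's split summary judgment ruling in Bartz v. Anthropic; and Judge Vince Chhabria's June 25, 2025 summary judgment for Meta on training in Kadrey v. Meta [13].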
The core analytical battle is over the first and fourth of the four fair use factors in 17 U.S.C. § 107: the purpose and character of the use, including whether training is "transformative," and the effect on the market for the original work. In Bartz, Alsup analogized model training to human learning, writing that Anthropic's models trained on the works "not to race ahead and replicate or supplant them" but to create something different. In Kadrey, Chhabria's market-dilution framing treated the fourth factor as potentially decisive, a view the United States Copyright Office had also endorsed weeks earlier [13][17]. After the Kadrey ruling, the plaintiffs moved to press amended claims over Meta's alleged uploading of pirated books through BitTorrent, which remain in litigation [7].
United States law requires human authorship, so a work produced autonomously by a machine cannot be registered. In Thaler v. Perlmutter, Stephen Thaler sought registration for an image titled "A Recent Entrance to Paradise" that he said his "Creativity Machine" created autonomously; the Copyright Office refused, the D.C. Circuit affirmed the refusal on March 18, 2025 (holding that the Copyright Act "requires all eligible work to be authored in the first instance by a human being"), and the Supreme Court denied certiorari on March 2, 2026, leaving the human-authorship rule intact [18].
The more common case is the AI-assisted work, where a human uses a tool such as Midjourney. The benchmark decision is Zarya of the Dawn (February 21, 2023), in which the Copyright Office registered the human-authored text and the selection and arrangement of a graphic novel by artist Kris Kashtanova but cancelled protection for the individual Midjourney-generated images, finding that Kashtanova lacked sufficient control over the output to be their author [17]. Building on Zarya, Part 2 of the office's Copyright and Artificial Intelligence report (January 2025) concluded that outputs are registrable to the extent a human contributed perceptible expressive content, but that text prompts alone generally do not confer authorship [17].
The Copyright Office issued its Copyright and Artificial Intelligence study in three parts. Part 1 (July 2024) covered digital replicas and deepfakes. Part 2 (January 2025) addressed the copyrightability of AI outputs. The Part 3 pre-publication report on generative AI training (May 9, 2025) concluded that fair use must be assessed case by case; in its words, "some uses of copyrighted works for generative AI training will qualify as fair use, and some will not." It found that some training uses are transformative while others are not, and that market dilution is a cognizable harm. The office said a final version would follow without substantive changes [17][27].
The office itself then became contested ground. Register of Copyrights Shira Perlmutter was fired by the Trump administration on May 10, 2025, one day after Part 3's release, a sequence that drew congressional scrutiny. She sued, and in September 2025 a divided D.C. Circuit panel temporarily reinstated her while her appeal proceeds; in November 2025 the Supreme Court deferred acting on the government's stay request pending related removal-power cases. Perlmutter remained in office and testified before the Senate Judiciary IP subcommittee on May 12, 2026 [19][20].
In the European Union, the text and data mining exceptions of the 2019 DSM Copyright Directive allow training on lawfully accessed works unless rights holders opt out. The EU AI Act layered obligations on top: from August 2, 2025, providers of general-purpose AI models must maintain a copyright policy that honors TDM opt-outs and must publish a "sufficiently detailed summary" of their training content using a mandatory European Commission template that was finalized on July 24, 2025. A voluntary General-Purpose AI Code of Practice with a copyright chapter was published on July 10, 2025; OpenAI, Google, Anthropic, and Microsoft signed, while Meta declined [3].
Germany delivered Europe's first major merits ruling on November 11, 2025, when the Munich Regional Court (case 42 O 14139/24) held in GEMA v. OpenAI that GPT-4 and GPT-4o unlawfully reproduced German song lyrics. The court reasoned that memorization in model weights is a reproduction not covered by the TDM exception, which it said applies only to the initial analytical phase of training, and it ordered injunctive relief, disclosure, and damages over the lyrics of nine well-known German songs. OpenAI said it would appeal [21].
In the United Kingdom, beyond the Getty judgment, a government consultation that proposed a broad TDM exception with an opt-out drew intense opposition from creative industries, and the Data (Use and Access) Act became law in June 2025 without the AI transparency amendments championed in the House of Lords; further government reports were promised rather than immediate legislation [8].
Chinese courts have diverged on outputs: the Beijing Internet Court held in November 2023 that an AI-generated image reflecting sufficient human input could be copyrighted, and a Guangzhou court in February 2024 found a generative service liable for outputs depicting the character Ultraman.
Alongside the lawsuits, a substantial licensing economy emerged. The Associated Press signed the first major news deal with OpenAI in July 2023; News Corp's May 2024 OpenAI agreement was reported at more than $250 million over five years; and Reddit licensed its corpus to Google for a reported $60 million per year, with a comparable OpenAI deal [2][22]. In May 2025, The New York Times, while still suing OpenAI, signed its first AI licensing deal, with Amazon, reported at $20 million to $25 million per year [23].
Infrastructure followed. In July 2025 Cloudflare began blocking known AI crawlers by default for new sites and launched "pay per crawl," which lets publishers charge crawlers per request; in January 2026 it acquired Human Native, a London-based AI data licensing marketplace [24][25]. The Really Simple Licensing (RSL) standard, introduced in September 2025, embeds machine-readable licensing terms in websites [2]. Microsoft launched its Publisher Content Marketplace on February 4, 2026, brokering usage-based licensing between publishers including Conde Nast, Hearst, the AP, USA Today, and Vox Media and AI buyers, starting with its own Copilot [26]. The music settlements with Suno and Udio similarly converted litigation into licensed product launches [12].
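To make the idea of a machine-readable crawler signal concrete, the sketch below checks a robots.txt file with Python's standard `urllib.robotparser`. This is an illustrative assumption: the article does not say which mechanism any particular publisher or service uses, and the sketch does not model Cloudflare's pay per crawl or the RSL standard. The robots.txt content, the `may_crawl` helper, the `ExampleBrowser` agent, and the example.com URL are all hypothetical. `GPTBot` appears only as an example of a crawler token a site might name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative publisher site.
EXAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"
    for agent in ("GPTBot", "ExampleBrowser"):
        verdict = "allowed" if may_crawl(EXAMPLE_ROBOTS_TXT, agent, page) else "blocked"
        print(f"{agent}: {verdict}")
```

A check like this only reports what the file says. Whether a given signal satisfies any legal opt-out requirement, and whether a crawler actually honors it, are separate questions outside the sketch.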
The pivotal unresolved question, whether training generative models on copyrighted works without permission is fair use, is now headed to appellate courts through Thomson Reuters v. Ross and, eventually, the OpenAI multidistrict litigation, while Sony's cases against Suno and Udio and the Andersen image case are expected to produce further trial-level rulings in 2026 [12][13][28]. The Anthropic settlement, if finally approved, will distribute roughly $1.5 billion to authors and publishers and has already shaped settlement expectations across the industry [4][5]. Congress has not enacted AI copyright legislation, leaving policy to the courts, the Copyright Office, whose leadership remains in litigation, and private ordering through the rapidly growing licensing market [19][2].