AI and Copyright

AI Policy


Updated Jun 24, 2026


RawGraph

Last edited: Jun 24, 2026 · Fact-checked: In review queue · Sources: 28 citations · Revision: v2 · 3,140 words


AI and copyright is the body of litigation, regulation, and licensing practice that decides how copyright law applies to artificial intelligence, organized around two core questions: whether copying protected works to train a model is lawful, and whether the works a model produces can infringe existing copyrights or earn copyright protection of their own. The field took off with a 2023 wave of lawsuits against OpenAI, Stability AI, and Meta and has since produced the largest copyright settlement in United States history (Anthropic's $1.5 billion deal with authors), a set of partially conflicting United States fair use rulings, the first European merits judgments on model training, and new statutory obligations under the EU AI Act. As of June 2026, no United States appellate court has ruled on whether training generative AI on copyrighted works is fair use, and litigation trackers count well over one hundred pending AI copyright cases worldwide ^[1].

What are the two core copyright questions in AI?

The disputes divide cleanly into an input side and an output side. On the input side, AI developers copied books, news articles, images, code, and music at scale to train large language models and diffusion models, frequently from scraped web data and pirate "shadow libraries" such as LibGen. Rights holders argue this is mass infringement; developers argue the copying is transformative fair use in the United States and is covered by text and data mining (TDM) exceptions in Europe and elsewhere. On the output side, disputes concern models that reproduce memorized text or images, generate famous characters, or emit content that competes with the works they were trained on, plus the separate question of whether a purely machine-generated work can be registered at all.

The first substantive United States rulings arrived in 2025, and they split along factual lines: training on lawfully obtained material has fared well under fair use, while reliance on pirated source libraries and demonstrable output reproduction have created liability exposure. In parallel, a licensing economy has grown up around the litigation, ranging from bilateral publisher deals to marketplace infrastructure operated by Cloudflare and Microsoft ^[2]^[3].

Which lawsuits target AI training data?

Case	Court	Filed	Status as of June 2026
Bartz v. Anthropic	N.D. Cal.	Aug 2024	$1.5 billion class settlement; final approval pending after May 14, 2026 fairness hearing ^[4]^[5]
NYT v. OpenAI (now in re OpenAI MDL)	S.D.N.Y.	Dec 27, 2023	Discovery; OpenAI ordered to produce 20 million chat logs (affirmed Jan 5, 2026) ^[6]
Kadrey v. Meta	N.D. Cal.	Jul 2023	Meta won summary judgment on training (Jun 25, 2025); amended piracy and distribution claims continue ^[7]
Andersen v. Stability AI	N.D. Cal.	Jan 13, 2023	Core infringement claims survived; trial set for Sep 8, 2026 ^[28]
Getty Images v. Stability AI (UK)	High Court of England and Wales	Jan 2023	Nov 4, 2025 judgment rejected secondary copyright claims; narrow trademark win for Getty ^[8]
Getty Images v. Stability AI (US)	N.D. Cal. (refiled from D. Del.)	Feb 2023	Motion to dismiss largely denied Apr 23, 2026; in discovery ^[9]
Disney and Universal v. Midjourney	C.D. Cal.	Jun 11, 2025	In discovery; Warner Bros. Discovery filed a parallel suit Sep 4, 2025 ^[10]
Concord Music Group v. Anthropic	N.D. Cal.	Oct 2023	Expanded complaint Jan 2026; Anthropic seeking summary judgment ^[11]
UMG, Sony, Warner v. Suno and Udio	D. Mass. (Suno) / S.D.N.Y. (Udio)	Jun 2024	UMG settled with Udio (Oct 2025) and Warner with both services (Nov 2025) through licensing deals; Sony's claims continue ^[12]
Thomson Reuters v. Ross Intelligence	D. Del.	May 2020	Feb 11, 2025 ruling rejected fair use; interlocutory appeal before the Third Circuit ^[13]

The highest-stakes consolidated proceeding is in re OpenAI, Inc. Copyright Infringement Litigation (MDL No. 3143), a multidistrict docket before Judge Sidney Stein in the Southern District of New York that gathers more than a dozen suits, including The New York Times case against OpenAI and Microsoft filed on December 27, 2023, and the Authors Guild class action. Stein largely denied OpenAI's motion to dismiss in March 2025, letting the core infringement claims proceed ^[14], and on October 27, 2025 he let the consolidated plaintiffs' output-based infringement theory move past the pleading stage. Discovery has been contentious: in November 2025 a magistrate judge ordered OpenAI to hand over a sample of 20 million de-identified ChatGPT conversations, and Stein affirmed the order in full on January 5, 2026, rejecting OpenAI's privacy objections. No trial date had been set as of June 2026 ^[6].

In Bartz v. Anthropic, authors sued Anthropic over the books it copied while developing Claude, including more than seven million titles downloaded from the pirate libraries LibGen and PiLiMi to build an internal "central library." After Judge William Alsup's split summary judgment ruling (see below) exposed Anthropic to a December 2025 trial on the pirated-library claims, the company agreed in September 2025 to pay at least $1.5 billion, about $3,000 per work for roughly 500,000 books, and to destroy the pirated copies, without admitting liability. Alsup granted preliminary approval on September 25, 2025. Following his retirement the case passed to Judge Araceli Martinez-Olguin, who held the final fairness hearing on May 14, 2026; claims had been filed for 92.77 percent of eligible works, and final approval remained pending as of June 10, 2026 ^[4]^[5].

The earliest image case, Andersen v. Stability AI, was filed on January 13, 2023, when cartoonist Sarah Andersen and fellow artists Kelly McKernan and Karla Ortiz sued Stability AI, Midjourney, and DeviantArt over Stable Diffusion, which was trained on the LAION dataset of about 5 billion scraped image-text pairs. Judge William Orrick dismissed most claims initially but allowed the direct infringement theory based on training to proceed, and in his August 2024 order let an expanded complaint move into discovery, with trial scheduled for September 8, 2026 ^[28].

Getty Images pursued Stability AI on both sides of the Atlantic over Stable Diffusion. In the United Kingdom, Getty dropped its primary training and output infringement claims mid-trial in June 2025 after evidence showed the training occurred outside the UK, and on November 4, 2025 the High Court rejected its remaining secondary infringement theory, holding that model weights do not store copies of training images and so are not "infringing copies," while finding limited trademark infringement where outputs reproduced Getty watermarks ^[8]. Getty refiled its American case in the Northern District of California in August 2025, and in April 2026 the court allowed nearly all claims to proceed ^[9].

Hollywood entered the fight on June 11, 2025, when Disney and Universal sued Midjourney in the Central District of California over outputs depicting characters such as Darth Vader and the Minions, seeking statutory damages of up to $150,000 per work; Warner Bros. Discovery filed a similar suit that September ^[10].

In music, the major labels sued Suno and Udio in June 2024 over training on sound recordings. Universal Music Group settled with Udio in late October 2025, and Warner Music Group settled with both services in November 2025, converting the disputes into licensing partnerships for authorized AI music platforms; Sony Music has not settled, and in early June 2026 the American Federation of Musicians sued UMG and Warner over the deals, claiming members were never compensated ^[12]^[15].

How have courts ruled on AI training and fair use?

Three 2025 district court decisions frame the American fair use debate ^[13]:

Thomson Reuters v. Ross Intelligence (February 11, 2025). Judge Stephanos Bibas held that a legal research startup's use of Westlaw headnotes to train a non-generative search tool was not fair use, stressing that the product competed directly with Westlaw. An interlocutory appeal is before the Third Circuit, the first appellate test of AI training.
Bartz v. Anthropic (June 23, 2025). Judge Alsup ruled that training LLMs on books was a transformative fair use, calling it "exceedingly transformative" and "spectacularly so," and held that digitizing lawfully purchased print copies was also fair, but that downloading millions of pirated books to build a permanent internal library was not fair use and had to go to trial ^[16].
Kadrey v. Meta (June 25, 2025). Judge Vince Chhabria granted Meta summary judgment on the use of books to train Llama, but only because the thirteen plaintiff authors failed to develop evidence of market harm. He wrote that the ruling "does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful," and that it "stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one," suggesting that "market dilution" (AI-generated works flooding the market for the originals) could be decisive in future cases ^[7].

The core analytical battle is over two of the four fair use factors in 17 U.S.C. § 107: the first, the purpose and character of the use (including whether training is "transformative"), and the fourth, the effect of the use on the market for the original work. Alsup analogized model training to human learning, writing that Anthropic's models trained on the works "not to race ahead and replicate or supplant them" but to create something different, while Chhabria's market-dilution framing treated the fourth factor as potentially decisive, a view the United States Copyright Office had also endorsed weeks earlier ^[13]^[17]. After the ruling, the Kadrey plaintiffs moved to press amended claims over Meta's alleged uploading of pirated books through BitTorrent, which remain in litigation ^[7].

Can AI-generated works be copyrighted?

United States law requires human authorship, so a work produced autonomously by a machine cannot be registered. In Thaler v. Perlmutter, Stephen Thaler sought registration for an image titled "A Recent Entrance to Paradise" that he said his "Creativity Machine" created autonomously; the Copyright Office refused, the D.C. Circuit affirmed the refusal on March 18, 2025 (holding that the Copyright Act "requires all eligible work to be authored in the first instance by a human being"), and the Supreme Court denied certiorari on March 2, 2026, leaving the human-authorship rule intact ^[18].

The more common case is the AI-assisted work, where a human uses a tool such as Midjourney. The benchmark decision is Zarya of the Dawn (February 21, 2023), in which the Copyright Office registered the human-authored text and the selection and arrangement of a graphic novel by artist Kris Kashtanova but cancelled protection for the individual Midjourney-generated images, finding that Kashtanova lacked sufficient control over the output to be their author ^[17]. Building on Zarya, Part 2 of the office's Copyright and Artificial Intelligence report (January 2025) concluded that outputs are registrable to the extent a human contributed perceptible expressive content, but that text prompts alone generally do not confer authorship ^[17].

What do the US Copyright Office reports say?

The Copyright Office issued its Copyright and Artificial Intelligence study in three parts. Part 1 (July 2024) covered digital replicas and deepfakes. Part 2 (January 2025) addressed the copyrightability of AI outputs. The Part 3 pre-publication report on generative AI training (May 9, 2025) concluded that fair use must be assessed case by case, writing that "some uses of copyrighted works for generative AI training will qualify as fair use, and some will not," that some training uses are transformative while others are not, and that market dilution is a cognizable harm; the office said a final version would follow without substantive changes ^[17]^[27].

The office itself then became contested ground. Register of Copyrights Shira Perlmutter was fired by the Trump administration on May 10, 2025, one day after Part 3's release, a sequence that drew congressional scrutiny. She sued, and in September 2025 a divided D.C. Circuit panel temporarily reinstated her while her appeal proceeds; in November 2025 the Supreme Court deferred acting on the government's stay request pending related removal-power cases. Perlmutter remained in office and testified before the Senate Judiciary IP subcommittee on May 12, 2026 ^[19]^[20].

How do the EU and UK approach AI copyright?

In the European Union, the general text and data mining exception in Article 4 of the 2019 DSM Copyright Directive allows training on lawfully accessed works unless rights holders have expressly reserved their rights, in a machine-readable way for content published online; a separate Article 3 exception for scientific research by research organizations and cultural heritage institutions cannot be overridden by such an opt-out. The EU AI Act layered obligations on top: from August 2, 2025, providers of general-purpose AI models must maintain a copyright policy that honors TDM opt-outs and must publish a "sufficiently detailed summary" of their training content using a mandatory European Commission template that was finalized on July 24, 2025. A voluntary General-Purpose AI Code of Practice with a copyright chapter was published on July 10, 2025; OpenAI, Google, Anthropic, and Microsoft signed, while Meta declined ^[3].
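Neither the Directive nor the AI Act fixes a single opt-out format. As a minimal sketch, assuming a publisher expresses its reservation through robots.txt (one machine-readable signal, and one the Code of Practice's copyright chapter asks signatories' crawlers to respect), a provider's crawler could check the rules before collecting a page for training. The publisher domain, the robots.txt contents, the agent name "SomeSearchBot", and the helper `may_collect_for_training` are hypothetical; GPTBot and CCBot are the published crawler names of OpenAI and Common Crawl, and the check uses only Python's standard `urllib.robotparser` module. Whether a given signal counts as a valid reservation of rights is a legal question the code does not answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a publisher that opts out of two AI crawlers
# but still admits every other user agent.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def may_collect_for_training(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules let `user_agent` fetch `url`.

    Assumption for this sketch: a matching Disallow rule is treated as a
    TDM opt-out. Other reservation mechanisms (site terms, metadata
    protocols) are not checked here.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://publisher.example/articles/123"  # hypothetical URL
    for agent in ("GPTBot", "CCBot", "SomeSearchBot"):
        print(agent, may_collect_for_training(ROBOTS_TXT, agent, page))
```

Running the file prints `GPTBot False`, `CCBot False`, and `SomeSearchBot True`: the two named AI crawlers match a `Disallow: /` group, while any other agent falls through to the permissive `*` group. A real pipeline would also need to fetch the live file, honor other opt-out mechanisms, and record its decisions as part of the provider's copyright policy; those parts are omitted.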

Germany delivered Europe's first major merits ruling on November 11, 2025, when the Munich Regional Court (case 42 O 14139/24) held in GEMA v. OpenAI that GPT-4 and GPT-4o unlawfully reproduced German song lyrics. The court reasoned that memorization in model weights is a reproduction not covered by the TDM exception, which it said applies only to the initial analytical phase of training, and it ordered injunctive relief, disclosure, and damages over the lyrics of nine well-known German songs. OpenAI said it would appeal ^[21].

In the United Kingdom, beyond the Getty judgment, a government consultation that proposed a broad TDM exception with an opt-out drew intense opposition from creative industries, and the Data (Use and Access) Act became law in June 2025 without the AI transparency amendments championed in the House of Lords; the government promised further reports rather than immediate legislation ^[8].

Outside Europe, Chinese courts have addressed both sides of the output question: the Beijing Internet Court held in November 2023 that an AI-generated image reflecting sufficient human input could be copyrighted, and a Guangzhou court in February 2024 found a generative service liable for outputs depicting the character Ultraman.

What does the AI training licensing market look like?

Alongside the lawsuits, a substantial licensing economy emerged. The Associated Press signed the first major news deal with OpenAI in July 2023; News Corp's May 2024 OpenAI agreement was reported at more than $250 million over five years; and Reddit licensed its corpus to Google for a reported $60 million per year and struck a separate data partnership with OpenAI in May 2024 ^[2]^[22]. In May 2025, The New York Times, while still suing OpenAI, signed its first AI licensing deal, with Amazon, reported at $20 million to $25 million per year ^[23].

Infrastructure followed. In July 2025 Cloudflare began blocking known AI crawlers by default for new sites and launched "pay per crawl," which lets publishers charge crawlers per request; in January 2026 it acquired Human Native, a London-based AI data licensing marketplace ^[24]^[25]. The Really Simple Licensing (RSL) standard, introduced in September 2025, embeds machine-readable licensing terms in websites ^[2]. Microsoft launched its Publisher Content Marketplace on February 4, 2026, brokering usage-based licensing between AI buyers, starting with its own Copilot, and publishers including Condé Nast, Hearst, the AP, USA Today, and Vox Media ^[26]. The music settlements with Suno and Udio similarly converted litigation into licensed product launches ^[12].

What is the outlook for AI copyright law?

The pivotal unresolved question, whether training generative models on copyrighted works without permission is fair use, is now headed to appellate courts through Thomson Reuters v. Ross and, eventually, the OpenAI multidistrict litigation, while Sony's cases against Suno and Udio and the Andersen image case are expected to produce further trial-level rulings in 2026 ^[12]^[13]^[28]. The Anthropic settlement, if finally approved, will distribute roughly $1.5 billion to authors and publishers and has already shaped settlement expectations across the industry ^[4]^[5]. Congress has not enacted AI copyright legislation, leaving policy to the courts, to a Copyright Office whose leadership remains in litigation, and to private ordering through the rapidly growing licensing market ^[19]^[2].

References

1. AI Lawsuit Tracker: OpenAI, NYT and other AI copyright litigation cases - ailawsuittracker.com, accessed June 2026.
2. Reddit Is Winning the AI Game - Columbia Journalism Review.
3. Copyright compliance under the EU AI Act for GPAI model providers - Clifford Chance, October 2025.
4. Bartz v. Anthropic Settlement: What Authors Need to Know - The Authors Guild.
5. Anthropic Settlement Appears to Cruise Through Its Final Fairness Hearing - Publishing Perspectives, May 2026.
6. OpenAI Loses Privacy Gambit: 20 Million ChatGPT Logs Likely Headed to Copyright Plaintiffs - National Law Review, January 2026.
7. Meta Wins on Fair Use for Now, but Court Leaves Door Open for "Market Dilution" - Authors Alliance, June 26, 2025.
8. Getty loses UK copyright battle against Stability AI - The Register, November 4, 2025.
9. Getty Images (US), Inc. v. Stability AI, Ltd. - Loeb and Loeb, April 2026.
10. Disney and Universal sue Midjourney for AI copyright infringement - CNBC, June 11, 2025.
11. UMG, Concord Seriously Supersize Their Anthropic Lawsuit - Digital Music News, January 28, 2026.
12. AI-Music Heavyweight Suno Partners With Warner Music Group After Lawsuit Settlement - Rolling Stone, November 2025.
13. Copyright and AI Collide: Three Key Decisions on AI Training and Copyrighted Content from 2025 - IPWatchdog, December 23, 2025.
14. Judge allows "New York Times" copyright case against OpenAI to go forward - NPR, March 26, 2025.
15. US musicians union sues UMG and Warner Music, alleging member recordings were licensed to Suno and Udio without compensation or credit - Music Business Worldwide, June 2026.
16. Anthropic Wins on Fair Use for Training its LLMs; Loses on Building a "Central Library" of Pirated Books - Authors Alliance, June 24, 2025.
17. Copyright and Artificial Intelligence - U.S. Copyright Office.
18. Supreme Court Denies Cert in AI Authorship Case - Mayer Brown, March 2026.
19. Supreme Court defers decision on whether Trump can fire head of U.S. Copyright Office - SCOTUSblog, November 2025.
20. Copyright Office Chief Defends AI Training Stance to Senate - Legis1, May 2026.
21. Landmark ruling of the Munich Regional Court (GEMA v OpenAI) on copyright and AI training - Bird and Bird, November 2025.
22. News Corp Inks OpenAI Licensing Deal Potentially Worth More Than $250 Million - Variety, May 2024.
23. The New York Times and Amazon ink AI licensing deal - TechCrunch, May 29, 2025.
24. Introducing pay per crawl: Enabling content owners to charge AI crawlers for access - Cloudflare, July 2025.
25. Cloudflare acquires AI data marketplace Human Native - CNBC, January 15, 2026.
26. Microsoft launches Publisher Content Marketplace for AI licensing - Search Engine Land, February 2026.
27. U.S. Copyright Office Releases Part 3 of AI Report: What Authors Should Know - The Authors Guild, May 2025.
28. Andersen v. Stability AI - BakerHostetler Litigation Tracker, accessed June 2026.
