# Music

> Source: https://aiwiki.ai/wiki/music
> Updated: 2026-06-28
> Categories: AI Tools & Products, Generative AI, Speech & Audio AI
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

*See also: [Music ChatGPT Plugins](/wiki/music_chatgpt_plugins)*

**AI in music** is the use of [artificial intelligence](/wiki/artificial_intelligence), especially [machine learning](/wiki/machine_learning) and [generative AI](/wiki/generative_ai), to compose, perform, mix, master, transcribe, voice-clone, and reproduce music. The field stretches back to mid-twentieth-century algorithmic composition and accelerated sharply after 2016, when [Google Brain](/wiki/google_brain) launched the open-source [Magenta](/wiki/magenta) project [7]. By the mid-2020s, dedicated music generators such as [Suno](/wiki/suno) and [Udio](/wiki/udio) could produce full songs with synthesized vocals from a short text prompt, triggering coordinated copyright lawsuits, new state and federal legislation, and an industry-wide debate over consent, training data, and the future of recorded music [20].

The scale is now measurable. By April 2026, the streaming service Deezer reported that roughly 75,000 fully AI-generated tracks were being uploaded to its platform every day, about 44 percent of all new music it received, up from around 10,000 per day a year earlier [42][52]. AI-generated music still accounted for only 1 to 3 percent of total streams on Deezer, but the company said about 85 percent of those streams were flagged as fraudulent and demonetized [52]. The leading consumer tool, Suno, raised a $250 million round in November 2025 at a $2.45 billion valuation on roughly $200 million in annual revenue, then a $400 million round in 2026 at a $5.4 billion valuation, even while facing major-label litigation [47][48].

## What is AI in music?

AI in music covers two broad activities: generating new audio or symbolic scores, and processing or distributing existing recordings. Generative models produce instrumental beds, full songs with synthesized vocals, stem separations, and style transfers. On the operations side, AI handles mastering (LANDR, iZotope), transcription ([Whisper](/wiki/whisper)), playlist personalization, voice synthesis, and copyright detection. Streaming platforms such as Spotify, YouTube Music, and Deezer have built consumer-facing AI features that range from voice-cloned DJs to AI-track detectors that police royalty fraud [28][52].

The industry's relationship to AI has been openly contradictory. The same major labels that filed landmark infringement suits against Suno and Udio in June 2024 had, by late 2025, begun settling those cases and signing licensing partnerships with the same companies they had accused of mass infringement [20][22][23]. Working musicians, songwriters, and union members have been more uniformly skeptical, citing concerns about scraped training data, voice deepfakes, and royalty dilution from machine-generated tracks flooding streaming platforms.

## History

### Pre-deep-learning algorithmic composition

Long before neural networks, composers experimented with rule-based and probabilistic systems. The Greek-French composer [Iannis Xenakis](/wiki/iannis_xenakis) applied probability theory to composition starting in the mid-1950s; his treatise *Musiques formelles* (1963) described stochastic procedures that he later implemented in his ST computer program, used to compose works including ST/4 and ST/10 [41]. American composer [David Cope](/wiki/david_cope) began writing his Experiments in Musical Intelligence (EMI, pronounced "Emmy") in 1981 while procrastinating an opera commission. EMI analyzed a composer's catalog and produced new pieces in that style. In a now-famous Turing-style test organized by Douglas Hofstadter, audience members at the University of Oregon identified an EMI Bach pastiche as the real Bach piece and a human composer's piece as the machine output [39].

In 2002, Douglas Eck and [Jürgen Schmidhuber](/wiki/jurgen_schmidhuber) at IDSIA published *Finding Temporal Structure in Music: Blues Improvisation with LSTM Recurrent Networks*, demonstrating that [long short-term memory](/wiki/lstm) networks could learn 12-bar blues structure and improvise novel melodies that respected the chord progression [1]. The paper is widely cited as the moment when deep learning entered serious music research.

In Spain, the Iamus computer cluster at the Universidad de Málaga produced its Opus one on 15 October 2010, described as the first fragment of professional contemporary classical music composed by a machine in its own style. Iamus's first full work, *Hello World!*, premiered exactly one year later. In 2012, the London Symphony Orchestra recorded an album of Iamus pieces, which *New Scientist* called the first complete album composed solely by a computer and recorded by human musicians [40].

### The Magenta era (2016 to 2019)

Google Brain announced [Magenta](/wiki/magenta) on 1 June 2016 with a question: can machines make music and art? [7] The project produced a stream of open-source models and datasets through the late 2010s, including the WaveNet-based [NSynth](/wiki/nsynth) (April 2017), MusicVAE for melody interpolation, Music Transformer, and a Magenta Studio plugin suite for Ableton Live. The NSynth paper, *Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders*, was a collaboration between Google Brain and [DeepMind](/wiki/deepmind) and shipped with a public dataset of more than 300,000 instrument notes [2].

[OpenAI](/wiki/openai) entered the field with MuseNet, announced on 25 April 2019. MuseNet was a 72-layer Sparse Transformer that could generate four-minute compositions across ten instruments and many styles, trained on MIDI from sources including ClassicalArchives and BitMidi [6]. A year later, on 30 April 2020, OpenAI released [Jukebox](/wiki/jukebox), an autoregressive model that worked directly on raw audio. Jukebox used a hierarchy of VQ-VAEs to compress audio into discrete codes, then trained Transformers on those codes, conditioned on artist, genre, and lyrics [3]. Outputs were lo-fi and structurally meandering, but they sang.

### When did text-to-music become practical? Foundation models for audio (2023 to 2024)

The year 2023 was when text-to-music caught up with text-to-image. Google researchers published [MusicLM](/wiki/musiclm) on arXiv in January 2023. The model cast music generation as hierarchical sequence-to-sequence prediction, building on AudioLM for generation and MuLan for joint music-text embeddings, and produced 24 kHz audio that stayed coherent over several minutes; it was trained on a dataset of about 280,000 hours of music, which Google cited as a reason for not releasing it publicly at first [4][53]. In June 2023, Meta released [MusicGen](/wiki/musicgen) and the broader AudioCraft toolkit. MusicGen combined the EnCodec neural audio codec with a single-stage autoregressive Transformer; Meta published the code on 8 June 2023 under the MIT license and the model weights under a Creative Commons Attribution-NonCommercial 4.0 license [5][54]. Stability AI launched the commercial product [Stable Audio](/wiki/stable_audio) in September 2023, using a latent diffusion architecture similar to Stable Diffusion but trained on audio [9]. Riffusion, released earlier, in December 2022, by Seth Forsgren and Hayk Martiros, had taken a different route: it fine-tuned Stable Diffusion on spectrogram images and turned each generated spectrogram back into audio with an inverse Fourier transform [10].

The explosion in consumer tools followed. Suno, founded in Cambridge, Massachusetts by Michael Shulman, Georg Kucsko, Martin Camacho, and Keenan Freyberg, made its app widely available in December 2023, shipped V3 on 21 March 2024, and V4 on 19 November 2024 [33]. The company later shipped V5 on 23 September 2025 and V5.5, which added a singing-voice capture feature, on 26 March 2026 [55]. [Udio](/wiki/udio) launched in beta on 10 April 2024, founded by former [DeepMind](/wiki/deepmind) researchers David Ding, Conor Durkan, Charlie Nash, Yaroslav Ganin, and Andrew Sanchez, with a $10 million seed round led by Andreessen Horowitz [35]. Google DeepMind and YouTube announced [Lyria](/wiki/lyria) and the Dream Track experiment on 16 November 2023, along with the YouTube Music AI Incubator [8].

## Which AI music generation models and tools are most notable?

| Tool | Developer | Released | Approach | Notes |
| --- | --- | --- | --- | --- |
| EMI / Emmy | David Cope | 1981 | Rule-based pattern analysis | Generated pieces in the style of Bach, Mozart, Beethoven |
| Iamus | Universidad de Málaga | 2010 | Evolutionary algorithm (Melomics) | London Symphony Orchestra recorded its work in 2012 |
| LSTM Blues | Eck and Schmidhuber, IDSIA | 2002 | LSTM RNN on 12-bar blues | First widely cited deep learning composition paper |
| [Magenta](/wiki/magenta) | Google Brain | 2016 | Various neural models | Open-source umbrella project |
| [NSynth](/wiki/nsynth) | Magenta and DeepMind | 2017 | WaveNet autoencoder | Public 300,000-note dataset |
| [MuseNet](/wiki/musenet) | OpenAI | 2019 | Sparse Transformer | 4-minute MIDI compositions |
| [Jukebox](/wiki/jukebox) | OpenAI | 2020 | VQ-VAE plus Transformers | Raw-audio singing in artist styles |
| Riffusion | Forsgren and Martiros | 2022 | Spectrogram diffusion | Fine-tune of Stable Diffusion |
| [MusicLM](/wiki/musiclm) | Google | 2023 | Hierarchical sequence model | Built on AudioLM and MuLan; ~280,000 hours of training audio |
| [MusicGen](/wiki/musicgen) | Meta (AudioCraft) | 2023 | EnCodec plus single-stage Transformer | Open weights (CC-BY-NC 4.0), code MIT |
| [Stable Audio](/wiki/stable_audio) | Stability AI | 2023 | Latent diffusion | Stable Audio 2.0 added 3-minute songs in April 2024 |
| [Lyria](/wiki/lyria) | Google DeepMind | 2023 | Proprietary; SynthID watermark | Powers YouTube Dream Track |
| [Suno](/wiki/suno) | Suno, Inc. | 2023 | Proprietary text-to-song | V3 Mar 2024, V4 Nov 2024, V5 Sep 2025 |
| [Udio](/wiki/udio) | Uncharted Labs | 2024 | Proprietary text-to-song | Founded by ex-DeepMind researchers |
| AIVA | Aiva Technologies | 2016 | Symbolic composition | First AI registered as a composer at SACEM |
| Boomy | Boomy Corp. | 2018 | One-click generation plus distribution | Backed by Warner Music; 80% creator royalty share |
| Endel | Endel Sound GmbH | 2018 | Generative ambient soundscapes | First algorithm signed by a major label (Warner, 2019) |
| Soundful | Soundful | 2019 | Royalty-free template generator | Marketed to creators |
| Mubert | Mubert Inc. | 2016 | Real-time generative streams | Pivoted to text-to-music in 2022 |
| Audiobox | Meta | 2023 | Voice and sound generation | Successor to Voicebox and AudioGen |

Many of these systems are now positioned as creative tools rather than autonomous composers, with marketing language emphasizing assistance to human songwriters. The actual line between the two depends on how the platform handles prompting, editing, and ownership.

## How does AI voice cloning work in music?

Voice cloning is the technical thread that runs through most of the legal and ethical debate. Modern systems can produce a usable singing voice from minutes of training audio, sometimes less. The same techniques that let Spotify build a personalized DJ from Xavier Jernigan's voice also let anonymous TikTok users impersonate Drake.

### Spotify AI DJ

Spotify rolled out its AI DJ in the United States and Canada on 22 February 2023 [28]. The product mixes [recommendation algorithms](/wiki/recommendation_systems) with a synthetic voice cloned from Xavier "X" Jernigan, the company's head of cultural partnerships and a former host of Spotify's morning show The Get Up. Jernigan's voice was modeled using technology from Sonantic, a voice startup that Spotify had acquired in June 2022 and whose earlier work included the Val Kilmer voice in *Top Gun: Maverick* [29]. The team isolated Jernigan's audio from roughly 300 episodes of The Get Up to train the model's pitch, pacing, and emotion. The DJ expanded to more than 50 markets over 2023 and 2024, and a Spanish-language version followed in mid-2024.

### Holly Herndon's Holly+

Composer [Holly Herndon](/wiki/holly_herndon) and her collaborator Mathew Dryhurst built one of the earliest artist-trained voice models, named Spawn, which featured on her 2019 album *PROTO*. In July 2021, working with Voctro Labs, Herndon launched Holly+, a public tool that converts uploaded audio into her voice [36]. Holly+ creations carry an open license that allows non-commercial release, with commercial use governed by a DAO of stewards. Herndon framed the project as a model for consensual deepfakes and decentralized identity, an alternative to either outright bans or unrestricted scraping.

### Grimes and Elf.tech

On 23 April 2023, the musician [Grimes](/wiki/grimes) announced on Twitter that anyone could use an AI clone of her voice and would split streaming royalties 50/50 with her if they distributed the result [37]. The voiceprint, called GrimesAI-1, was trained on her vocals and made available through Elf.tech, a CreateSafe product. In late April 2023, Grimes and CreateSafe partnered with TuneCore to handle distribution and royalty splits for tracks featuring the GrimesAI-1 voiceprint [38]. By embracing a permissive licensing model, Grimes positioned herself in deliberate contrast to the major-label "opt out" stance.

## Notable industry incidents

### Heart on My Sleeve

The most-discussed AI music incident of 2023 was *Heart on My Sleeve*, a song produced by the anonymous TikTok user Ghostwriter977 (sometimes written ghostwriter977) using AI-cloned vocals of Drake and The Weeknd. Ghostwriter977 self-released the track on streaming platforms including Spotify, Apple Music, SoundCloud, Amazon Music, Deezer, YouTube, and Tidal on 4 April 2023, and posted a one-minute snippet to TikTok on 15 April. The first TikTok video drew about 9.4 million views [11].

Universal Music Group filed DMCA takedown notices on 17 April, and the song was pulled from streaming services within days [12]. Ghostwriter977 later submitted the track for Grammy consideration. On 4 September 2023, Recording Academy CEO Harvey Mason Jr. confirmed that *Heart on My Sleeve* would not be eligible: the track, which featured AI vocals imitating artists signed to UMG, had not been commercially released through legitimate distribution channels [13]. The Recording Academy's broader AI rule, announced in June 2023, requires that any Grammy-winning music have meaningful human authorship [32].

### Now and Then by the Beatles

Not all uses of AI in music are adversarial. On 2 November 2023, Apple Records released *Now and Then*, marketed as the final Beatles song [14]. The track was built from a late-1970s demo cassette John Lennon had recorded in his New York apartment at the Dakota. Earlier attempts to finish the song in 1995 had failed because the Lennon vocal was buried under a domestic piano track and could not be separated cleanly with the era's tools.

For the 2023 version, producer Giles Martin and Paul McCartney used the same machine-learning audio demixing technology that Peter Jackson's team had developed for the *Get Back* documentary. The model could distinguish Lennon's voice from the piano and isolate it as a clean stem [15]. Martin then added new string arrangements, ELO's Jeff Lynne polished George Harrison's archival guitar parts, and Ringo Starr added new drums. McCartney publicly stressed that no AI vocal generation was used and that the singing was Lennon's actual voice, recovered rather than synthesized. The song debuted at number 1 in the United Kingdom on 10 November 2023, the band's first UK number 1 in 54 years, and entered the Billboard Hot 100 at number 7 in the United States [16].

### Michael Smith streaming fraud indictment

On 4 September 2024, federal prosecutors in the Southern District of New York unsealed an indictment against Michael Smith, a 52-year-old musician from Cornelius, North Carolina. Prosecutors charged Smith with wire fraud, conspiracy to commit wire fraud, and money laundering in what they described as the first criminal AI music streaming-fraud case brought in the United States [17]. According to the indictment, Smith ran the scheme from roughly 2017 through 2024.

The alleged mechanics were straightforward. Smith bought hundreds of thousands of AI-generated tracks from a co-conspirator, uploaded them to Spotify, Apple Music, Amazon Music, and YouTube Music under thousands of bot-controlled artist names, and used automated programs to stream those tracks. The indictment alleges he generated up to 661,440 streams per day and collected more than $10 million in royalties [18]. Smith pleaded not guilty initially. In 2025, he entered a guilty plea to a single conspiracy count, with Billboard reporting that the scheme returned approximately $8 million to him before detection [19].

The case sharpened streaming platforms' incentive to identify and demote machine-generated tracks. Deezer began publicly reporting the share of new uploads it flagged as fully AI-generated. By 2025, Deezer said roughly 28 percent of new uploads it received were entirely AI-generated; by April 2026 the company put the figure at 44 percent, or about 75,000 tracks per day, and in June 2026 began licensing its detection stack to other platforms [42][43][52].

## Major lawsuits: RIAA v. Suno and Udio

On 24 June 2024, the Recording Industry Association of America announced two coordinated copyright infringement lawsuits on behalf of [Universal Music Group](/wiki/umg), Sony Music Entertainment, and Warner Records [20]. The suit against Suno was filed in the United States District Court for the District of Massachusetts; the suit against Uncharted Labs, the company behind Udio, was filed in the Southern District of New York the same day. The complaints alleged that Suno and Udio had trained their commercial music generators on copyrighted sound recordings without licenses, on a "massive scale," and that the services could be prompted to produce close imitations of specific copyrighted recordings [20].

The RIAA's evidence in both complaints centered on prompt-based exhibits. By feeding the services prompts describing the genre, era, instrumentation, and lyrical themes of famous recordings, plaintiffs said they could elicit outputs that closely resembled songs including Mariah Carey's *All I Want for Christmas Is You*, the Temptations' *My Girl*, and Green Day's *American Idiot*. The plaintiffs sought statutory damages of up to $150,000 per infringed work, plus injunctive relief [20].

Suno's chief executive Mikey Shulman responded in a 1 August 2024 blog post and court filing arguing that training on copyrighted music constituted [fair use](/wiki/fair_use) because the model produces new, transformative outputs rather than copies [21]. Udio took a similar fair-use position. The cases progressed slowly through 2024 and 2025.

The cases did not reach trial before the labels began to settle. On 29 October 2025, Universal Music Group announced a settlement with Udio [22]. UMG chairman and CEO Sir Lucian Grainge said the agreements demonstrate "our commitment to do what's right by our artists and songwriters, whether that means embracing new technologies, developing new business models, diversifying revenue streams or beyond," while Udio CEO Andrew Sanchez framed the deal as "uniting AI and the music industry in a way that truly champions artists" [49]. As part of the settlement, Udio agreed to pivot its product from open generation toward a licensed "fan engagement platform" where users could remix and prompt using a UMG-approved catalog, with a jointly licensed music-creation service planned for 2026 [49]. In November 2025, Warner Music Group settled with Suno and signed a separate licensing partnership; the deal also covered Suno's plans for new licensed models in 2026 and gave Warner artists control over use of their names, voices, and likenesses [23][50]. Warner also struck a deal with Udio. As of late 2025, UMG's and Sony's cases against Suno remained active, with public reports describing talks as stalled.

## Legislation: ELVIS Act and NO FAKES Act

### Is AI voice cloning illegal? The Tennessee ELVIS Act

Tennessee Governor Bill Lee signed the Ensuring Likeness Voice and Image Security Act, known as the [ELVIS Act](/wiki/elvis_act), into law on 21 March 2024 [24]. The act updated Tennessee's existing right-of-publicity statute to add an explicit protection for voice, including AI-generated voice clones. It made unauthorized commercial cloning of an artist's voice a Class A misdemeanor and gave artists, labels, and licensees a civil cause of action. The law took effect 1 July 2024 and made Tennessee the first US state to enact AI-specific voice protection. The bill passed unanimously: 93-0 in the House and 30-0 in the Senate, with support from the Recording Academy and the RIAA [25].

### NO FAKES Act

At the federal level, Senators Chris Coons (D-DE), Marsha Blackburn (R-TN), Amy Klobuchar (D-MN), and Thom Tillis (R-NC) released a discussion draft of the Nurture Originals, Foster Art, and Keep Entertainment Safe Act (NO FAKES Act) in October 2023, then formally introduced the bill on 31 July 2024 as S. 4875 in the 118th Congress [26][27]. The bill would create a federal right of action against the unauthorized production, hosting, or distribution of a digital replica of an individual's voice or likeness, including replicas generated by AI. It includes carve-outs for uses protected by the First Amendment, such as documentaries, biographical works, parody, and news reporting, and it requires online services to remove an unauthorized replica on notice from a rights holder.

The bill did not pass in the 118th Congress and was reintroduced in 2025 with bipartisan and bicameral support.

## Which AI features have streaming services added?

The consumer-facing AI features at the big streamers tell a story about where each platform sees the technology fitting.

| Platform | Feature | Launched | Notes |
| --- | --- | --- | --- |
| Spotify | AI DJ | February 2023 | Voice cloned from Xavier Jernigan; Sonantic tech |
| Spotify | AI Playlist (text prompts) | 2024 | Available in select markets |
| YouTube Music | Dream Track | November 2023 | Lyria-powered, opt-in artists including John Legend, Charlie Puth, Sia |
| YouTube | Music AI tools | 2023 to ongoing | Output from Music AI Incubator |
| Apple Music | Sing (karaoke); enhanced lyrics | 2022 to 2024 | AI for audio separation |
| Amazon Music | Endel partnership | 2023 | Personalized soundscapes |
| Deezer | AI track detector | 2025 | Licensed ACRCloud fingerprint plus internal classifiers |

Deezer's posture stands out. The company chose to label AI-generated uploads and exclude them from algorithmic recommendations and editorial playlists, and in June 2026 it began selling its detection tool to other services [43][52]. Spotify's stance has been more lenient, attracting criticism for slow action on machine-generated tracks suspected of streaming fraud.

## Notable artist deployments

Not every artist has resisted generative AI. Several have built their own systems or licensed their voices on terms they set.

- **[Holly Herndon](/wiki/holly_herndon)**: Spawn (2019) on the album *PROTO*; Holly+ (2021) for public, non-commercial use [36].
- **[Grimes](/wiki/grimes)**: GrimesAI-1 via Elf.tech (April 2023); 50/50 royalty share for distributed tracks; TuneCore distribution partnership [37][38].
- **[Imogen Heap](/wiki/imogen_heap)**: Has experimented with the Mogen voice agent and broader Creative Passport / Mycelia digital identity work since the late 2010s.
- **YouTube Dream Track artists**: Alec Benjamin, Charlie Puth, Charli XCX, Demi Lovato, John Legend, Papoose, Sia, T-Pain, and Troye Sivan agreed to have their voices modeled by [Lyria](/wiki/lyria) for the November 2023 experiment [8].
- **Beatles**: Used AI audio demixing rather than AI vocal generation on *Now and Then* (November 2023). The vocal is John Lennon's original 1970s recording, separated from the piano by machine learning [15].
- **Endel and Grimes (2020)**: A collaboration that produced AI Lullaby, a generative sleep soundscape featuring Grimes's vocals.

## Label and industry policy

Universal Music Group's CEO Sir Lucian Grainge has been the most vocal industry executive on AI. UMG's public position, articulated in 2023 and reiterated in subsequent annual memos, can be summarized in a single sentence Grainge used in company communications: UMG will not license any model that uses an artist's voice or generates new songs that incorporate an artist's existing songs without the artist's consent [31]. UMG has nonetheless signed agreements with YouTube, TikTok, Meta, BandLab, Soundlabs, KLAY, ProRata, and, after the 2025 settlement, Udio [49].

Sony Music Entertainment took a more aggressive posture in May 2024 by sending a formal opt-out letter to more than 700 AI developers and platforms [30]. The letter, addressed to recipients including OpenAI, Microsoft, and Google, declared that Sony's recordings, compositions, lyrics, artwork, and data could not be used for text or data mining or to train AI systems without explicit advance permission. The letter cited the European Union's [AI Act](/wiki/ai_act) and its disclosure requirements as part of the motivation.

Warner Music Group has straddled these positions. It was an early investor in Boomy, signed a distribution deal with Endel in January 2019 (the first major-label deal for an algorithm), settled with Suno and Udio in November 2025, and signed Warner artists into licensed AI collaborations [46][23].

The Recording Academy updated its Grammy eligibility rules in June 2023 to require meaningful human authorship in any winning entry [32]. AI-only compositions are ineligible in songwriting categories, and AI-only performances are ineligible in performance categories, but AI-assisted works with significant human contribution can compete.

## How do AI music models work? Technical methods

AI music systems use a handful of recurring building blocks.

**Symbolic models** work on MIDI or piano-roll representations, predicting note sequences. EMI, MuseNet, and many Magenta models fall in this family [6].
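The idea of predicting the next note from earlier notes can be shown with a deliberately tiny stand-in. The sketch below is a first-order Markov chain over MIDI pitch numbers, which is far simpler than the Transformers used by MuseNet or the Magenta models; the two training phrases and the function names `train_bigram` and `generate` are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict


def train_bigram(melodies):
    """Count how often each MIDI pitch is followed by each other pitch."""
    counts = defaultdict(lambda: defaultdict(int))
    for melody in melodies:
        for prev, nxt in zip(melody, melody[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, start, length, seed=0):
    """Sample a melody one note at a time, weighted by the observed counts."""
    rng = random.Random(seed)
    notes = [start]
    while len(notes) < length:
        followers = counts.get(notes[-1])
        if not followers:
            break  # dead end: this pitch was never followed by anything
        pitches = list(followers)
        weights = [followers[p] for p in pitches]
        notes.append(rng.choices(pitches, weights=weights, k=1)[0])
    return notes


# Hypothetical training data: two short C-major phrases (60 = middle C).
corpus = [
    [60, 62, 64, 65, 67, 65, 64, 62, 60],
    [60, 64, 67, 64, 60, 62, 64, 62, 60],
]
model = train_bigram(corpus)
melody = generate(model, start=60, length=8)
print(melody)
```

A real symbolic model conditions on a much longer history and on extra attributes such as instrument and style, but the generation loop has the same shape: predict a distribution over the next event, sample from it, append, and repeat.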

**Raw-audio autoregressive models** like WaveNet (the architecture underlying NSynth) and Jukebox predict audio samples or compressed tokens directly. They produce convincing timbre and singing but are computationally expensive [3].

**Neural audio codecs** such as EnCodec (Meta) and SoundStream (Google) compress audio sampled at 24 to 48 kHz into discrete tokens at frame rates of roughly 50 to 75 Hz. Modern systems including MusicGen and MusicLM train Transformers on those tokens rather than on raw samples, which shortens the sequences the model must process, and with them the compute, by orders of magnitude [5].
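A quick back-of-the-envelope calculation shows where the savings come from. The sketch below picks one point inside the ranges quoted above (32 kHz audio and a 50 Hz frame rate) and assumes four codebook tokens per frame; the codebook count and the 30-second clip length are illustrative assumptions, not figures taken from this article.

```python
def codec_sequence_stats(sample_rate_hz, frame_rate_hz, codebooks, seconds):
    """Compare raw-sample count with codec-token count for one clip."""
    raw_samples = sample_rate_hz * seconds
    frames = frame_rate_hz * seconds
    tokens = frames * codebooks  # each frame carries one token per codebook
    return {
        "raw_samples": raw_samples,
        "frames": frames,
        "tokens": tokens,
        "samples_per_frame": sample_rate_hz // frame_rate_hz,
        "reduction_vs_raw": raw_samples / tokens,
    }


# Assumed settings: 32 kHz audio, 50 frames per second, 4 codebooks, 30 s clip.
stats = codec_sequence_stats(32_000, 50, 4, 30)
print(stats)
```

Under these assumptions a 30-second clip shrinks from 960,000 samples to 1,500 frames (6,000 tokens), a 160-fold reduction in the number of items the Transformer has to model.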

**Diffusion models** including Riffusion and Stable Audio start from noise and iteratively denoise toward a target. Riffusion did this on spectrogram images; Stable Audio works in a learned latent audio space [9][10].
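The noise-then-denoise loop can be sketched on a toy one-dimensional signal. This is a minimal illustration, not the method of Riffusion or Stable Audio: the linear noise schedule and the deterministic, DDIM-style update are choices made here for brevity, and `oracle_noise_predictor` is a hypothetical stand-in for a trained network that cheats by knowing the clean target.

```python
import math
import random


def alpha_bar_schedule(steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t); index 0 means no noise added."""
    alpha_bars, running = [1.0], 1.0
    for t in range(steps):
        beta = beta_start + (beta_end - beta_start) * t / max(steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, alpha_bar, rng):
    """Forward process: blend the clean signal with Gaussian noise."""
    return [math.sqrt(alpha_bar) * v + math.sqrt(1 - alpha_bar) * rng.gauss(0, 1)
            for v in x0]


def oracle_noise_predictor(x_t, alpha_bar, target):
    """Stand-in for a trained model: recovers the noise using the known target."""
    return [(v - math.sqrt(alpha_bar) * g) / math.sqrt(1 - alpha_bar)
            for v, g in zip(x_t, target)]


def denoise(x_T, alpha_bars, target):
    """Reverse process: step from the noisiest level back down to level 0."""
    x = list(x_T)
    for t in range(len(alpha_bars) - 1, 0, -1):
        ab_t, ab_prev = alpha_bars[t], alpha_bars[t - 1]
        eps = oracle_noise_predictor(x, ab_t, target)
        # Estimate the clean signal, then re-noise it to the previous level.
        x0_hat = [(v - math.sqrt(1 - ab_t) * e) / math.sqrt(ab_t)
                  for v, e in zip(x, eps)]
        x = [math.sqrt(ab_prev) * c + math.sqrt(1 - ab_prev) * e
             for c, e in zip(x0_hat, eps)]
    return x


rng = random.Random(0)
target = [math.sin(2 * math.pi * i / 16) for i in range(16)]  # one sine cycle
alpha_bars = alpha_bar_schedule(steps=1000)
noisy = add_noise(target, alpha_bars[-1], rng)
recovered = denoise(noisy, alpha_bars, target)
```

In a real system the oracle is replaced by a neural network trained to predict the noise from the noisy input (and from a text prompt), and the signal is a spectrogram image or a latent audio representation rather than 16 raw numbers.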

**Joint music-text embeddings** such as MuLan (Google) and CLAP (LAION) learn a shared space where matched audio and text descriptions cluster together. They make text conditioning possible at scale [4].
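Once audio and text live in the same vector space, matching a prompt to a clip reduces to a similarity lookup. The sketch below uses hand-written three-dimensional vectors as hypothetical stand-ins for embeddings; real models such as MuLan or CLAP learn much higher-dimensional vectors from data, and the clip names here are invented.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(query, candidates):
    """Return the candidate name whose embedding is closest to the query."""
    return max(candidates, key=lambda name: cosine(query, candidates[name]))


# Hypothetical embeddings, written by hand for illustration only.
text_embeddings = {
    "slow solo piano ballad": [0.9, 0.1, 0.0],
    "fast distorted guitar riff": [0.1, 0.9, 0.2],
}
audio_embeddings = {
    "clip_piano.wav": [0.8, 0.2, 0.1],
    "clip_guitar.wav": [0.0, 1.0, 0.3],
}

for prompt, vector in text_embeddings.items():
    print(prompt, "->", best_match(vector, audio_embeddings))
```

A generator uses the same space in the other direction: the text embedding of a prompt becomes the conditioning signal that steers which audio the model produces.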

**Voice conversion and cloning** systems range from cycle-consistent GANs and so-vits-svc style models to modern diffusion and flow-matching approaches. They can be trained from a few minutes of clean vocal audio and have driven both legitimate artist projects (Holly+, GrimesAI-1) and the bulk of the unauthorized clones that have triggered legislation.

**Stem separation** uses source-separation models (Demucs and Spleeter are well-known examples) to pull a mix apart into vocals, drums, bass, and other instruments. The same class of technique, machine-learning audio demixing, was used on the Beatles' *Now and Then* [15].
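The underlying idea of masking a mix in the frequency domain can be shown with a toy example. The sketch below is not how Demucs or Spleeter work: it applies a fixed frequency cutoff to two synthetic tones that do not overlap, whereas learned separators estimate their masks (or waveforms) with neural networks so they can handle sources that share frequencies, such as a voice over a piano. The function names and test tones are hypothetical.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (fine for very short signals)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse transform; returns the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def split_by_frequency(mix, cutoff_bin):
    """Split a mix into a low stem and a high stem with a binary mask."""
    spectrum = dft(mix)
    n = len(spectrum)
    # Bin k and bin n - k describe the same frequency, so mask them together.
    low_mask = [1.0 if min(k, n - k) <= cutoff_bin else 0.0 for k in range(n)]
    low = idft([s * m for s, m in zip(spectrum, low_mask)])
    high = idft([s * (1.0 - m) for s, m in zip(spectrum, low_mask)])
    return low, high


n = 64
bass = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]         # bin 2
lead = [0.5 * math.sin(2 * math.pi * 10 * t / n) for t in range(n)]  # bin 10
mix = [b + l for b, l in zip(bass, lead)]
low_stem, high_stem = split_by_frequency(mix, cutoff_bin=5)
```

Because the two tones sit in different frequency bins, the fixed mask recovers each one almost exactly; real recordings rarely offer that luxury, which is why data-driven models are needed.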

**Speech transcription** with models such as [Whisper](/wiki/whisper) underpins lyric extraction and search.

## Cultural and labor debate

The arguments here are unusually concrete because they map onto actual paychecks. Working musicians and the unions that represent them, including the American Federation of Musicians and SAG-AFTRA, have raised three concerns repeatedly.

The first is training-data consent. Models such as Suno and Udio were trained on large catalogs of recorded music, and until the 2025 settlements, the providers declined to disclose what was in their training sets [20]. The fair-use defense that AI companies have offered echoes arguments made in parallel litigation over LLMs and image models, but musicians point out that recorded music has unusually strong property protection in US copyright law.

The second is royalty dilution. If 28 to 44 percent of new tracks on a streaming service are AI-generated and a non-trivial fraction of those are uploaded by bots or fraud schemes, the per-stream royalty pool gets diluted for human artists [42]. On Deezer, AI-generated music made up only 1 to 3 percent of total streams in 2026, but about 85 percent of those streams were flagged as fraudulent [52]. Spotify shifted to a minimum-stream threshold in 2024 partly in response to this pressure.
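The dilution argument is easiest to see with numbers. The sketch below assumes a simple pro-rata model in which a fixed royalty pool is divided by share of total streams; the pool size and stream counts are hypothetical, with the AI share set to 3 percent of streams and 85 percent of those treated as fraudulent, matching the upper end of the Deezer figures above.

```python
def pro_rata_payout(pool_dollars, artist_streams, total_streams):
    """Artist's share of a fixed pool under a pro-rata split."""
    return pool_dollars * artist_streams / total_streams


pool = 1_000_000            # hypothetical monthly royalty pool in dollars
human_streams = 97_000_000  # hypothetical streams of human-made tracks
ai_streams = 3_000_000      # AI-generated tracks: 3% of all streams
artist_streams = 1_000_000  # one human artist's streams

# Payout if only human streams counted toward the pool.
before = pro_rata_payout(pool, artist_streams, human_streams)

# Payout once every AI stream also draws from the same pool.
after = pro_rata_payout(pool, artist_streams, human_streams + ai_streams)

# Payout if the 85% of AI streams flagged as fraudulent are demonetized.
legit_ai_streams = ai_streams * 15 // 100
after_filter = pro_rata_payout(pool, artist_streams, human_streams + legit_ai_streams)

loss_pct = (before - after) / before * 100
print(round(before, 2), round(after, 2), round(after_filter, 2), round(loss_pct, 2))
```

Under these assumptions the artist loses about 3 percent of the payout when all AI streams count, and most of that loss disappears once flagged streams are removed, which is why detection and demonetization matter as much as the raw upload share.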

The third is voice impersonation and consent. The same technology that lets Grimes voluntarily license her voice for a 50 percent royalty share also lets anonymous TikTok users impersonate a Drake or a Tom Petty without permission [37]. The ELVIS Act and NO FAKES Act both target this asymmetry [24][26].

Defenders of the technology, including some artists, point out that AI is doing what samplers, drum machines, autotune, and DAWs have done before: lowering the floor on what a single person can produce in a bedroom studio. They argue that the cultural moral panic over AI music echoes earlier panics over sampling and over file sharing, and that artists who learn to work with AI will be in a stronger position than artists who try to wish it away. Grimes has been the most prominent voice for this position; Herndon's Holly+ is a more careful third path.

The industry's own behavior suggests the answer will not be a clean ban or a clean embrace but a slow negotiation over licensing terms, training-data disclosure, royalty splits, and detection. The June 2024 lawsuits, the October and November 2025 settlements, and the parallel rise of detection technology at Deezer and ACRCloud all point in the same direction: AI music is being absorbed into the existing royalty plumbing on terms acceptable to the major labels, while independent musicians fight a separate set of battles over voice, training, and pay [22][23][52].

## See also

- [Music ChatGPT Plugins](/wiki/music_chatgpt_plugins)
- [Generative AI](/wiki/generative_ai)
- [Voice cloning](/wiki/voice_cloning)
- [Deepfake](/wiki/deepfake)
- [Suno](/wiki/suno)
- [Udio](/wiki/udio)
- [Magenta](/wiki/magenta)
- [Jukebox](/wiki/jukebox)
- [MusicLM](/wiki/musiclm)
- [Stable Audio](/wiki/stable_audio)
- [Lyria](/wiki/lyria)
- [Whisper](/wiki/whisper)
- [Grimes](/wiki/grimes)
- [Holly Herndon](/wiki/holly_herndon)
- [DeepMind](/wiki/deepmind)
- [Google Brain](/wiki/google_brain)
- [Meta](/wiki/meta)
- [Stability AI](/wiki/stability_ai)

## References

1. Eck, D. and Schmidhuber, J. (2002). *Finding Temporal Structure in Music: Blues Improvisation with LSTM Recurrent Networks*. IDSIA Technical Report IDSIA-07-02.
2. Engel, J. et al. (2017). *Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders*. arXiv:1704.01279.
3. Dhariwal, P. et al. (2020). *Jukebox: A Generative Model for Music*. arXiv:2005.00341.
4. Agostinelli, A. et al. (2023). *MusicLM: Generating Music From Text*. arXiv:2301.11325.
5. Copet, J. et al. (2023). *Simple and Controllable Music Generation* (MusicGen / AudioCraft). Meta AI.
6. OpenAI. *MuseNet*. 25 April 2019.
7. Google Magenta team. *Welcome to Magenta!* TensorFlow blog, 1 June 2016.
8. Google DeepMind. *Transforming the Future of Music Creation* (Lyria, Dream Track). 16 November 2023.
9. Stability AI. *Introducing Stable Audio* (September 2023); *Introducing Stable Audio 2.0* (April 2024).
10. Hugging Face, *Riffusion* model card; Wikipedia, *Riffusion*.
11. Wikipedia, *Heart on My Sleeve (Ghostwriter977 song)*.
12. CNN, *The viral new 'Drake' and 'Weeknd' song is not what it seems*, 19 April 2023.
13. Variety, *Ghostwriter's Heart on My Sleeve... Submitted for Grammys*, September 2023.
14. Wikipedia, *Now and Then (Beatles song)*.
15. NPR, *How producers used AI to finish The Beatles' last song, Now and Then*, 2 November 2023.
16. Official Charts (UK). *Now and Then* number 1 chart entry, 10 November 2023.
17. United States Attorney's Office, Southern District of New York. *North Carolina Musician Charged With Music Streaming Fraud Aided by Artificial Intelligence*, 4 September 2024.
18. Fortune, *A musician siphoned $10 million in royalties...*, 5 September 2024.
19. Billboard, *Feds Score Guilty Plea in First-Ever U.S. Streaming Fraud Case*, 2025.
20. RIAA. *Record Companies Bring Landmark Cases for Responsible AI Against Suno and Udio*. 24 June 2024.
21. TechCrunch, *AI music startup Suno claims training model on copyrighted music is 'fair use'*, 1 August 2024.
22. Music Ally, *UMG settles Udio lawsuit; companies plan new AI-music service together*, 30 October 2025.
23. Music Business Worldwide, *Warner Music Group strikes 'landmark' deal with Suno*, November 2025.
24. Wikipedia, *ELVIS Act*. Signed 21 March 2024; effective 1 July 2024.
25. NPR, *Tennessee becomes the first state to protect musicians and other artists against AI*, 22 March 2024.
26. Senator Chris Coons. *Senators Coons, Blackburn, Klobuchar, Tillis introduce bill to protect individuals' voices and likenesses from AI-generated replicas* (NO FAKES Act). 31 July 2024.
27. S. 4875, 118th Congress (2023 to 2024), NO FAKES Act of 2024. Congress.gov.
28. Spotify Newsroom. *Spotify Debuts a New AI DJ, Right in Your Pocket*. 22 February 2023.
29. TechCrunch, *Xavier X Jernigan, the voice of Spotify's DJ, explains what it's like to become an AI*, 21 April 2023.
30. Music Business Worldwide, *Sony Music sends letters to 700 AI, music streaming companies declaring it's opting out of AI training*, May 2024.
31. Music Business Worldwide, *Sir Lucian Grainge on UMG's AI policy: We will NOT license AI models that use an artist's voice without their consent*.
32. CBS News, *New Grammy rule addresses artificial intelligence, says only human creators eligible for awards*, June 2023.
33. Wikipedia, *Suno (platform)*.
34. Wikipedia, *Udio*.
35. VentureBeat, *Former Google DeepMind researchers launch AI-powered music creation app Udio*, 10 April 2024.
36. Scientific American, *Experimental composer Holly Herndon built an AI voice clone that anyone can use*.
37. Billboard, *Grimes Launches A.I. Vocal Project In Beta: 'Enjoy the Chaos'*, 24 April 2023.
38. TuneCore press release on the Elf.Tech partnership, April 2023.
39. Computer History Museum, *Algorithmic Music: David Cope and EMI*.
40. Wikipedia, *Iamus (computer)*; *Iamus (album)*.
41. iannis-xenakis.org, *Stochastic Music* and *Stochastic Synthesis* pages.
42. Music Business Worldwide, *75,000 AI-generated tracks now flood Deezer daily, representing 44% of all new music uploaded to the platform*, April 2026.
43. Deezer Newsroom, *How to Detect AI Music: Deezer Sells Its Detection Tool*, January 2026.
44. Wikipedia, *AIVA*.
45. Business Wire, *Boomy Launches Revolutionary AI Music Technology to Empower a New Generation of Social Music Creators*, 12 May 2021.
46. Wikipedia, *Endel (app)*; Music Business Worldwide, *Warner's Spinnin' Records partners with Endel*.
47. TechCrunch, *Legally embattled AI music startup Suno raises at $2.45B valuation on $200M revenue*, 19 November 2025.
48. Variety, *AI Music Company Suno Raises $400 Million at $5.4 Billion Valuation*, 2026.
49. PR Newswire / Universal Music Group and Udio. *Universal Music Group and Udio Announce Udio's First Strategic Agreements for New Licensed AI Music Creation Platform*; The Hollywood Reporter, *Universal Music Group Announces Settlement With Udio*, 30 October 2025.
50. Music Ally, *AI-music firm Suno strikes first licensing deal with Warner Music Group*, 25 November 2025.
51. Suno. *Model Timeline & Information* (V3, V4, V5, V5.5 release dates), help.suno.com.
52. Deezer Newsroom, *AI-generated tracks now represent 44% of all new uploaded music*, 20 April 2026; TechCrunch, *Deezer's new tool can identify AI music from Spotify, Apple Music, and others*, 11 June 2026.
53. Google Research, *MusicLM: Generating Music From Text* (project page noting ~280,000 hours of training audio).
54. Hugging Face, *facebook/musicgen-melody* and *MusicGen* model documentation (released 8 June 2023; code MIT, weights CC-BY-NC 4.0).
55. Suno. *When Was Suno V5 Released? Roadmap & Features* (V5 on 23 September 2025; V5.5 on 26 March 2026).