Music
Last reviewed
May 13, 2026
Sources
46 citations
Review status
Source-backed
Revision
v2 · 4,904 words
See also: Music ChatGPT Plugins
AI in music refers to the use of artificial intelligence, particularly machine learning and generative AI, to compose, perform, mix, master, transcribe, and reproduce music. The field stretches back to mid-twentieth-century algorithmic composition and accelerated dramatically after 2016, when Google Brain launched the open-source Magenta project. By the mid-2020s, dedicated music generators such as Suno and Udio could produce full songs with vocals from short text prompts, triggering a wave of copyright lawsuits, state and federal legislation, and an industry-wide debate about consent, training data, and the future of recorded music.
AI touches almost every layer of the modern music stack. On the creative side, generative models can produce instrumental beds, full songs with synthesized vocals, stem separations, and style transfers. On the operations side, AI handles mastering (LANDR, iZotope), transcription (Whisper), playlist personalization, voice synthesis, and copyright detection. Streaming platforms such as Spotify, YouTube Music, and Deezer have built consumer-facing AI features that range from voice-cloned DJs to AI-track detectors that police royalty fraud.
The industry's relationship to AI has been openly contradictory. The same major labels that filed landmark infringement suits against Suno and Udio in June 2024 had, by late 2025, begun settling those cases and signing licensing partnerships with the same companies they had accused of mass infringement. Working musicians, songwriters, and union members have been more uniformly skeptical, citing concerns about scraped training data, voice deepfakes, and royalty dilution from machine-generated tracks flooding streaming platforms.
Long before neural networks, composers experimented with rule-based and probabilistic systems. The Greek-French composer Iannis Xenakis applied probability theory to composition starting in the mid-1950s; his treatise Musiques formelles (1963) described stochastic procedures that he later implemented in his ST computer program, used to compose works including ST/4 and ST/10. American composer David Cope began writing his Experiments in Musical Intelligence (EMI, pronounced "Emmy") in 1981 while procrastinating an opera commission. EMI analyzed a composer's catalog and produced new pieces in that style. In a now-famous Turing-style test organized by Douglas Hofstadter, audience members at the University of Oregon identified an EMI Bach pastiche as the real Bach piece and a human composer's piece as the machine output.
In 2002, Douglas Eck and Jürgen Schmidhuber at IDSIA published Finding Temporal Structure in Music: Blues Improvisation with LSTM Recurrent Networks, demonstrating that long short-term memory networks could learn 12-bar blues structure and improvise novel melodies that respected the chord progression. The paper is widely cited as an early milestone for neural networks in music research and a precursor to the deep-learning work that followed.
In Spain, the Iamus computer cluster at the Universidad de Málaga produced its Opus one on 15 October 2010, described as the first fragment of professional contemporary classical music composed by a machine in its own style. Iamus's first full work, Hello World!, premiered exactly one year later. In 2012, the London Symphony Orchestra recorded an album of Iamus pieces, which New Scientist called the first complete album composed solely by a computer and recorded by human musicians.
Google Brain announced Magenta on 1 June 2016 with a question: can machines make music and art? The project produced a stream of open-source models and datasets through the late 2010s, including the WaveNet-based NSynth (April 2017), MusicVAE for melody interpolation, Music Transformer, and a Magenta Studio plugin suite for Ableton Live. The NSynth paper, Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders, was a collaboration between Brain and DeepMind and shipped with a public dataset of more than 300,000 instrument notes.
OpenAI entered the field with MuseNet, announced on 25 April 2019. MuseNet was a 72-layer Sparse Transformer that could generate four-minute compositions across ten instruments and many styles, trained on MIDI from sources including ClassicalArchives and BitMidi. A year later, on 30 April 2020, OpenAI released Jukebox, an autoregressive model that worked directly on raw audio. Jukebox used a hierarchy of VQ-VAEs to compress audio into discrete codes, then trained Transformers on those codes, conditioned on artist, genre, and lyrics. Outputs were lo-fi and structurally meandering, but they sang.
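The step that turns continuous audio features into the discrete codes Jukebox's Transformers model is vector quantization: each encoder output vector is replaced by the index of its nearest codebook entry. The sketch below is a minimal illustration under stated assumptions. It uses a fixed four-entry toy codebook and made-up "encoder outputs", whereas Jukebox learns its encoders, decoders, and codebooks jointly and at far larger scale.

```python
import math

def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (squared Euclidean distance)."""
    best_index, best_dist = 0, math.inf
    for i, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, code))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index

def quantize(frames, codebook):
    """Map each encoder output frame to a discrete token: its nearest codebook index."""
    return [nearest_code(frame, codebook) for frame in frames]

def dequantize(tokens, codebook):
    """Look tokens back up; in a VQ-VAE a learned decoder turns these vectors into audio."""
    return [codebook[t] for t in tokens]

# Toy 2-D codebook and hypothetical encoder outputs (illustrative numbers only).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
frames = [(0.1, 0.2), (0.9, 0.1), (0.8, 0.9), (0.2, 0.7)]

tokens = quantize(frames, codebook)
print(tokens)  # [0, 1, 3, 2]
```

A Transformer then only has to predict sequences of small integers such as `[0, 1, 3, 2]` instead of raw waveform values, which is what makes the hierarchical approach tractable.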
2023 was the year text-to-music caught up to text-to-image. Google researchers published MusicLM on arXiv in January 2023. The model cast music generation as hierarchical sequence-to-sequence prediction, building on AudioLM for generation and MuLan for joint music-text embeddings, and produced 24 kHz audio that remained coherent over several minutes. In June 2023, Meta released MusicGen and the broader AudioCraft toolkit, which combined the EnCodec neural audio codec with a single-stage autoregressive Transformer and shipped open weights. Stability AI launched the commercial product Stable Audio in September 2023, using a latent diffusion architecture similar to Stable Diffusion but trained on audio. Riffusion, released in December 2022 by Seth Forsgren and Hayk Martiros, took a different route: it fine-tuned Stable Diffusion on spectrogram images and generated audio by converting the output images back into waveforms with an inverse short-time Fourier transform, after estimating the phase information that a magnitude spectrogram discards.
The explosion in consumer tools followed. Suno, founded in Cambridge, Massachusetts by Michael Shulman, Georg Kucsko, Martin Camacho, and Keenan Freyberg, made its app widely available in December 2023, shipped V3 on 21 March 2024, and V4 on 19 November 2024. Udio launched in beta on 10 April 2024, founded by former DeepMind researchers David Ding, Conor Durkan, Charlie Nash, Yaroslav Ganin, and Andrew Sanchez, with a $10 million seed round led by Andreessen Horowitz. Google DeepMind and YouTube announced Lyria and the Dream Track experiment on 16 November 2023, along with the YouTube Music AI Incubator.
| Tool | Developer | Released | Approach | Notes |
|---|---|---|---|---|
| EMI / Emmy | David Cope | 1981 | Rule-based pattern analysis | Generated pieces in the style of Bach, Mozart, Beethoven |
| Iamus | Universidad de Málaga | 2010 | Evolutionary algorithm (Melomics) | London Symphony Orchestra recorded its work in 2012 |
| LSTM Blues | Eck and Schmidhuber, IDSIA | 2002 | LSTM RNN on 12-bar blues | First widely cited deep learning composition paper |
| Magenta | Google Brain | 2016 | Various neural models | Open-source umbrella project |
| NSynth | Magenta and DeepMind | 2017 | WaveNet autoencoder | Public 300,000-note dataset |
| MuseNet | OpenAI | 2019 | Sparse Transformer | 4-minute MIDI compositions |
| Jukebox | OpenAI | 2020 | VQ-VAE plus Transformers | Raw-audio singing in artist styles |
| Riffusion | Forsgren and Martiros | 2022 | Spectrogram diffusion | Fine-tune of Stable Diffusion |
| MusicLM | Google | 2023 | Hierarchical sequence model | Built on AudioLM and MuLan |
| MusicGen | Meta (AudioCraft) | 2023 | EnCodec plus single-stage Transformer | Open weights |
| Stable Audio | Stability AI | 2023 | Latent diffusion | Stable Audio 2.0 added 3-minute songs in April 2024 |
| Lyria | Google DeepMind | 2023 | Proprietary; SynthID watermark | Powers YouTube Dream Track |
| Suno | Suno, Inc. | 2023 | Proprietary text-to-song | V3 March 2024, V4 November 2024 |
| Udio | Uncharted Labs | 2024 | Proprietary text-to-song | Founded by ex-DeepMind researchers |
| AIVA | Aiva Technologies | 2016 | Symbolic composition | First AI registered as a composer at SACEM |
| Boomy | Boomy Corp. | 2018 | One-click generation plus distribution | Backed by Warner Music; 80% creator royalty share |
| Endel | Endel Sound GmbH | 2018 | Generative ambient soundscapes | First algorithm signed by a major label (Warner, 2019) |
| Soundful | Soundful | 2019 | Royalty-free template generator | Marketed to creators |
| Mubert | Mubert Inc. | 2016 | Real-time generative streams | Pivoted to text-to-music in 2022 |
| Audiobox | Meta | 2023 | Voice and sound generation | Successor to Voicebox and AudioGen |
Many of these systems are now positioned as creative tools rather than autonomous composers, with marketing language emphasizing assistance to human songwriters. The actual line between the two depends on how the platform handles prompting, editing, and ownership.
Voice cloning is the technical thread that runs through most of the legal and ethical debate. Modern systems can produce a usable singing voice from minutes of training audio, sometimes less. The same techniques that let Spotify build a personalized DJ from Xavier Jernigan's voice also let anonymous TikTok users impersonate Drake.
Spotify rolled out its AI DJ in the United States and Canada on 22 February 2023. The product mixes recommendation algorithms with a synthetic voice cloned from Xavier "X" Jernigan, the company's head of cultural partnerships and a former host of Spotify's Get Up morning show. Jernigan's voice was modeled using technology from Sonantic, a voice startup Spotify had acquired in June 2022 and whose work included the Val Kilmer voice in Top Gun: Maverick. The team isolated Jernigan's audio from roughly 300 episodes of The Get Up to train pitch, pacing, and emotion. The DJ expanded to more than 50 markets through 2023 and 2024, and a Spanish-language version followed in mid-2024.
Composer Holly Herndon and her collaborator Mathew Dryhurst built one of the earliest artist-trained voice models, named Spawn, which featured on her 2019 album PROTO. In July 2021, working with Voctro Labs, Herndon launched Holly+, a public tool that converts uploaded audio into her voice. Holly+ creations carry an open license that allows non-commercial release, with commercial use governed by a DAO of stewards. Herndon framed the project as a model for consensual deepfakes and decentralized identity, an alternative to either outright bans or unrestricted scraping.
On 23 April 2023, the musician Grimes announced on Twitter that anyone could use an AI clone of her voice and that she would split streaming royalties 50/50 with anyone who distributed a track made with it. The voiceprint, called GrimesAI-1, was trained on her vocals and made available through Elf.tech, a CreateSafe product. In late April 2023, Grimes and CreateSafe partnered with TuneCore to handle distribution and royalty splits for tracks featuring the GrimesAI-1 voiceprint. By embracing a permissive licensing model, Grimes positioned herself in deliberate contrast to the major-label "opt out" stance.
The most-discussed AI music incident of 2023 was Heart on My Sleeve, a song produced by the anonymous TikTok user Ghostwriter977 (sometimes written ghostwriter977) using AI-cloned vocals of Drake and The Weeknd. Ghostwriter977 self-released the track on streaming platforms including Spotify, Apple Music, SoundCloud, Amazon Music, Deezer, YouTube, and Tidal on 4 April 2023, and posted a one-minute snippet to TikTok on 15 April. The first TikTok video drew about 9.4 million views.
Universal Music Group filed DMCA takedown notices on 17 April, and the song was pulled from streaming services within days. Ghostwriter977 later submitted the track for Grammy consideration. On 4 September 2023, Recording Academy CEO Harvey Mason Jr. confirmed that Heart on My Sleeve would not be eligible: it had not been commercially released through legitimate distribution channels, and its AI-generated vocals of Drake and The Weeknd, both artists signed to UMG, had not been cleared by the artists or their label. The Recording Academy's broader AI rule, announced in June 2023, requires meaningful human authorship in any Grammy-eligible work.
Not all uses of AI in music are adversarial. On 2 November 2023, Apple Records released Now and Then, marketed as the final Beatles song. The track was built from a late-1970s demo cassette John Lennon had recorded in his New York apartment at the Dakota. Earlier attempts to finish the song in 1995 had failed because Lennon's vocal was buried under his own piano on the same home recording and could not be separated cleanly with the tools of the time.
For the 2023 version, producer Giles Martin and Paul McCartney used the same machine-learning audio demixing technology that Peter Jackson's team had developed for the Get Back documentary. The model could distinguish Lennon's voice from the piano and isolate it as a clean stem. Martin then added new string arrangements, ELO's Jeff Lynne polished George Harrison's archival guitar parts, and Ringo Starr added new drums. McCartney publicly stressed that no AI vocal generation was used and that the singing was Lennon's actual voice, recovered rather than synthesized. The song debuted at number 1 in the United Kingdom on 10 November 2023, the band's first UK number 1 in 54 years, and entered the Billboard Hot 100 at number 7 in the United States.
On 4 September 2024, federal prosecutors in the Southern District of New York unsealed an indictment against Michael Smith, a 52-year-old musician from Cornelius, North Carolina. Prosecutors charged Smith with wire fraud, conspiracy to commit wire fraud, and money laundering in what they described as the first criminal AI music streaming-fraud case brought in the United States. According to the indictment, Smith ran the scheme from roughly 2017 through 2024.
The alleged mechanics were straightforward. Smith bought hundreds of thousands of AI-generated tracks from a co-conspirator, uploaded them to Spotify, Apple Music, Amazon Music, and YouTube Music under thousands of fabricated artist names, and used automated bot accounts to stream those tracks continuously. The indictment alleges he generated up to 661,440 streams per day and collected more than $10 million in royalties. Smith initially pleaded not guilty. In 2025, he entered a guilty plea to a single conspiracy count, with Billboard reporting that the scheme returned approximately $8 million to him before detection.
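A short calculation shows why a stream count of that size matters. The sketch below takes the indictment's alleged 661,440 streams per day and multiplies it by an assumed flat rate of $0.005 per stream. That rate is an assumption chosen only for illustration; real payouts vary by platform, country, subscription tier, and month, and no per-stream rate is given in the case summary above.

```python
# Streams per day as alleged in the indictment.
STREAMS_PER_DAY = 661_440

# ASSUMPTION: a flat, illustrative payout per stream (not a figure from the case).
ASSUMED_RATE_USD = 0.005

def projected_royalties(streams_per_day, rate_usd, days):
    """Gross royalties if every stream paid the same flat rate for `days` days."""
    return streams_per_day * rate_usd * days

daily = projected_royalties(STREAMS_PER_DAY, ASSUMED_RATE_USD, 1)
yearly = projected_royalties(STREAMS_PER_DAY, ASSUMED_RATE_USD, 365)
print(f"per day:  ${daily:,.2f}")   # per day:  $3,307.20
print(f"per year: ${yearly:,.2f}")  # per year: $1,207,128.00
```

Under that assumption, a single year at the alleged peak volume would gross roughly $1.2 million; the multi-year totals reported in the case depend on how stream volumes and actual rates changed over time.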
The case sharpened streaming platforms' incentive to identify and demote machine-generated tracks. Deezer began publicly reporting the share of new uploads it flagged as fully AI-generated. By 2025, Deezer said roughly 28 percent of new uploads it received were entirely AI-generated; by April 2026 the company put the figure at 44 percent, or about 75,000 tracks per day, and began licensing its detection stack to other platforms.
On 24 June 2024, the Recording Industry Association of America announced two coordinated copyright infringement lawsuits on behalf of Universal Music Group, Sony Music Entertainment, and Warner Records. The suit against Suno was filed in the United States District Court for the District of Massachusetts; the suit against Uncharted Labs, the company behind Udio, was filed in the Southern District of New York the same day. The complaints alleged that Suno and Udio had trained their commercial music generators on copyrighted sound recordings without licenses, on a "massive scale," and that the resulting outputs could be reverse-engineered to recreate close imitations of specific copyrighted recordings.
The RIAA's evidence in both complaints centered on prompt-based exhibits. By feeding the services prompts describing the genre, era, instrumentation, and lyrical themes of famous recordings, plaintiffs said they could elicit outputs that closely resembled songs including Mariah Carey's All I Want for Christmas Is You, the Temptations' My Girl, and Green Day's American Idiot. The plaintiffs sought statutory damages of up to $150,000 per infringed work, plus injunctive relief.
Suno's chief executive Mikey Shulman responded in a 1 August 2024 blog post and court filing arguing that training on copyrighted music constituted fair use because the model produces new, transformative outputs rather than copies. Udio took a similar fair-use position. The cases progressed slowly through 2024 and 2025.
The disputes began to resolve out of court. On 29 October 2025, Universal Music Group announced a settlement with Udio, under which Udio agreed to pivot its product from open-ended generation toward a licensed "fan engagement platform" where users remix and create with UMG-approved recordings, with a jointly licensed music-creation service planned for 2026. In November 2025, Warner Music Group settled with Suno and signed a separate licensing partnership; Warner also struck a deal with Udio. As of late 2025, UMG's and Sony's cases against Suno remained active, with public reports describing talks as stalled.
Tennessee Governor Bill Lee signed the Ensuring Likeness Voice and Image Security Act, known as the ELVIS Act, into law on 21 March 2024. The act updated Tennessee's existing right-of-publicity statute to add an explicit protection for voice, including AI-generated voice clones. It made unauthorized commercial cloning of an artist's voice a Class A misdemeanor and gave artists, labels, and licensees a civil cause of action. The law took effect 1 July 2024 and made Tennessee the first US state to enact AI-specific voice protection. The bill passed unanimously: 93-0 in the House and 30-0 in the Senate, with support from the Recording Academy and the RIAA.
At the federal level, Senators Chris Coons (D-DE), Marsha Blackburn (R-TN), Amy Klobuchar (D-MN), and Thom Tillis (R-NC) released a discussion draft of the Nurture Originals, Foster Art, and Keep Entertainment Safe Act (NO FAKES Act) in October 2023, then formally introduced the bill on 31 July 2024 as S. 4875 in the 118th Congress. The bill would create a federal right of action against the unauthorized production, hosting, or distribution of a digital replica of an individual's voice or likeness, including replicas generated by AI. It includes carve-outs for protected First Amendment uses such as documentary, biographical, parody, and news reporting, and it requires online services to remove an unauthorized replica on notice from a rights holder.
The bill did not pass in the 118th Congress and was reintroduced in 2025 with bipartisan and bicameral support.
The consumer-facing AI features at the big streamers tell a story about where each platform sees the technology fitting.
| Platform | Feature | Launched | Notes |
|---|---|---|---|
| Spotify | AI DJ | February 2023 | Voice cloned from Xavier Jernigan; Sonantic tech |
| Spotify | AI Playlist (text prompts) | 2024 | Available in select markets |
| YouTube Music | Dream Track | November 2023 | Lyria-powered, opt-in artists including John Legend, Charlie Puth, Sia |
| YouTube | Music AI tools | 2023 to ongoing | Output from Music AI Incubator |
| Apple Music | Sing (karaoke); enhanced lyrics | 2022 to 2024 | AI for audio separation |
| Amazon Music | Endel partnership | 2023 | Personalized soundscapes |
| Deezer | AI track detector | 2025 | Licensed ACRCloud fingerprint plus internal classifiers |
Deezer's posture stands out. The company chose to label AI-generated uploads and exclude them from algorithmic recommendations and editorial playlists. Spotify's stance has been more lenient, attracting criticism for slow action on machine-generated tracks suspected of streaming fraud.
Not every artist has resisted generative AI. Several have built their own systems or licensed their voices on terms they set.
Universal Music Group's CEO Sir Lucian Grainge has been the most vocal industry executive on AI. UMG's public position, articulated in 2023 and reiterated in subsequent annual memos, can be summarized in a single sentence Grainge used in company communications: UMG will not license any model that uses an artist's voice or generates new songs that incorporate an artist's existing songs without the artist's consent. UMG has nonetheless signed agreements with YouTube, TikTok, Meta, BandLab, Soundlabs, KLAY, ProRata, and, after the 2025 settlement, Udio.
Sony Music Entertainment took a more aggressive posture in May 2024 by sending a formal opt-out letter to more than 700 AI developers and platforms. The letter, addressed to recipients including OpenAI, Microsoft, and Google, declared that Sony's recordings, compositions, lyrics, artwork, and data could not be used for text or data mining or to train AI systems without explicit advance permission. The letter cited the European Union's AI Act and its disclosure requirements as part of the motivation.
Warner Music Group has straddled these positions. It has been an early investor in Boomy, signed a distribution deal with Endel in January 2019 (the first major-label deal for an algorithm), settled with Suno and Udio in November 2025, and signed Warner artists into licensed AI collaborations.
The Recording Academy updated its Grammy eligibility rules in June 2023 to require meaningful human authorship in any eligible entry. AI-only compositions are ineligible in songwriting categories, and AI-only performances are ineligible in performance categories, but AI-assisted works with significant human contribution can compete.
AI music systems use a handful of recurring building blocks.
Symbolic models work on MIDI or piano-roll representations, predicting note sequences. EMI, MuseNet, and many Magenta models fall in this family.
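The core symbolic idea, predicting the next note from the notes before it, can be shown with a much simpler model than EMI, MuseNet, or Magenta's networks. The sketch below is a first-order Markov chain over MIDI pitch numbers, swapped in purely for illustration: it counts pitch-to-pitch transitions in a tiny made-up corpus and samples a new melody from those counts. Neural symbolic models replace the count table with a learned network that conditions on far longer context, plus rhythm, velocity, and instrument.

```python
import random
from collections import defaultdict

def train_bigram(melodies):
    """Count pitch-to-pitch transitions across a list of MIDI-pitch melodies."""
    counts = defaultdict(lambda: defaultdict(int))
    for melody in melodies:
        for prev, nxt in zip(melody, melody[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, start, length, rng):
    """Sample a melody by repeatedly drawing the next pitch given the previous one."""
    melody = [start]
    for _ in range(length - 1):
        options = counts.get(melody[-1])
        if not options:  # dead end: this pitch was never followed by anything
            break
        pitches = list(options)
        weights = [options[p] for p in pitches]
        melody.append(rng.choices(pitches, weights=weights, k=1)[0])
    return melody

# Hypothetical two-melody corpus in MIDI note numbers (60 = middle C).
corpus = [[60, 62, 64, 65, 67, 65, 64, 62, 60], [60, 64, 67, 64, 60]]
model = train_bigram(corpus)
print(generate(model, start=60, length=8, rng=random.Random(0)))
```

Every generated transition appears somewhere in the training corpus, which is also the weakness of style imitation by recombination: the model can only rearrange what it has seen.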
Raw-audio autoregressive models like WaveNet (the predecessor to NSynth) and Jukebox predict audio samples or compressed tokens directly. They produce convincing timbre and singing but are computationally expensive.
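One concrete detail of sample-level modeling: WaveNet treats each audio sample as one of 256 classes after μ-law companding (μ = 255), so the network predicts a softmax over those classes rather than a real number. The sketch below implements that encode and decode step for a single sample; it is an illustration of the preprocessing only, not of the WaveNet network itself, and it does not describe how Jukebox tokenizes audio.

```python
import math

MU = 255  # 8-bit mu-law: 256 output classes

def mu_law_encode(x, mu=MU):
    """Compress a sample in [-1, 1] and quantize it to an integer class in [0, mu]."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int(round((y + 1) / 2 * mu))

def mu_law_decode(q, mu=MU):
    """Map a class back to [-1, 1], then invert the companding curve."""
    y = 2 * q / mu - 1
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)

for sample in (-1.0, -0.01, 0.0, 0.01, 0.5, 1.0):
    q = mu_law_encode(sample)
    print(f"{sample:+.2f} -> class {q:3d} -> {mu_law_decode(q):+.4f}")
```

The logarithmic curve spends more of the 256 classes on quiet amplitudes, where hearing is most sensitive. Even so, one prediction per sample means 16,000 to 48,000 sequential steps for every second of audio, which is the computational cost noted above.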
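Neural audio codecs such as EnCodec (Meta) and SoundStream (Google) compress audio sampled at tens of kilohertz into discrete tokens at frame rates far below the sample rate, typically with several tokens per frame from stacked (residual) quantizers; the 32 kHz EnCodec model used by MusicGen, for example, produces 50 frames per second. Modern systems including MusicGen and MusicLM train Transformers on those tokens rather than raw samples, cutting sequence length, and with it compute, by orders of magnitude.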
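The scale of the saving is easy to check with arithmetic. The sketch below uses the configuration reported for MusicGen's codec (32 kHz audio, 50 frames per second, 4 codebooks per frame) and compares token counts with raw sample counts. It counts tokens only; how a model orders them (MusicGen interleaves its codebooks) changes the number of sequential prediction steps, not the total.

```python
def tokens_per_second(frame_rate_hz, n_codebooks):
    """Discrete tokens per second when each frame carries one token per codebook."""
    return frame_rate_hz * n_codebooks

SAMPLE_RATE = 32_000  # raw audio samples per second
FRAME_RATE = 50       # codec frames per second
N_CODEBOOKS = 4       # quantizer levels per frame

hop = SAMPLE_RATE // FRAME_RATE               # samples summarized by one frame
tokens = tokens_per_second(FRAME_RATE, N_CODEBOOKS)
print(hop)                                    # 640
print(tokens)                                 # 200
print(SAMPLE_RATE / tokens)                   # 160.0

clip_seconds = 30
print(SAMPLE_RATE * clip_seconds, tokens * clip_seconds)  # 960000 6000
```

Because Transformer self-attention cost grows roughly with the square of sequence length, a 160-fold shorter sequence reduces attention cost by far more than 160-fold.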
Diffusion models including Riffusion and Stable Audio start from noise and iteratively denoise toward a target. Riffusion did this on spectrogram images; Stable Audio works in a learned latent audio space.
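The "iteratively denoise" loop can be sketched in a few lines. The example below is a toy under heavy assumptions: it uses a standard linear noise schedule, a four-number vector in place of a spectrogram or audio latent, and an "oracle" noise predictor that knows the clean signal, standing in for the trained network that real systems such as Stable Audio use. The reverse loop is a deterministic DDIM-style update, one common sampler among several; with a perfect predictor it recovers the original exactly, which shows the mechanics of the update rule rather than anything about model quality.

```python
import math
import random

T = 1000
# Linear beta schedule from 1e-4 to 0.02 (an assumption; real systems tune this).
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alpha_bars = []
running = 1.0
for b in betas:
    running *= 1.0 - b
    alpha_bars.append(running)  # fraction of signal variance kept at step t

def add_noise(x0, t, rng):
    """Forward process: jump straight to noise level t in closed form."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    return [math.sqrt(a) * x + math.sqrt(1 - a) * e for x, e in zip(x0, eps)]

def oracle_eps(x_t, t, x0):
    """Stand-in for the trained network: the exact noise, computed from the known x0."""
    a = alpha_bars[t]
    return [(xt - math.sqrt(a) * x) / math.sqrt(1 - a) for xt, x in zip(x_t, x0)]

def ddim_denoise(x_t, t_start, predict_eps):
    """Deterministic DDIM-style reverse steps from t_start down to a clean estimate."""
    x = x_t
    for t in range(t_start, -1, -1):
        a = alpha_bars[t]
        a_prev = alpha_bars[t - 1] if t > 0 else 1.0
        eps = predict_eps(x, t)
        x0_hat = [(xi - math.sqrt(1 - a) * e) / math.sqrt(a) for xi, e in zip(x, eps)]
        x = [math.sqrt(a_prev) * c + math.sqrt(1 - a_prev) * e for c, e in zip(x0_hat, eps)]
    return x

rng = random.Random(0)
clean = [0.5, -0.25, 0.0, 0.75]           # toy stand-in for a latent or spectrogram slice
noisy = add_noise(clean, T - 1, rng)      # almost pure noise at the last step
restored = ddim_denoise(noisy, T - 1, lambda x, t: oracle_eps(x, t, clean))
print([round(v, 6) for v in restored])
```

In a real system the network only sees the noisy input and the text conditioning, so its noise estimates are imperfect and each step nudges the sample toward something plausible rather than toward one known target.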
Joint music-text embeddings such as MuLan (Google) and CLAP (LAION) learn a shared space where matched audio and text descriptions cluster together. They make text conditioning possible at scale.
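Shared embedding spaces of this kind are generally trained with a contrastive objective: matched audio-text pairs should score higher than every mismatched pair in the batch. The sketch below computes a symmetric InfoNCE loss (CLIP-style) on hand-made three-dimensional embeddings. The vectors and the temperature of 0.07 are illustrative assumptions, and the exact losses, encoders, and batch sizes used by MuLan and CLAP differ from this toy.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine_matrix(audio, text):
    """Cosine similarity between every audio embedding and every text embedding."""
    a = [normalize(v) for v in audio]
    t = [normalize(v) for v in text]
    return [[sum(x * y for x, y in zip(ai, tj)) for tj in t] for ai in a]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where row i's correct class is column i (log-sum-exp for stability)."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)

def symmetric_contrastive_loss(audio, text, temperature=0.07):
    """InfoNCE in both directions: audio-to-text and text-to-audio."""
    sims = cosine_matrix(audio, text)
    logits = [[s / temperature for s in row] for row in sims]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))

# Toy embeddings: index i in each list describes the same (hypothetical) clip.
audio = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
text_matched = [[0.9, 0.2, 0.0], [0.1, 0.9, 0.1], [0.0, 0.1, 0.9]]
text_shuffled = [text_matched[1], text_matched[2], text_matched[0]]

loss_matched = symmetric_contrastive_loss(audio, text_matched)
loss_shuffled = symmetric_contrastive_loss(audio, text_shuffled)
print(loss_matched, loss_shuffled)
```

The matched pairing yields a much lower loss than the shuffled one; training pushes real encoders toward that state, after which a text prompt's embedding can be used to retrieve or condition on audio that lies near it.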
Voice conversion and cloning systems range from cycle-consistent GANs and so-vits-svc style models to modern diffusion and flow-matching approaches. They can be trained from a few minutes of clean vocal audio and have driven both legitimate artist projects (Holly+, GrimesAI-1) and the bulk of the unauthorized clones that have triggered legislation.
Stem separation uses source-separation models (Demucs, Spleeter) to pull a mix apart into vocals, drums, bass, and other instruments. This was the technology used on the Beatles' Now and Then.
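Many spectrogram-domain separators, Spleeter among them, work by predicting a soft mask: for each time-frequency bin, the fraction of the mixture that belongs to the target stem. The sketch below builds an "oracle" ratio mask from known toy sources to show what such a mask does. Two assumptions are labeled here: real models must estimate the mask from the mixture alone, and mixture magnitudes are treated as the sum of source magnitudes, which is only an approximation. Demucs largely works on waveforms instead of masks.

```python
def ratio_mask(target, other, eps=1e-8):
    """Per-bin soft mask: share of each time-frequency bin's magnitude belonging to the target."""
    return [[t / (t + o + eps) for t, o in zip(tr, orow)] for tr, orow in zip(target, other)]

def apply_mask(mask, mixture):
    """Multiply the mixture spectrogram by the mask, bin by bin."""
    return [[m * x for m, x in zip(mr, xr)] for mr, xr in zip(mask, mixture)]

# Toy magnitude spectrograms: rows are frequency bins, columns are time frames.
vocals = [[0.0, 0.8, 0.9], [0.5, 0.6, 0.0]]
piano = [[0.7, 0.1, 0.0], [0.2, 0.2, 0.9]]
mixture = [[v + p for v, p in zip(vr, pr)] for vr, pr in zip(vocals, piano)]

mask = ratio_mask(vocals, piano)  # oracle: computed from the true sources
estimated_vocals = apply_mask(mask, mixture)
print([[round(x, 3) for x in row] for row in estimated_vocals])
```

Bins where the piano dominates get a mask near zero and bins where the voice dominates get a mask near one, which is the kind of decision a trained separator has to make when pulling a buried vocal out of a home recording.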
Speech transcription with models such as Whisper underpins lyric extraction and search.
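As one example of lyric extraction, Whisper's `transcribe()` output includes a list of segments with start and end times and text, which can be turned into timed lyric lines. The sketch below formats such segments as LRC-style `[mm:ss.xx]` lines. The segments are hand-written stand-ins shaped like Whisper's output so the code runs without the model or an audio file; the commented `whisper` calls show where real segments would come from, and the file name is hypothetical.

```python
def to_lrc_timestamp(seconds):
    """Format seconds as an LRC-style [mm:ss.xx] tag."""
    centis = int(round(seconds * 100))
    minutes, centis = divmod(centis, 6000)
    secs, centis = divmod(centis, 100)
    return f"[{minutes:02d}:{secs:02d}.{centis:02d}]"

def segments_to_lrc(segments):
    """Turn Whisper-style segments (dicts with 'start' and 'text') into timed lyric lines."""
    return "\n".join(f"{to_lrc_timestamp(s['start'])}{s['text'].strip()}" for s in segments)

# Hand-written segments shaped like the "segments" list in Whisper's transcribe() result.
# With the openai-whisper package installed, they could come from:
#   import whisper
#   result = whisper.load_model("base").transcribe("song.mp3")  # hypothetical file
#   segments = result["segments"]
segments = [
    {"start": 0.0, "end": 3.2, "text": " First line of the verse"},
    {"start": 3.2, "end": 7.85, "text": " Second line of the verse"},
    {"start": 65.5, "end": 70.0, "text": " Chorus comes in"},
]
print(segments_to_lrc(segments))
```

Output lines take the form `[01:05.50]Chorus comes in`, a format many players and lyric services can display in sync with playback.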
The arguments here are unusually concrete because they map onto actual paychecks. Working musicians and the unions that represent them, including the American Federation of Musicians and SAG-AFTRA, have raised three concerns repeatedly.
The first is training-data consent. The models behind Suno and Udio were trained on large catalogs of recorded music, and until the 2025 settlements, the providers declined to disclose what was in their training sets. The fair-use defense that AI companies have offered echoes arguments made in parallel litigation over LLMs and image models, but musicians point out that recorded music has unusually strong property protection in US copyright law.
The second is royalty dilution. If 28 to 44 percent of new tracks on a streaming service are AI-generated and a non-trivial fraction of those are uploaded by bots or fraud schemes, the per-stream royalty pool gets diluted for human artists. Spotify shifted to a minimum-stream threshold in 2024 partly in response to this pressure.
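The dilution argument follows from how a pro-rata royalty pool works: a fixed pool is split by each track's share of total streams, so every added stream, human or not, shrinks everyone else's share. The sketch below uses a simplified pro-rata model with hypothetical numbers (pool size, stream counts, and a 1,000-stream floor) to show both the dilution and how a minimum-stream threshold claws some of it back. Actual platform payout formulas are more complex than this.

```python
def pro_rata_payouts(pool_usd, streams_by_track, min_streams=0):
    """Split a royalty pool in proportion to streams, excluding tracks below min_streams."""
    eligible = {t: s for t, s in streams_by_track.items() if s >= min_streams}
    total = sum(eligible.values())
    return {t: pool_usd * s / total for t, s in eligible.items()}

POOL = 1_000_000.0  # hypothetical pool for one period
human = {"human_artist": 400_000, "other_humans": 9_600_000}
bots = {f"ai_track_{i}": 900 for i in range(2_000)}  # hypothetical low-volume uploads

before = pro_rata_payouts(POOL, human)["human_artist"]
after = pro_rata_payouts(POOL, {**human, **bots})["human_artist"]
with_floor = pro_rata_payouts(POOL, {**human, **bots}, min_streams=1_000)["human_artist"]
print(f"{before:,.2f}  {after:,.2f}  {with_floor:,.2f}")
```

In this toy case, 1.8 million streams spread across 2,000 small tracks cut the human artist's payout by roughly 15 percent, and a floor that excludes tracks under 1,000 streams restores it. A floor only catches long-tail uploads, though; a fraud scheme that pushes each track above the threshold, as alleged in the Smith case, passes straight through.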
The third is voice impersonation and consent. The same technology that lets Grimes voluntarily license her voice for a 50 percent royalty share also lets anonymous TikTok users impersonate a Drake or a Tom Petty without permission. The ELVIS Act and NO FAKES Act both target this asymmetry.
Defenders of the technology, including some artists, point out that AI is doing what samplers, drum machines, autotune, and DAWs have done before: lowering the floor on what a single person can produce in a bedroom studio. They argue that the cultural moral panic over AI music echoes earlier panics over sampling and over file sharing, and that artists who learn to work with AI will be in a stronger position than artists who try to wish it away. Grimes has been the most prominent voice for this position; Herndon's Holly+ is a more careful third path.
The industry's own behavior suggests the answer will not be a clean ban or a clean embrace but a slow negotiation over licensing terms, training-data disclosure, royalty splits, and detection. The June 2024 lawsuits, the October and November 2025 settlements, and the parallel rise of detection technology at Deezer and ACRCloud all point in the same direction: AI music is being absorbed into the existing royalty plumbing on terms acceptable to the major labels, while independent musicians fight a separate set of battles over voice, training, and pay.