MuseNet

Updated Jun 3, 2026

MuseNet is a deep neural network for symbolic music generation, announced by OpenAI on April 25, 2019. The model can generate musical compositions up to about four minutes long using as many as 10 different instruments, and it can blend styles ranging from classical composers such as Mozart and Chopin to bands such as the Beatles, as well as country, pop, and other genres. MuseNet was not explicitly programmed with any understanding of music theory; instead it discovered patterns of harmony, rhythm, and style by learning to predict the next token across hundreds of thousands of MIDI files. It used the same general-purpose, unsupervised next-token-prediction approach as GPT-2, implemented as a large Sparse Transformer.^[1]^[2]^[3]

Overview

OpenAI presented MuseNet as evidence that the general-purpose sequence-modeling technology behind its language models could be applied to other domains, in this case symbolic music. The system treats a piece of music as a sequence of discrete tokens, much as a language model treats text as a sequence of word or subword tokens, and learns to continue that sequence one token at a time.^[1]^[2]

Because it was trained purely by prediction rather than by encoding musical rules, MuseNet learned conventions of harmony and rhythm implicitly from data. OpenAI noted that this approach allowed the model to combine styles in unusual ways, for example generating a piece that begins in one composer's idiom and shifts toward another, or rendering a melody associated with one artist using the instrumentation of a different genre. The company also acknowledged limitations: when asked to combine clashing styles and instruments, such as pairing a Chopin-style piano piece with bass and drums, the model could produce odd or incoherent results because such combinations were rare or absent in the training data.^[1]^[2]

MuseNet was released as a research demonstration with an accompanying interactive web tool rather than as a commercial product or downloadable model.^[1]^[4]

How it works

MuseNet is built on the Sparse Transformer architecture, a more compute-efficient variant of the Transformer that OpenAI had introduced earlier in 2019. Using the Sparse Transformer's recomputation technique (recalculating intermediate values during the backward pass instead of storing them) and its optimized kernels, OpenAI trained a 72-layer network with 24 attention heads and full attention over a context of 4,096 tokens. The long context window is what allowed the model to learn and maintain long-term structure across a multi-minute composition.^[1]^[2]^[3]
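As a rough back-of-the-envelope illustration of why memory-saving techniques matter at this scale, the sketch below counts attention-score entries for the reported configuration. Only the three figures (72 layers, 24 heads, 4,096 tokens) come from OpenAI; the class name, the method names, and the simplifications (dense scores, a single sequence, no causal mask, which would roughly halve the count) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionConfig:
    """Hypothetical container for the figures reported for MuseNet."""
    layers: int = 72
    heads: int = 24
    context: int = 4096

    def scores_per_head(self) -> int:
        # Full attention compares every token with every other token.
        return self.context * self.context

    def total_scores(self) -> int:
        # One score matrix per head, per layer, for a single sequence.
        return self.layers * self.heads * self.scores_per_head()


cfg = AttentionConfig()
print(cfg.scores_per_head())  # 16777216
print(cfg.total_scores())     # 28991029248
```

Keeping tens of billions of intermediate values around for the backward pass would be expensive, which is the kind of pressure that recomputation relieves at the cost of extra compute.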

The model was trained on a large collection of MIDI files. ClassicalArchives and BitMidi donated their MIDI collections for the project, and OpenAI supplemented these with other collections found online, including jazz, pop, African, Indian, and Arabic styles. To convert MIDI into a sequence the network could model, OpenAI experimented with several token encodings and found that the most effective scheme combined the pitch, volume, and instrument of each note into a single token, alongside tokens that advanced time. OpenAI also applied data augmentation during training: transposing notes into different keys, scaling volumes up or down, slightly speeding up or slowing down timing, and applying mixup in the token embedding space.^[1]^[2]^[3]
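The encoding and the transposition augmentation described above can be sketched as follows. This is a minimal illustration, not OpenAI's tokenizer: the `Note` type, the `instrument:vVOLUME:PITCH` and `wait:N` token spellings, the tick unit, and the rule of skipping a transposition that would leave the MIDI pitch range are all assumptions. Note-off events are omitted for brevity.

```python
from typing import List, NamedTuple


class Note(NamedTuple):
    instrument: str  # e.g. "piano"
    volume: int      # MIDI velocity, 0-127
    pitch: int       # MIDI note number, 0-127
    start: int       # onset time in ticks (illustrative unit)


def encode(notes: List[Note]) -> List[str]:
    """Emit one combined token per note, with wait tokens that advance time."""
    tokens: List[str] = []
    now = 0
    for note in sorted(notes, key=lambda n: (n.start, n.pitch)):
        if note.start > now:
            tokens.append(f"wait:{note.start - now}")
            now = note.start
        tokens.append(f"{note.instrument}:v{note.volume}:{note.pitch}")
    return tokens


def transpose(notes: List[Note], semitones: int) -> List[Note]:
    """Shift every pitch; return the notes unchanged if any would leave 0-127."""
    shifted = [n._replace(pitch=n.pitch + semitones) for n in notes]
    if any(not 0 <= n.pitch <= 127 for n in shifted):
        return list(notes)
    return shifted


chord_then_melody = [
    Note("piano", 72, 60, 0),
    Note("piano", 72, 64, 0),
    Note("violin", 80, 67, 0),
    Note("piano", 72, 65, 12),
]
print(encode(chord_then_melody))
# ['piano:v72:60', 'piano:v72:64', 'violin:v80:67', 'wait:12', 'piano:v72:65']
print(encode(transpose(chord_then_melody, 2)))
# ['piano:v72:62', 'piano:v72:66', 'violin:v80:69', 'wait:12', 'piano:v72:67']
```

Packing pitch, volume, and instrument into one token keeps sequences short, which matters when the whole piece has to fit in a fixed context window.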

Beyond standard positional embeddings, MuseNet added several learned embeddings to help the model track musical context:

Embedding	Purpose
Positional	Standard Transformer embedding indicating a token's position in the sequence.
Timing	A learned embedding tracking the passage of time so that all notes sounding simultaneously share the same timing embedding.
Chord	An embedding added for each note within a chord, mimicking relative attention so the model can more easily relate a note to earlier notes in the same or previous chord.
Structural (part)	An embedding dividing the larger piece into 128 parts to indicate where a sample sits within the whole.
Structural (countdown)	A second structural encoding that counts down from 127 to 0 as the model approaches the end-of-piece token.
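To make the table concrete, the sketch below computes one plausible set of index values for each embedding over a short token sequence. The exact formulas are not given in the source, so everything here is an assumption: the function name, the `wait:N` token spelling, using a step counter for timing, and deriving the countdown as the mirror image of the part index. Only the 128 parts and the 127-to-0 countdown range come from OpenAI's description.

```python
from typing import Dict, List

PARTS = 128  # the source describes 128 parts and a countdown from 127 to 0


def embedding_indices(
    tokens: List[str], piece_offset: int, piece_length: int
) -> List[Dict[str, int]]:
    """Return illustrative embedding indices for each token in a sample.

    piece_offset: index of the sample's first token within the full piece.
    piece_length: total number of tokens in the full piece.
    """
    rows: List[Dict[str, int]] = []
    time_step = 0   # shared by all notes that sound at the same moment
    chord_slot = 0  # position of the next note inside the current chord
    for position, token in enumerate(tokens):
        if token.startswith("wait:"):
            time_step += 1  # time advances, so a new chord begins
            chord_slot = 0
            slot = 0
        else:
            slot = chord_slot
            chord_slot += 1
        absolute = piece_offset + position
        part = min(PARTS - 1, absolute * PARTS // piece_length)
        rows.append({
            "position": position,
            "timing": time_step,
            "chord": slot,
            "part": part,
            "countdown": PARTS - 1 - part,
        })
    return rows


sample = ["piano:v72:60", "piano:v72:64", "wait:12", "piano:v72:65"]
for row in embedding_indices(sample, piece_offset=0, piece_length=4):
    print(row)
```

In this sketch the first two notes share timing index 0 and receive chord slots 0 and 1, while the note after the wait token moves to timing index 1 and starts again at chord slot 0.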

To steer generation, OpenAI created composer and instrumentation tokens that were prepended to each sample at training time, so the model learned to associate them with particular musical characteristics. At generation time, supplying these tokens (for example, a token for a given composer or a chosen lead instrument) biased the output toward the requested style or arrangement.^[1]^[2]
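A minimal sketch of this kind of conditioning is shown below. The function name, the lowercase token spellings, the underscore-joined instrumentation token, and the `start` marker are hypothetical choices made for illustration; the source only states that composer and instrumentation tokens were prepended to each sample.

```python
from typing import List, Sequence


def build_prompt(
    composer: str, instruments: Sequence[str], opening: Sequence[str] = ()
) -> List[str]:
    """Prepend style and instrumentation tokens to an optional opening phrase."""
    if not instruments:
        raise ValueError("at least one instrument is required")
    instrumentation = "_".join(instruments)
    return [composer, instrumentation, "start", *opening]


prompt = build_prompt("chopin", ["piano", "strings"], ["piano:v72:60"])
print(prompt)  # ['chopin', 'piano_strings', 'start', 'piano:v72:60']
```

Because the conditioning tokens occupy ordinary positions at the front of the sequence, the same next-token model handles both conditioned and unconditioned generation without any architectural change.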

Capabilities

MuseNet's headline capabilities, as described by OpenAI, can be summarized as follows.^[1]^[2]^[5]

Aspect	Detail
Output format	Symbolic music (MIDI), not raw audio
Maximum length	About four minutes per composition
Instruments	Up to 10 different instruments in a single piece
Styles	Classical composers (for example Mozart, Chopin), bands (for example the Beatles), plus country, pop, jazz, and other genres
Generation	Unsupervised next-token prediction, optionally conditioned on composer and instrument tokens
Steering	Users could choose a composer or style, a starting set of notes, and instrumentation

The model could generate a piece from scratch given a chosen style, or continue a short musical snippet provided by a user, predicting how the piece might develop. Because it produced MIDI rather than audio, the output was rendered through software instruments, and it did not include vocals or the fine timbral detail of recorded sound.^[1]^[2]
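Continuing a snippet one token at a time can be sketched with a toy stand-in for the trained network. The bigram table below is not MuseNet's model; it, the `<end>` marker, and all function names are assumptions chosen so the sampling loop is runnable on its own. A real system would replace the table lookup with a forward pass over the full context.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence

END = "<end>"  # hypothetical end-of-piece marker


def fit_bigrams(pieces: Sequence[Sequence[str]]) -> Dict[str, Counter]:
    """Count which token follows which; a toy stand-in for a trained model."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for piece in pieces:
        for prev, nxt in zip(piece, list(piece[1:]) + [END]):
            table[prev][nxt] += 1
    return table


def continue_piece(
    table: Dict[str, Counter],
    prompt: Sequence[str],
    max_new_tokens: int,
    seed: int = 0,
) -> List[str]:
    """Extend the prompt by sampling one token at a time until END or the limit."""
    if not prompt:
        raise ValueError("prompt must contain at least one token")
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        options = table.get(tokens[-1])
        if not options:
            break  # the toy model has never seen this token
        choices, weights = zip(*options.items())
        nxt = rng.choices(choices, weights=weights, k=1)[0]
        if nxt == END:
            break
        tokens.append(nxt)
    return tokens


corpus = [["C", "E", "G", "C"], ["C", "E", "G", "E"]]
table = fit_bigrams(corpus)
print(continue_piece(table, ["C", "E"], max_new_tokens=8, seed=1))
```

Generating from scratch is the same loop with a prompt that contains only the conditioning tokens, while continuing a user's snippet simply starts from a longer prompt.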

The MuseNet demo and co-composer

Alongside the announcement, OpenAI published an interactive demonstration and made a MuseNet-powered co-composer tool available to the public for a limited time, through May 12, 2019.^[1]^[2]^[6]

The tool offered two modes:

Simple mode, in which users could listen to uncurated samples that MuseNet had already generated. Users could choose a composer or style and an optional starting point, and hear what the model produced without further intervention.
Advanced mode, which gave users more direct control to interact with the model, including generating new parts and steering the instruments and styles used in a piece.^[1]^[2]^[6]

To mark the launch, OpenAI ran an experimental concert on April 25, 2019, livestreamed from roughly 12:00 to 3:00 p.m. Pacific Time. The pieces played during the stream were generated directly by MuseNet with no human curation or filtering, and OpenAI stated that no one, including the company, had heard the samples before they were broadcast.^[7]^[8]

The interactive co-composer was a temporary prototype. Public access to the original tool ended in mid-May 2019, and OpenAI did not keep the interactive tooling running as a permanent service as its research focus shifted.^[2]^[6]

Reception and legacy

MuseNet attracted significant media attention as a demonstration that the unsupervised, large-scale Transformer approach pioneered for text could generalize to music. Technology outlets highlighted both the breadth of styles the model could imitate and its ability to maintain coherent structure over several minutes, while also noting that genre blends could sound disjointed and that the MIDI output lacked the realism of recorded audio.^[3]^[4]^[5]

MuseNet directly preceded OpenAI's Jukebox, unveiled in 2020, which represented a different and more ambitious approach to music generation. Whereas MuseNet operated on symbolic MIDI, Jukebox generated music as raw audio, including rudimentary singing, conditioned on genre, artist, and lyrics. OpenAI described its earlier MuseNet work on synthesizing music from large amounts of MIDI data as a precursor to Jukebox, and observers framed the shift from symbolic generation to raw-audio synthesis as an attempt to capture human voices and subtle timbres that symbolic systems cannot represent.^[9]^[10]

As a research artifact, MuseNet remains an early and frequently cited example of applying general-purpose sequence models to creative domains, illustrating the same underlying principle, predicting the next token in a long sequence, that OpenAI applied across text, code, and other modalities.^[1]^[2]

References

1. OpenAI, "MuseNet," OpenAI Blog, April 25, 2019. https://openai.com/index/musenet/
2. OpenAI, "MuseNet" (research page). https://cdn.production.openai.com/research/musenet
3. NVIDIA Developer Blog, "OpenAI Releases MuseNet: AI Algorithm That Can Generate Music." https://developer.nvidia.com/blog/openai-releases-musenet-ai-algorithm-that-can-generate-music/
4. Packt, "OpenAI introduces MuseNet: A deep neural network for generating musical compositions." https://www.packtpub.com/en-us/learning/how-to-tutorials/openai-introduces-musenet-a-deep-neural-network-for-generating-musical-compositions/
5. Musically, "OpenAI launches MuseNet, the latest music-generating AI," April 26, 2019. https://musically.com/2019/04/26/openai-musenet-music-generating-ai/
6. Neurohive, "OpenAI's MuseNet AI Can Generate Novel 4-Minute Compositions." https://neurohive.io/en/news/openai-s-musenet-ai-can-generate-novel-4-minute-compositions/
7. OpenAI (@OpenAI), "MuseNet is playing an experimental concert from now until 3pm PT on livestream," X (Twitter), April 25, 2019. https://x.com/openai/status/1121492690825146368
8. Internet Archive, "2019-04-25 - Open AI Muse Net Concert." https://archive.org/details/20190425OpenAIMuseNetConcert
9. OpenAI, "Jukebox," OpenAI Blog, April 30, 2020. https://openai.com/index/jukebox/
10. NVIDIA Developer Blog, "OpenAI's Jukebox Produces Music with Lyrics from Scratch." https://developer.nvidia.com/blog/openais-jukebox-produces-music-with-lyrics-from-scratch/
