Lyria is a family of music generation models developed by Google DeepMind, spanning text-to-music synthesis, real-time interactive music performance, and full-length song composition. First announced in November 2023 alongside YouTube's Dream Track experiment, the Lyria family has grown through several major releases: Lyria 2 (2025), Lyria RealTime (2025), Lyria 3 (February 2026), and Lyria 3 Pro (March 2026). All outputs from the models are embedded with SynthID, an imperceptible audio watermark developed by DeepMind to identify AI-generated content.
Across its versions, Lyria supports a range of access points: consumer-facing generation through the Gemini app, developer API access via the Gemini API and Google AI Studio, and enterprise deployment through Vertex AI. The models generate audio at 48 kHz stereo, and the Pro variant extends output duration to three minutes with structural control over song sections.
Before Lyria, Google's music AI research produced MusicLM, a text-to-music model introduced in January 2023. MusicLM could generate up to five minutes of audio from descriptive text prompts and demonstrated that large language model techniques could be applied to music generation at high fidelity. The system became publicly available in May 2023 through Google's AI Test Kitchen. MusicLM's architecture, particularly its hierarchical approach to audio token generation, directly influenced the design of the Lyria models.
In parallel, Google's Magenta research group had been developing tools for creative music generation since 2016. Magenta explored neural network approaches to MIDI generation, style transfer, and accompaniment creation. This body of work fed into the Music AI Sandbox, an experimental set of professional tools that Google shared with musicians through YouTube's Music AI Incubator starting in 2023.
Google DeepMind was formed in April 2023 through the merger of Google Brain and DeepMind. Lyria became the consolidated team's flagship music AI product, building on MusicLM's foundation while introducing a product-grade system with watermarking, safety filters, and commercial access.
Google DeepMind announced the original Lyria model on November 16, 2023, in partnership with YouTube. The launch included two applications: Dream Track for YouTube Shorts and a set of Music AI Sandbox tools for professional musicians.
Dream Track was an experiment in YouTube Shorts that let a limited set of creators generate 30-second soundtracks by entering a topic and selecting from a carousel of participating artists. The model simultaneously generated lyrics, a backing track, and an AI-generated voice in the style of the chosen artist. Participating artists at launch included Charlie Puth, T-Pain, Demi Lovato, John Legend, Sia, Alec Benjamin, Charli XCX, Troye Sivan, and Papoose. Artist participation was opt-in, and the feature was designed in collaboration with YouTube's Music AI Incubator.
Alongside Dream Track, Google released Music AI Sandbox, a set of experimental tools aimed at working musicians. The Sandbox was built around three core capabilities: generating initial musical ideas from text prompts, extending existing compositions, and transforming existing audio from one style to another.
Initial access was limited to participants in YouTube's Music AI Incubator. Google expanded Sandbox access to more musicians, producers, and songwriters in the United States in April 2025, at which point the platform had been updated to use Lyria 2 and Lyria RealTime as its underlying models.
From the original announcement, every output from Lyria was embedded with SynthID. Google described this as a non-negotiable component of deployment rather than an optional feature. The watermarking approach converts generated audio to a spectrogram representation, embeds a signal within the frequency domain using psychoacoustic principles, and reconstructs the audio. The resulting watermark survives MP3 and AAC compression, speed changes, and noise addition.
Google announced Lyria 2 at Google Cloud Next 2025 and made it generally available on Vertex AI on May 21, 2025. The model ID is lyria-002.
Lyria 2 generates 30-second audio clips at 48 kHz stereo, delivered as WAV files. It accepts text prompts and supports negative prompting to exclude unwanted musical elements from the output. Users can request up to four variations from a single prompt. The model offers greater control over instruments, BPM, and musical style compared to the original Lyria.
The model was used in YouTube's "Speech to Song" tool, which converted spoken audio into sung performances. It also powered the updated Music AI Sandbox, providing the high-fidelity audio quality that the professional music tool required.
Lyria 2 is available through Media Studio on Vertex AI, where it can be tested interactively, and through the Vertex AI model API for programmatic access. The API endpoint accepts POST requests carrying text prompt parameters and returns audio data. Pricing on Vertex AI is $0.06 per 30 seconds of generated output.
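As a rough illustration of what such a request might look like, the sketch below builds a predict payload following Vertex AI's standard publisher-model URL pattern. The project and region values are placeholders, and the instance field names (`prompt`, `negative_prompt`, `sample_count`) are assumptions that should be checked against the current Vertex AI documentation.

```python
import json

# Placeholder project and region values.
PROJECT, REGION = "my-project", "us-central1"

# Hypothetical endpoint following Vertex AI's publisher-model predict pattern.
endpoint = (
    f"https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT}"
    f"/locations/{REGION}/publishers/google/models/lyria-002:predict"
)

# Assumed field names: prompt, negative_prompt (elements to exclude),
# and sample_count (up to four variations per prompt).
payload = {
    "instances": [
        {
            "prompt": "warm lo-fi hip hop, mellow electric piano, 80 BPM",
            "negative_prompt": "vocals, distortion",
        }
    ],
    "parameters": {"sample_count": 4},
}

body = json.dumps(payload)  # would be sent as the POST body with an OAuth token
```

A real call would attach an authenticated `Authorization` header and decode the base64 WAV data in the response; both are omitted here.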
The quota limit at launch was 10 requests per minute per base model for standard access. All outputs are embedded with SynthID watermarking and pass through configurable safety filters on both input prompts and output audio.
Lyria RealTime is a separate model variant designed for continuous, interactive music generation rather than batch clip production. Google announced the Lyria RealTime API at Google I/O 2025 on May 20, 2025, opening it to developers via the Gemini API and AI Studio.
Lyria RealTime generates a continuous stream of 48 kHz stereo audio that plays until the user stops it. Rather than producing a finished clip, the model works block-by-block: each sequential chunk of audio is conditioned on both the previous audio output and a style embedding derived from the user's current text or audio prompts. By adjusting that style embedding in real time, a user can steer the music as it plays.
The architecture adapts MusicLM's approach to perform block autoregression. The decoder uses two connected Transformer modules: a temporal module that processes acoustic frames by embedding and aggregating RVQ (Residual Vector Quantization) tokens into frame-level representations, and a depth module that autoregressively predicts RVQ indices conditioned on the temporal context. Audio representation uses SpectroStream, a high-fidelity codec that succeeded SoundStream and supports 48 kHz stereo output. A refinement model predicts 48 additional fine-scale RVQ tokens per frame to improve audio quality.
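The control flow of block autoregression can be sketched as follows. This is a toy illustration, not the real decoder: the `generate_block` stand-in emits random tokens, whereas the actual system runs the temporal and depth Transformers described above. What the sketch shows is the conditioning structure, with each chunk depending on the previous chunk's tokens and on the style embedding current at that step.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_block(prev_tokens, style_embedding, block_len=64):
    """Stand-in for the decoder. In the real system, a temporal Transformer
    aggregates RVQ tokens into frame representations and a depth Transformer
    predicts the next frame's RVQ indices; here we emit random token IDs so
    only the block-autoregressive control flow is visible."""
    _ = (prev_tokens, style_embedding)  # conditioning inputs (unused in the toy)
    return rng.integers(0, 1024, size=block_len)

def stream(style_for_step, n_blocks=4, block_len=64):
    """Block-autoregressive loop: each chunk is conditioned on the previous
    chunk's tokens plus the *current* style embedding, so a prompt change
    takes effect at the next block boundary."""
    prev = np.zeros(block_len, dtype=int)
    out = []
    for step in range(n_blocks):
        block = generate_block(prev, style_for_step(step), block_len)
        out.append(block)
        prev = block
    return np.concatenate(out)

# A style embedding that drifts each block, standing in for live prompt edits.
tokens = stream(lambda step: np.full(8, float(step)))
```

In the real pipeline these tokens would be decoded back to waveform audio by the SpectroStream codec rather than returned directly.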
For text and audio prompts, the model uses MusicCoCa, a joint music-text embedding model influenced by both MuLan (an earlier Google music embedding model) and CoCa (a contrastive captioning architecture). Style embeddings are computed as weighted averages of text and audio prompt embeddings, enabling blending across multiple prompts simultaneously.
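The weighted-average blending can be shown in a few lines. The embedding space and any normalization applied by the real model are internal details; the vectors below are invented stand-ins for MusicCoCa prompt embeddings.

```python
import numpy as np

def blend_styles(embeddings, weights):
    """Weighted average of prompt embeddings: each active text or audio
    prompt contributes in proportion to its weight, producing a single
    style embedding that conditions generation."""
    w = np.asarray(weights, dtype=float)
    e = np.asarray(embeddings, dtype=float)
    return (w[:, None] * e).sum(axis=0) / w.sum()

# Two hypothetical prompt embeddings, blended 75% / 25%.
jazz = np.array([1.0, 0.0, 0.0])
techno = np.array([0.0, 1.0, 0.0])
style = blend_styles([jazz, techno], [0.75, 0.25])
```

Raising the second prompt's weight over successive blocks is what lets a user morph the stream from one style toward another while it plays.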
The system targets a maximum latency of two seconds between a control change and its audible effect. On an H100 GPU, the architecture achieves a real-time factor of 1.8, generating 1.8 seconds of audio per second of wall-clock compute. That headroom above 1.0 is what lets the system sustain continuous streaming while also accepting and processing incoming control changes.
Developers accessing Lyria RealTime through the API can adjust text and audio prompts, their relative weights, and other generation parameters while the stream plays.
The consumer-facing product built on Lyria RealTime is MusicFX DJ, available through Google Labs. The application presents a visual interface for blending text prompts in real time. At Google I/O 2025, musician Toro y Moi (Chaz Bear) used a MIDI controller-based interface built on Lyria RealTime to perform live before the conference keynote, demonstrating the model's suitability for live performance contexts. A technical report on the Lyria RealTime architecture was published to arXiv (arXiv:2508.04651).
Google launched Lyria 3 in the Gemini app on February 18, 2026. The release marked the first time Lyria was available directly to general consumers without requiring a developer API key or Vertex AI account.
Lyria 3 generates 30-second tracks from text prompts or from uploaded images and videos. The model adds auto-generated lyrics to vocal tracks, infers musical style from image content, and provides greater control over tempo, vocal style, and instrumentation than Lyria 2. The output is described as musically complex and continuous, without the stitching artifacts that affected earlier models, which assembled longer passages from shorter generated segments.
Lyria 3 is available to users aged 18 and older. The initial rollout covers eight languages: English, German, Spanish, French, Hindi, Japanese, Korean, and Portuguese, with an Arabic beta launched simultaneously. Free Gemini users receive baseline generation limits. Paid tiers provide higher daily limits: Gemini AI Plus ($19.99/month, 10 tracks per day), Gemini Pro ($29.99/month, 20 tracks per day), and Gemini Ultra ($99.99/month, 50 tracks per day). All generated tracks are embedded with SynthID watermarking, and users can verify whether an audio file was created using Google's tools through a companion detection feature.
Lyria 3 also became available through Google Vids (Google's AI video editing tool) for automatically generating background music tracks for video projects. ProducerAI, a music production service Google had acquired, integrated Lyria 3 as its underlying generation engine.
Google launched Lyria 3 Pro on March 25, 2026, approximately five weeks after the standard Lyria 3 release. The Pro variant extends output duration to three minutes and adds structural awareness over song composition.
Lyria 3 Pro allows users to specify distinct song sections in their prompts: intros, verses, choruses, bridges, and outros. The model respects these structural specifications when composing, producing tracks with coherent energy progression and transitions between sections. This makes it suited to producing complete song-length compositions rather than brief clips.
Beyond structural control, Lyria 3 Pro accepts user-supplied lyrics alongside or in place of auto-generated ones, supports instrumental-only generation, and handles reference images as a secondary input modality.
In the Gemini app, Lyria 3 Pro track generation is restricted to paid subscribers. The model is also available through the Gemini API and Google AI Studio for developers, and through Vertex AI in public preview for enterprise use. API pricing works out to roughly $0.08 per generated track. Google Workspace customers on AI Pro and Ultra plans can access Lyria 3 Pro through Google Vids. The model supports the C2PA (Coalition for Content Provenance and Authenticity) content credentials standard in addition to SynthID watermarking.
Google stated at launch that Lyria 3 Pro was trained using data from its licensed partners and permissible data from YouTube and Google. This followed criticism of the earlier Lyria 2, which reportedly used copyrighted recordings without explicit authorization.
Google DeepMind has not published a full architectural paper for the Lyria family. The available technical information comes from the DeepMind blog, Vertex AI documentation, and the Lyria RealTime paper on arXiv.
Across the Lyria family, audio is represented using SpectroStream, a neural audio codec developed by Google as a successor to SoundStream. SpectroStream encodes audio into RVQ token sequences that can represent 48 kHz stereo content. The codec supports both the batch generation mode used by Lyria 2 and the streaming mode used by Lyria RealTime.
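The residual structure of RVQ tokens can be sketched in a few lines. This toy version uses small random codebooks purely to show the token layout; in a real codec like SpectroStream the codebooks are learned, and each frame of audio carries one index per RVQ level.

```python
import numpy as np

def rvq_encode(vector, codebooks):
    """Residual vector quantization: each stage quantizes the residual left
    over by the previous stage, so later codebooks capture finer detail."""
    residual = vector.copy()
    indices = []
    for cb in codebooks:
        # Pick the nearest codeword for the current residual.
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        indices.append(idx)
        residual = residual - cb[idx]
    return indices

def rvq_decode(indices, codebooks):
    """Reconstruction sums the selected codeword from every stage."""
    return sum(cb[i] for cb, i in zip(codebooks, indices))

rng = np.random.default_rng(1)
codebooks = [rng.standard_normal((16, 8)) for _ in range(4)]  # 4 RVQ levels
frame = rng.standard_normal(8)          # stand-in for one acoustic frame
rvq_tokens = rvq_encode(frame, codebooks)
frame_hat = rvq_decode(rvq_tokens, codebooks)
```

With learned codebooks, reconstruction error shrinks as levels are added, which is why a refinement model that predicts extra fine-scale tokens per frame improves audio quality.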
MusicCoCa is the joint music-text embedding model used for conditioning. It was trained by Google to align text descriptions with musical audio, taking inspiration from the MuLan music-language model and the CoCa vision-language architecture. MusicCoCa produces embeddings that capture style, genre, instrumentation, and mood, which the generation models use as conditioning signals.
For batch models (Lyria 2, Lyria 3, Lyria 3 Pro), the generation process takes a text or image prompt, encodes it through MusicCoCa, and generates a sequence of SpectroStream tokens conditioned on that embedding. The decoder uses a Transformer architecture with cross-attention to the conditioning signal.
Lyria RealTime adapts this to streaming by using block autoregression: the model generates audio in sequential chunks, each conditioned on both the style embedding and the audio tokens produced in the previous chunk. This causal generation approach enables online inference with bounded latency.
All Lyria models apply safety filtering at two stages: on input prompts and on output audio. The filters block prompts requesting content that reproduces specific copyrighted melodies, generates audio in the likeness of specific real artists' voices (outside of explicitly consented Dream Track partnerships), or produces content that violates Google's usage policies. Output filters perform a similarity check against reference audio to reduce the likelihood of outputs that closely reproduce protected recordings.
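A similarity check of the kind described can be sketched as an embedding comparison. Everything here is an illustrative assumption: the embedding space, the cosine metric, and the threshold are invented, since Google has not published how its output filter is implemented.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def output_similarity_filter(output_embedding, reference_embeddings,
                             threshold=0.95):
    """Output-side check: compare the generated clip's embedding against a
    bank of protected reference recordings and block the clip if any match
    is too close. Returns True when the clip passes the filter."""
    scores = [cosine(output_embedding, ref) for ref in reference_embeddings]
    return max(scores) < threshold

rng = np.random.default_rng(2)
references = [rng.standard_normal(128) for _ in range(100)]
novel_clip = rng.standard_normal(128)                      # unrelated output
near_copy = references[0] + 0.01 * rng.standard_normal(128)  # near-duplicate
```

In this sketch the unrelated clip passes while the near-duplicate is blocked; a production filter would also have to handle tempo changes, key shifts, and partial matches, which a single cosine threshold does not.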
SynthID is Google DeepMind's technology for embedding imperceptible watermarks into AI-generated media, including audio, images, video, and text. For audio outputs from the Lyria family, SynthID works as follows:
The generated audio waveform is converted into a spectrogram, a two-dimensional representation showing how frequency content evolves over time. SynthID embeds a watermark signal into the spectrogram using psychoacoustic masking principles, placing information in frequency ranges and temporal segments where human hearing is least sensitive. The modified spectrogram is then converted back to a waveform.
The resulting watermark is inaudible in normal listening conditions. It survives common audio processing operations including MP3 and AAC compression, speed changes, pitch shifting, noise addition, and basic mixing. A separate SynthID detector model can analyze an audio clip and classify it as definitely AI-generated, possibly AI-generated, or not detected as AI-generated.
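SynthID's actual algorithm is unpublished, so the following toy sketch only illustrates the loop described above: convert to a spectrogram, perturb bin magnitudes with a keyed pattern, convert back, and detect by correlation. The keyed ±1 pattern, frame size, and perturbation strength are all invented, and no psychoacoustic masking is modeled.

```python
import numpy as np

FRAME = 256  # samples per non-overlapping analysis frame (toy choice)

def stft(x):
    # Non-overlapping rectangular frames for simplicity; real systems
    # use windowed, overlapping transforms.
    frames = x[: len(x) // FRAME * FRAME].reshape(-1, FRAME)
    return np.fft.rfft(frames, axis=1)

def istft(Z):
    return np.fft.irfft(Z, n=FRAME, axis=1).reshape(-1)

def keyed_pattern(key, shape):
    # Pseudorandom +/-1 pattern derived from a secret key.
    return np.sign(np.random.default_rng(key).standard_normal(shape))

def embed(x, key, strength=0.02):
    """Nudge each spectrogram bin's magnitude up or down according to the
    keyed pattern, then reconstruct the waveform."""
    Z = stft(x)
    return istft(Z * (1 + strength * keyed_pattern(key, Z.shape)))

def detect(x, key):
    """Correlate the log-magnitude spectrogram with the keyed pattern;
    watermarked audio scores noticeably higher than unmarked audio."""
    Z = stft(x)
    return float(np.mean(np.log(np.abs(Z) + 1e-9) * keyed_pattern(key, Z.shape)))

rng = np.random.default_rng(0)
x = rng.standard_normal(96000)   # stand-in for ~2 s of 48 kHz audio
marked = embed(x, key=42)
```

Here `detect(marked, 42)` scores above `detect(x, 42)`. A real watermark must additionally survive lossy compression, speed changes, and noise, and must remain inaudible under psychoacoustic constraints, none of which this sketch addresses.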
SynthID was first deployed for images through Imagen on Vertex AI in August 2023. The audio version was introduced alongside Lyria in November 2023. The technology has since been extended to video (for Veo outputs) and to text through a different approach based on biasing token selection during generation.
All Lyria outputs, including those from Dream Track, MusicFX DJ, the Gemini app, and the Vertex AI API, are watermarked with SynthID automatically. There is no option to disable watermarking for Lyria-generated audio.
Lyria operates in a market that also includes Suno, Udio, and Stability AI's Stable Audio. Lyria, Suno, and Udio take different approaches to music generation and target somewhat different users.
| Feature | Lyria 3 Pro | Suno | Udio |
|---|---|---|---|
| Developer | Google DeepMind | Suno AI | Uncharted Labs |
| Max output length | 3 minutes | ~8 minutes | ~8 minutes |
| Vocal generation | Yes | Yes (strongest) | Yes |
| Instrumental only | Yes | Yes | Yes |
| Structural control | Yes (sections) | Partial | Limited |
| Real-time generation | Yes (RealTime) | No | No |
| Image-to-music | Yes | No | No |
| Audio watermarking | SynthID (mandatory) | None | None |
| Training data claims | Licensed + YouTube | Disputed | Disputed |
| API access | Gemini API, Vertex AI | Yes | Yes |
| Free tier | Limited (Gemini) | Yes | Yes |
| Pricing (entry paid) | $19.99/month | ~$10/month | ~$10/month |
On pure audio fidelity, reviewers in 2026 generally rate Lyria 3 as technically polished, with realistic instrumental timbres and clean dynamic range. Suno is rated higher for vocal expressiveness and songwriting coherence, particularly in rock, folk, and pop genres. Udio performs well for electronic and hip-hop production, with fast generation times of roughly 15 to 30 seconds per track compared to Suno's 30 to 60 seconds.
Lyria's distinctive strengths are its real-time generation mode, structural control in the Pro variant, and mandatory SynthID watermarking. The ecosystem integration with Gemini, YouTube, and Google Workspace creates natural deployment paths that independent platforms cannot replicate.
On pricing, Suno and Udio both offer entry-paid plans around $10 per month with roughly 500 tracks per month. Lyria's paid access through Gemini starts higher at $19.99 per month, though AI Studio allows free testing. The Vertex AI pricing of $0.06 per 30 seconds for Lyria 2 and roughly $0.08 per track for Lyria 3 Pro on the API is competitive for enterprise use.
Suno and Udio both face active copyright litigation from major record labels (Sony Music, Universal Music Group, and Warner Records filed suit against both platforms in June 2024). Lyria 3 Pro faces a separate class action filed by nine independent artists in March 2026, alleging that Google used at least 44 million clips and 280,000 hours of copyrighted music to train the model. The plaintiffs argue that Google had structural leverage to license the material through its Content ID system and YouTube's rights database but chose not to.
Lyria models are used across a range of creative and commercial applications:
Short-form content: Lyria 3 and Lyria 2 are suited for generating background music for social media videos, YouTube Shorts, and online advertisements. The 30-second output length matches platform-standard clip lengths.
Live performance: Lyria RealTime is designed for on-stage use, DJ sets, and interactive installations where music needs to respond continuously to performer input. The MIDI controller integration demonstrated at Google I/O 2025 shows how the API can be mapped to physical hardware.
Video production: The Google Vids integration allows video editors to generate music that matches the mood and content of their footage directly within their editing workflow, without switching to an external tool.
Game audio: Lyria RealTime's continuous streaming architecture is applicable to adaptive game soundtracks, where music needs to shift based on game state without audible gaps or transitions.
Professional music production: Music AI Sandbox offers working musicians a set of tools for generating initial ideas, extending compositions, and transforming existing audio, positioned as creative assistance rather than full automation.
Enterprise media: Vertex AI access allows media companies, advertising agencies, and application developers to integrate music generation into their own workflows through the API.
Lyria has several documented and observed limitations:
Output length: The standard Lyria 3 model caps output at 30 seconds, which limits it to short-form use cases. The 3-minute limit of Lyria 3 Pro remains shorter than competitors like Suno and Udio, which can produce tracks up to eight minutes long.
Vocal capability: Reviewers consistently rate Lyria's vocal generation as less expressive than Suno's. Vocal phrasing sounds controlled but lacks the emotional range that Suno achieves, particularly for genres that depend on dynamic vocal performance.
Edit and extend: Lyria 2 and Lyria 3 are generation-only models; they do not natively support extending or editing previously generated clips. This capability exists in Music AI Sandbox but is not available in the consumer Gemini app or the standard API. Lyria RealTime offers real-time steering but not post-generation editing.
Language support: At launch, Lyria 3 vocal generation covered eight languages. Coverage remains narrower than Suno, which supports a wider range of languages and regional music styles.
Geographic access: Consumer access through the Gemini app has geographic restrictions. Vertex AI access is subject to regional availability of Google Cloud services.
Reproducibility: Like most neural audio generation systems, Lyria does not guarantee identical outputs from the same prompt. The seed parameter in the API provides some control but does not fully determine the output.
Copyright proximity: Google applies output similarity filters to reduce reproduction of protected material, but these filters are acknowledged as fallible. The class action lawsuit filed in March 2026 disputes the adequacy of these protections.