# Stable Audio 2.5

> Source: https://aiwiki.ai/wiki/stable_audio_2_5
> Updated: 2026-06-28
> Categories: AI Models, Generative AI, Music & Audio Generation, Speech & Audio AI
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

**Stable Audio 2.5** is an enterprise-focused text-to-audio generation model released by [Stability AI](/wiki/stability_ai) on September 10, 2025. It generates production-ready music tracks of up to three minutes at 44.1 kHz stereo from a text prompt, and it also supports audio-to-audio transformation, audio inpainting, and custom fine-tuning on a customer's own licensed audio library. Stability AI describes it as "the first audio generation model designed specifically for enterprise-grade sound production." It is the third numbered iteration of the [Stable Audio](/wiki/stable_audio) family and the first version aimed at brand and commercial sound work rather than consumer experimentation.[^1][^2]

The headline technical change relative to earlier Stable Audio releases is generation speed. Stability AI reports that Stable Audio 2.5 produces a three-minute track in under two seconds on an Nvidia H100 GPU, using an eight-step diffusion process trained with a post-training method the company calls Adversarial Relativistic-Contrastive (ARC). Stable Audio 2.0 required roughly 50 inference steps to reach comparable quality, so the 2.5 release cuts the number of sampling steps by a factor of more than six.[^1][^3]

The launch was accompanied by a partnership with sound branding agency amp, part of Landor Group within the WPP holding company, which made Stable Audio 2.5 available to WPP's enterprise clients through the WPP Open platform.[^1][^2]

## What is Stable Audio 2.5?

Stable Audio 2.5 is a latent diffusion text-to-audio model that turns a written description into a finished instrumental music track. A prompt such as "uplifting cinematic strings with a driving percussion bed" returns a stereo track of up to three minutes, structured with an explicit intro, development, and outro rather than a single repeating loop. Beyond plain text-to-audio it supports two editing workflows: audio-to-audio, where a reference clip plus a prompt is reshaped into a new style, and audio inpainting, where the model fills or extends a marked section of an uploaded track while keeping the surrounding material intact.[^1][^3]

The model is positioned for enterprise sound production. Stability AI markets the training data as fully licensed and the output as cleared for commercial use under the relevant license tier, and it offers organizations the ability to fine-tune a bespoke version of the model on their own sound library so that a brand's signature audio is embedded directly in the generative workflow.[^1][^2]

## Background

[Stability AI](/wiki/stability_ai) is a London-based generative AI company best known for the [Stable Diffusion](/wiki/stable_diffusion) family of image models. The company expanded into audio in September 2023 with the original [Stable Audio](/wiki/stable_audio) text-to-music system, which produced short clips of up to 90 seconds in 44.1 kHz stereo using a latent diffusion architecture.[^4]

The second generation, Stable Audio 2.0, launched on April 3, 2024. It extended generation length to three minutes, switched from a U-Net to a diffusion transformer (DiT) over a highly compressed autoencoder, and introduced audio-to-audio style transfer alongside text-to-audio generation. According to Stability AI, the 2.0 model was trained on a dataset of over 800,000 audio files licensed from the AudioSparx music library, with an artist opt-out provision, and the web app used Audible Magic content recognition technology to screen uploaded audio for copyrighted material. Stability AI made the 2.0 web app free at launch on StableAudio.com and added API access shortly afterward.[^4][^5]

In May 2025, between the 2.0 and 2.5 releases, Stability AI and Arm released Stable Audio Open Small, a 341-million-parameter text-to-audio model optimized to run on Arm CPUs and generate short samples on-device. That model was the first public deployment of the ARC post-training method later used in Stable Audio 2.5, and it was released under the permissive Stability AI Community License.[^6]

Across the same period the generative music space became crowded. [Suno](/wiki/suno) and [Udio](/wiki/udio) both gained large consumer followings for full song generation with vocals. [ElevenLabs](/wiki/elevenlabs) entered the field in 2025 with [ElevenLabs Music](/wiki/elevenlabs_music). Stability AI's response was to specialize. Stable Audio 2.5 doubles down on instrumental music, sound design beds, and brand audio rather than chasing the pop song use case the consumer apps target.[^7][^8]

## What is Stable Audio 2.5 used for?

The model handles three core generation modes and several enterprise features that did not ship with the consumer-focused 2.0 release.

| Capability | Description |
| --- | --- |
| Text-to-audio | Generates music from a natural language prompt. Stability AI says 2.5 responds more precisely to mood descriptors like "uplifting" and instrument-level prompts like "lush synthesizers" than 2.0. |
| Audio-to-audio | Accepts a reference clip plus a text prompt and transforms the input into a new track in a target style. Carried over from Stable Audio 2.0. |
| Audio inpainting | New in 2.5. Users upload a track, mark a start point, and the model generates a continuation that musically fits the existing material. Suited to fixing or extending parts of an existing arrangement. |
| Track length | Up to three minutes per generation, output at 44.1 kHz stereo. |
| Musical structure | Tracks include explicit intro, development, and outro sections rather than a single repeating loop. |
| Custom fine-tuning | Enterprise customers can fine-tune the model on their own audio library to produce a bespoke sonic identity. Sold as part of enterprise licensing. |
| Commercial safety | Trained on a fully licensed dataset. Stability AI advertises the output as cleared for commercial use, subject to the license tier. |

The inpainting feature is the most consequential addition for sound designers. Earlier audio diffusion tools tended to regenerate a whole clip whenever a section needed changing, which broke continuity. Inpainting lets a producer keep the parts that already work and only resynthesize what is wrong. In its announcement Stability AI described the workflow directly: "Stable Audio 2.5 supports audio inpainting, which means users can input their own audio, select where they want it to start, and the model will use the context to generate the rest of the track." The company framed the target use cases as brand sound work spanning everything from commercials and game intros to in-store music and audio cues for products such as credit cards and car stereos.[^1][^3]
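
The start-point workflow can be pictured as splitting the uploaded track's sample timeline into a kept region and a region to generate. The sketch below is illustrative only: `InpaintPlan` and `plan_inpaint` are hypothetical names, and Stability AI has not published how Stable Audio 2.5 represents the region to fill (it may well work in latent space rather than on raw samples).

```python
from dataclasses import dataclass

SAMPLE_RATE_HZ = 44_100   # output rate stated for Stable Audio 2.5
MAX_DURATION_S = 180      # three-minute generation limit


@dataclass(frozen=True)
class InpaintPlan:
    """Hypothetical description of an inpainting request, per channel."""
    keep: range        # sample indices preserved from the uploaded audio
    generate: range    # sample indices the model is asked to fill


def plan_inpaint(start_s: float, total_s: float = MAX_DURATION_S) -> InpaintPlan:
    """Split a track into a kept prefix and a region to generate."""
    if not 0 <= start_s <= total_s <= MAX_DURATION_S:
        raise ValueError("need 0 <= start_s <= total_s <= 180 seconds")
    start = round(start_s * SAMPLE_RATE_HZ)
    end = round(total_s * SAMPLE_RATE_HZ)
    return InpaintPlan(keep=range(0, start), generate=range(start, end))


# Keep the first 45 seconds of an upload and ask for a 2-minute result.
plan = plan_inpaint(start_s=45.0, total_s=120.0)
print(len(plan.keep), len(plan.generate))  # prints: 1984500 3307500
```

The point of the split is that only the `generate` range is resynthesized, while the `keep` range supplies the musical context the announcement refers to.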

## How fast is Stable Audio 2.5?

Speed is the defining claim of the release. Stability AI states that Stable Audio 2.5 reaches an "inference speed of less than two seconds on a GPU, for tracks up to three minutes," measured on an Nvidia H100, and that it does so in just eight diffusion steps rather than the roughly 50 steps used by Stable Audio 2.0. VentureBeat characterized the eight-step pipeline as cutting brand audio production time "from weeks to minutes."[^1][^2]

| Metric | Stable Audio 2.5 | Stable Audio 2.0 |
| --- | --- | --- |
| Inference steps | About 8 | About 50 |
| Time for a 3 minute track | Under 2 seconds on H100 | Several seconds (slower) |
| Acceleration method | ARC adversarial post-training | Standard diffusion sampling |

The speed gain comes from the post-training stage, not from a smaller model. ARC compresses the number of sampling steps the model needs at inference time while preserving output quality, which is what lets a long stereo track render in roughly the time it takes to read the prompt.[^1][^9]
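
The reported figures can be turned into two simple ratios. This is a back-of-the-envelope sketch using only the numbers quoted above; the constant names are illustrative, and the two-second figure is an upper bound rather than a measured value.

```python
STEPS_V20 = 50          # approximate sampling steps reported for Stable Audio 2.0
STEPS_V25 = 8           # sampling steps reported for Stable Audio 2.5
TRACK_SECONDS = 180     # three-minute track
RENDER_SECONDS = 2.0    # reported upper bound on an Nvidia H100

# How many times fewer denoising steps 2.5 runs per track.
step_reduction = STEPS_V20 / STEPS_V25

# Seconds of audio produced per second of compute (a lower bound,
# because the render time is quoted as "less than two seconds").
realtime_factor = TRACK_SECONDS / RENDER_SECONDS

print(f"step reduction: {step_reduction:.2f}x")                   # 6.25x
print(f"at least {realtime_factor:.0f}x faster than real time")   # 90x
```

The step ratio is not the same thing as a wall-clock speedup: Stability AI has not published a comparable timing for Stable Audio 2.0, so only the reduction in steps can be computed from the sources.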

## How does the ARC architecture work?

Stability AI has disclosed two specific technical details about Stable Audio 2.5. The first is that the model is a latent diffusion system, consistent with the diffusion transformer over a compressed autoencoder used in Stable Audio 2.0. The second is the post-training method used to compress the sampling schedule.[^1][^4]

Adversarial Relativistic-Contrastive post-training, or ARC, is the technique Stability AI credits for the eight-step inference budget. The method was introduced in the paper "Fast Text-to-Audio Generation with Adversarial Post-Training" (arXiv 2505.08175), submitted by Stability AI researchers in May 2025. The authors describe ARC as the first adversarial acceleration algorithm for diffusion and flow models that is not based on distillation: it extends a relativistic adversarial training formulation to diffusion and flow post-training and adds a contrastive discriminator objective intended to improve prompt adherence. Because it avoids distillation, the approach does not require a separate pretrained teacher model or classifier-free guidance. ARC was first applied to the open Stable Audio Open Small model, where Stability AI reported generating about 12 seconds of 44.1 kHz stereo audio in roughly 75 milliseconds on an H100 GPU, before the same method was used to accelerate Stable Audio 2.5.[^1][^9]
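
The two ingredients named above, a relativistic adversarial loss and a contrastive discriminator loss, can be sketched on plain numbers. This is not the paper's implementation: the function names are hypothetical, the scores stand in for the outputs of a discriminator network, the sign convention follows common relativistic pairing GAN code, and building the contrastive pair by giving real audio a mismatched prompt is an assumption about how such an objective is typically set up.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def relativistic_d_loss(real_scores, fake_scores):
    """Discriminator loss: low when each real score exceeds its paired fake score."""
    pairs = list(zip(real_scores, fake_scores))
    return sum(softplus(-(r - f)) for r, f in pairs) / len(pairs)


def relativistic_g_loss(real_scores, fake_scores):
    """Generator loss: low when each fake score exceeds its paired real score."""
    pairs = list(zip(real_scores, fake_scores))
    return sum(softplus(-(f - r)) for r, f in pairs) / len(pairs)


def contrastive_d_loss(matched_scores, mismatched_scores):
    """Discriminator loss: low when real audio scored with its own prompt
    beats the same audio scored with a mismatched prompt."""
    pairs = list(zip(matched_scores, mismatched_scores))
    return sum(softplus(-(m - w)) for m, w in pairs) / len(pairs)


# Toy discriminator scores (made up for illustration).
real = [2.0, 1.5]        # real audio, correct prompt
fake = [-1.0, 0.0]       # generated audio, same prompts
shuffled = [0.5, 1.4]    # real audio, mismatched prompts

print("D relativistic:", relativistic_d_loss(real, fake))
print("G relativistic:", relativistic_g_loss(real, fake))
print("D contrastive: ", contrastive_d_loss(real, shuffled))
```

The relativistic terms only ever compare a real score with a fake score for the same prompt, so neither network is rewarded for pushing scores to absolute extremes. The contrastive term gives the discriminator a reason to look at the prompt at all, which is the mechanism the authors credit for prompt adherence without classifier-free guidance.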

Beyond the reported inference time of under two seconds for a three-minute output on an Nvidia H100, the company has not published parameter counts, training set size, or the encoder bottleneck rate for the 2.5 model.[^1][^3]

## How much does Stable Audio 2.5 cost?

Stable Audio 2.5 is available through several channels with different commercial terms.

| Channel | Notes |
| --- | --- |
| StableAudio.com | Web interface aimed at individuals. Free tier for personal use only. Paid Creator tier extends commercial rights to individuals earning under \$1 million per year. |
| Stability AI API | Pay-as-you-go credit pricing on the Stability AI Platform, where 1 credit equals \$0.01. New accounts start with 25 free credits. |
| fal, Replicate, ComfyUI | Third-party hosting platforms with their own per-request pricing. |
| Enterprise license | Direct contract with Stability AI required for any organization with annual revenue above \$1 million, for API resellers, and for on-premises deployment. Includes implementation support, custom fine-tuning, and professional services. |

Stability AI markets the underlying training data as fully licensed and the outputs as commercially safe, which is the company's primary selling point against legally contested competitors. Uploaded reference audio still has to be copyright cleared by the user, and the platform runs content recognition checks on uploads to enforce this.[^1][^10][^11]
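
The credit pricing in the table above reduces to simple arithmetic. The sketch below assumes a per-generation credit price, which the sources cited here do not state; `credits_per_generation` and the value 20 in the example are placeholders, not Stability AI's actual rate.

```python
USD_PER_CREDIT = 0.01   # stated rate on the Stability AI Platform
FREE_CREDITS = 25       # starting balance for new accounts


def api_cost_usd(generations: int, credits_per_generation: float) -> float:
    """Estimated spend for a new account after free credits are used up.

    credits_per_generation is a hypothetical input; check the current
    pricing page for the real per-request figure.
    """
    if generations < 0 or credits_per_generation < 0:
        raise ValueError("inputs must be non-negative")
    billable = max(0.0, generations * credits_per_generation - FREE_CREDITS)
    return round(billable * USD_PER_CREDIT, 2)


# Ten generations at a placeholder price of 20 credits each:
# 200 credits used, 25 free, 175 billable.
print(api_cost_usd(10, 20))  # prints: 1.75
```

Third-party hosts such as fal and Replicate set their own per-request prices, so this estimate applies only to the first-party API.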

## How does Stable Audio 2.5 differ from Stable Audio 2.0 and Stable Audio Open?

The three products share a lineage but target different users. Stable Audio 2.0 (April 2024) is the prior consumer flagship; Stable Audio Open is the openly licensed line for local and on-device use; Stable Audio 2.5 (September 2025) is the enterprise tier.

| Feature | Stable Audio 2.5 | Stable Audio 2.0 | Stable Audio Open / Open Small |
| --- | --- | --- | --- |
| Released | September 2025 | April 2024 | 2024 (Open), May 2025 (Open Small) |
| Primary audience | Enterprise and brand sound | Individual creators | Researchers, on-device developers |
| Max track length | 3 minutes | 3 minutes | About 47 seconds (Open), short samples (Open Small) |
| Inference steps | About 8 (ARC) | About 50 | About 8 (Open Small, ARC) |
| Audio inpainting | Yes (new) | No | No |
| Custom fine-tuning service | Yes | No | Open weights, self fine-tune |
| Weights | Closed, hosted | Closed, hosted | Open under Stability AI Community License |

The practical differences from 2.0 are the roughly sixfold reduction in sampling steps via ARC, the new audio inpainting workflow, the enterprise fine-tuning service, and the explicit commercial-safety positioning. Stable Audio Open differs again by shipping open weights for self-hosting, with Open Small being the 341-million-parameter on-device variant that first proved out the ARC method.[^1][^4][^6]

## How does Stable Audio 2.5 compare to Suno, Udio, and ElevenLabs?

Stable Audio 2.5 sits in an awkward part of the market. It is faster than the consumer music generators and is positioned for licensed enterprise work, but it does not generate vocals or full songs in the way the leading consumer systems do.

| Model | Vendor | Released | Vocals | Max length | Commercial license clarity |
| --- | --- | --- | --- | --- | --- |
| Stable Audio 2.5 | [Stability AI](/wiki/stability_ai) | September 2025 | No | 3 minutes | Trained on licensed data, commercial use under paid tiers |
| [Suno v5](/wiki/suno_v5) | Suno | 2025 | Yes | About 8 minutes per generation | Disputed, multiple major label lawsuits pending |
| [Udio](/wiki/udio) | Uncharted Labs | 2024 | Yes | Up to 15 minutes via extensions | Disputed, multiple major label lawsuits pending |
| [ElevenLabs Music](/wiki/elevenlabs_music) | ElevenLabs | 2025 | Yes | Up to 5 minutes | Trained on licensed data, commercial use under paid tiers |

Reviewers writing about the 2025 audio model landscape generally place Stable Audio in the instrumental and sound design category rather than the songwriting category. Coverage from outlets like Geeky Gadgets and aicompetence describes Stable Audio as the strongest choice for cinematic beds, sample packs, loops, and sound effects, and Suno or Udio as the better picks for vocal-led pop and hip hop.[^7][^8]

## Reception

Reception at launch focused on three things: the speed claim, the enterprise positioning, and the WPP partnership. VentureBeat called the eight-step generation pipeline a "breakthrough" that cut audio production time from weeks to minutes for brand teams that previously commissioned bespoke sound design. The Decoder framed the release as Stability AI choosing to compete on speed and licensing safety rather than vocal generation. Winbuzzer emphasized the inpainting workflow as the most useful day-to-day addition for sound designers, since it removes the all-or-nothing regeneration problem of earlier tools.[^2][^3][^12]

Independent coverage of Stable Audio's quality has tended to praise the model's instrumental fidelity at 44.1 kHz stereo while flagging the absence of vocals as a real limitation for anyone trying to use it for full songs. The model has not, as of mid-2026, been widely benchmarked against [Suno v5](/wiki/suno_v5) or [ElevenLabs Music](/wiki/elevenlabs_music) on listener preference tests, partly because the systems target different output categories.[^7][^8]

The WPP partnership is the part of the launch that says the most about the strategy. By going through Landor and amp, Stability AI gets distribution into hundreds of brands that already buy sound design as a service. To justify that focus, Stability AI cited Ipsos research in its announcement, stating that custom audio can make a brand eight times more memorable, that audio influences brand engagement by 86 percent, and that only 6 percent of creative uses a distinct sound identity. Brand audio is a smaller market than consumer music apps, but it is one where licensing provenance and on-premises deployment are worth real money.[^1][^2]

## See also

- [Stable Audio](/wiki/stable_audio)
- [Stability AI](/wiki/stability_ai)
- [Stable Diffusion](/wiki/stable_diffusion)
- [Suno v5](/wiki/suno_v5)
- [Udio](/wiki/udio)
- [ElevenLabs Music](/wiki/elevenlabs_music)
- [Generative AI](/wiki/generative_ai)
- [Latent diffusion model](/wiki/latent_diffusion)

## References

[^1]: Stability AI. "Stability AI Introduces Stable Audio 2.5, the First Audio Model Built for Enterprise Sound Production at Scale." September 10, 2025. https://stability.ai/news-updates/stability-ai-introduces-stable-audio-25-the-first-audio-model-built-for-enterprise-sound-production-at-scale Accessed 2026-06-28.

[^2]: Michael Nuñez. "Stability AI's enterprise audio model cuts production time from weeks to minutes with 8-step generation breakthrough." VentureBeat, September 10, 2025. https://venturebeat.com/ai/stability-ais-enterprise-audio-model-cuts-production-time-from-weeks-to Accessed 2026-06-28.

[^3]: The Decoder. "Stability AI releases Stable Audio 2.5 for faster and more complex AI-generated music." September 2025. https://the-decoder.com/stability-ai-releases-stable-audio-2-5-for-faster-and-more-complex-ai-generated-music/ Accessed 2026-06-28.

[^4]: Stability AI. "Introducing Stable Audio 2.0." April 3, 2024. https://stability.ai/news-updates/stable-audio-2-0 Accessed 2026-06-28.

[^5]: Voicebot.ai. "Stability AI Releases Augmented Text-to-Music Engine Stable Audio 2 With Upload and Style Transfer Features." April 4, 2024. https://voicebot.ai/2024/04/04/stability-ai-releases-augmented-text-to-music-engine-stable-audio-2-with-upload-and-style-transfer-features/ Accessed 2026-06-28.

[^6]: Stability AI. "Stability AI and Arm Release Stable Audio Open Small, Enabling Real-World Deployment for On-Device Audio Generation." May 2025. https://stability.ai/news/stability-ai-and-arm-release-stable-audio-open-small-enabling-real-world-deployment-for-on-device-audio-control Accessed 2026-06-28.

[^7]: Geeky Gadgets. "AI music creators compared: Udio vs Suno vs Stable Audio vs Audio Shake." 2025. Accessed 2026-06-28.

[^8]: aicompetence.org. "AI Music Generation: Suno vs. Udio vs. Stable Audio." 2025. Accessed 2026-06-28.

[^9]: Zachary Novack et al. "Fast Text-to-Audio Generation with Adversarial Post-Training." arXiv:2505.08175, submitted May 13, 2025. https://arxiv.org/abs/2505.08175 Accessed 2026-06-28.

[^10]: Stability AI. "Stability AI License." https://stability.ai/license Accessed 2026-06-28.

[^11]: Stability AI. "Stability AI Developer Platform Pricing." https://platform.stability.ai/pricing Accessed 2026-06-28.

[^12]: Winbuzzer. "Stability AI Launches Stable Audio 2.5 with Enterprise-Grade Speed and Creative Control." September 11, 2025. https://winbuzzer.com/2025/09/11/stability-ai-launches-stable-audio-2-5-with-enterprise-grade-speed-and-creative-control-xcxwbn/ Accessed 2026-06-28.

[^13]: fal.ai blog. "Stable Audio 2.5 Now Available on fal." 2025. https://blog.fal.ai/stable-audio-2-5-now-available-on-fal/ Accessed 2026-06-28.