# Stable Audio 2.5

> Source: https://aiwiki.ai/wiki/stable_audio_2_5
> Updated: 2026-05-30
> Categories: AI Models, Generative AI, Music & Audio Generation, Speech & Audio AI
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

**Stable Audio 2.5** is an enterprise-focused text-to-audio generation model released by [Stability AI](/wiki/stability_ai) on September 10, 2025. It is the third numbered iteration of the [Stable Audio](/wiki/stable_audio) family and the first version the company explicitly positions as built for enterprise sound production rather than consumer experimentation. The model generates music tracks of up to three minutes at 44.1 kHz stereo, supports text prompts, audio-to-audio transformation, and audio inpainting, and can be fine-tuned on a customer's own licensed audio library.[^1][^2]

The headline technical change relative to earlier Stable Audio releases is generation speed. Stability AI reports that Stable Audio 2.5 produces a three-minute track in under two seconds on an Nvidia H100 GPU, using an eight-step diffusion process trained with a post-training method the company calls Adversarial Relativistic-Contrastive (ARC). Earlier Stable Audio models required roughly 50 inference steps to reach comparable quality.[^1][^3]

The launch was accompanied by a partnership with sound branding agency amp, part of Landor Group within the WPP holding company, which made Stable Audio 2.5 available to WPP's enterprise clients through the WPP Open platform.[^1][^2]

## Background

[Stability AI](/wiki/stability_ai) is a London-based generative AI company best known for the [Stable Diffusion](/wiki/stable_diffusion) family of image models. The company expanded into audio in September 2023 with the original [Stable Audio](/wiki/stable_audio) text-to-music system, which produced short clips of up to 90 seconds in 44.1 kHz stereo using a latent diffusion architecture.[^4]

The second generation, Stable Audio 2.0, launched on April 3, 2024. It extended generation length to three minutes, switched from a U-Net to a diffusion transformer (DiT) over a highly compressed autoencoder, and introduced audio-to-audio style transfer alongside text-to-audio generation. According to Stability AI, the 2.0 model was trained on a dataset of over 800,000 audio files licensed from the AudioSparx music library, with an artist opt-out provision, and the web app used Audible Magic content recognition technology to screen uploaded audio for copyrighted material. Stability AI made the 2.0 web app free at launch on StableAudio.com and added API access shortly afterward.[^4][^5]

In May 2025, between the 2.0 and 2.5 releases, Stability AI and Arm released Stable Audio Open Small, a 341-million-parameter text-to-audio model optimized to run on Arm CPUs and generate short samples on-device. That model was the first public deployment of the ARC post-training method later used in Stable Audio 2.5, and it was released under the permissive Stability AI Community License.[^6]

Across the same period the generative music space became crowded. [Suno](/wiki/suno) and [Udio](/wiki/udio) both gained large consumer followings for full-song generation with vocals. [ElevenLabs](/wiki/elevenlabs) entered the field in 2025 with [ElevenLabs Music](/wiki/elevenlabs_music). Stability AI's response was to specialize: Stable Audio 2.5 concentrates on instrumental music, sound design beds, and brand audio rather than chasing the pop song use case the consumer apps target.[^7][^8]

## Capabilities

The model handles three core generation modes and several enterprise features that did not ship with the consumer-focused 2.0 release.

| Capability | Description |
| --- | --- |
| Text-to-audio | Generates music from a natural language prompt. Stability AI says 2.5 responds more precisely to mood descriptors like "uplifting" and instrument-level prompts like "lush synthesizers" than 2.0. |
| Audio-to-audio | Accepts a reference clip plus a text prompt and transforms the input into a new track in a target style. Carried over from Stable Audio 2.0. |
| Audio inpainting | New in 2.5. Users upload a track, mark a start point, and the model generates a continuation that musically fits the existing material. Suited to fixing or extending parts of an existing arrangement. |
| Track length | Up to three minutes per generation, output at 44.1 kHz stereo. |
| Musical structure | Tracks include explicit intro, development, and outro sections rather than a single repeating loop. |
| Custom fine-tuning | Enterprise customers can fine-tune the model on their own audio library to produce a bespoke sonic identity. Sold as part of enterprise licensing. |
| Commercial safety | Trained on a fully licensed dataset. Stability AI advertises the output as cleared for commercial use, subject to the license tier. |

The inpainting feature is the most consequential addition for sound designers. Earlier audio diffusion tools tended to regenerate a whole clip whenever a section needed changing, which broke continuity. Inpainting lets a producer keep the parts that already work and only resynthesize what is wrong. In its announcement, Stability AI framed the target use cases as brand sound work spanning everything from commercials and game intros to in-store music and audio cues for products such as credit cards and car stereos.[^1][^3]
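
To give a sense of scale for the track length row in the table above, the following sketch works out how many sample frames a three-minute 44.1 kHz stereo track contains and how a marked start point maps to a frame offset. It is a waveform-level illustration only: the 16-bit export depth and the 45.5-second start point are assumptions chosen for the example, and the model itself operates on a compressed latent representation rather than raw samples.

```python
SAMPLE_RATE_HZ = 44_100       # from the capabilities table
CHANNELS = 2                  # stereo
MAX_TRACK_SECONDS = 3 * 60    # three-minute cap per generation
BYTES_PER_SAMPLE = 2          # assumption: 16-bit PCM export, not stated in the source


def frames_for(seconds: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of sample frames (one frame = one sample per channel)."""
    return round(seconds * sample_rate_hz)


def pcm_bytes(seconds: float) -> int:
    """Uncompressed PCM size in bytes under the assumed bit depth."""
    return frames_for(seconds) * CHANNELS * BYTES_PER_SAMPLE


total_frames = frames_for(MAX_TRACK_SECONDS)
total_bytes = pcm_bytes(MAX_TRACK_SECONDS)

# Hypothetical start point marked by a user at 45.5 seconds.
start_frame = frames_for(45.5)

print(f"frames per channel: {total_frames:,}")
print(f"uncompressed size:  {total_bytes / 1_000_000:.1f} MB")
print(f"start frame:        {start_frame:,}")
```

A full-length track is just under eight million frames per channel, which is part of why the model generates in a compressed latent space rather than sample by sample.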

## Architecture

Stability AI has disclosed two specific technical details about Stable Audio 2.5. The first is that the model is a latent diffusion system, consistent with the diffusion transformer over a compressed autoencoder used in Stable Audio 2.0. The second is the post-training method used to compress the sampling schedule.[^1][^4]

Adversarial Relativistic-Contrastive post-training, or ARC, is the technique Stability AI credits for the eight-step inference budget. The method was introduced in the paper "Fast Text-to-Audio Generation with Adversarial Post-Training" (arXiv 2505.08175), submitted by Stability AI researchers in May 2025. The authors describe ARC as the first adversarial acceleration algorithm for diffusion and flow models that is not based on distillation: it extends a relativistic adversarial training formulation to diffusion and flow post-training and adds a contrastive discriminator objective intended to improve prompt adherence. Because it avoids distillation, the approach does not require a separate pretrained teacher model or classifier-free guidance. ARC was first applied to the open Stable Audio Open Small model, where Stability AI reported generating about 12 seconds of 44.1 kHz stereo audio in roughly 75 milliseconds on an H100 GPU, before the same method was used to accelerate Stable Audio 2.5.[^1][^9]

Reported inference time is under two seconds for a three-minute output on an Nvidia H100. The company has not published parameter counts, training set size, or the encoder bottleneck rate for the 2.5 model.[^1][^3]
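
The reported figures can be turned into rough real-time factors with simple arithmetic. The sketch below uses only the numbers quoted in this article; the function name is illustrative. Because "under two seconds" is an upper bound on time, the Stable Audio 2.5 result is a lower bound on the factor, and the Open Small inputs are approximate ("about 12 seconds", "roughly 75 milliseconds").

```python
def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio produced per second of wall-clock compute."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


# Stable Audio 2.5: three minutes of audio in under two seconds on an H100.
sa25_min_factor = realtime_factor(3 * 60, 2.0)

# Stable Audio Open Small: about 12 s of audio in roughly 75 ms on an H100.
open_small_factor = realtime_factor(12.0, 0.075)

# Sampling steps: roughly 50 for earlier models versus 8 with ARC.
step_reduction = 50 / 8

print(f"Stable Audio 2.5: at least {sa25_min_factor:.0f}x real time")
print(f"Open Small:       about {open_small_factor:.0f}x real time")
print(f"Step reduction:   about {step_reduction:.2f}x fewer steps")
```

The two models differ in size and output length, so the factors are not a like-for-like comparison; they only restate the vendor's own numbers in a common unit.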

## Pricing and licensing

Stable Audio 2.5 is available through several channels with different commercial terms.

| Channel | Notes |
| --- | --- |
| StableAudio.com | Web interface aimed at individuals. Free tier for personal use only. Paid Creator tier extends commercial rights to individuals earning under \$1 million per year. |
| Stability AI API | Pay-as-you-go credit pricing on the Stability AI Platform, where 1 credit equals \$0.01. New accounts start with 25 free credits. |
| fal, Replicate, ComfyUI | Third-party hosting platforms with their own per-request pricing. |
| Enterprise license | Direct contract with Stability AI required for any organization with annual revenue above \$1 million, for API resellers, and for on-premises deployment. Includes implementation support, custom fine-tuning, and professional services. |

Stability AI markets the underlying training data as fully licensed and the outputs as commercially safe, which is the company's primary selling point against legally contested competitors. Uploaded reference audio still has to be copyright cleared by the user, and the platform runs content recognition checks on uploads to enforce this.[^1][^10][^11]
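
The credit pricing and the license thresholds in the table above can be expressed as a small sketch. The credit value, the free allowance, and the \$1 million threshold come from the table; the function names are hypothetical, and the per-generation credit cost is deliberately left as a parameter because this article does not state it. The table also does not say which tier applies at exactly \$1 million, so the comparison below treats only revenue strictly above the threshold as enterprise.

```python
from decimal import Decimal

USD_PER_CREDIT = Decimal("0.01")            # 1 credit = $0.01
FREE_CREDITS = 25                           # starting balance for new accounts
ENTERPRISE_REVENUE_THRESHOLD_USD = 1_000_000


def credits_to_usd(credits: int) -> Decimal:
    """Convert platform credits to US dollars."""
    return Decimal(credits) * USD_PER_CREDIT


def free_generations(credits_per_generation: int) -> int:
    """Generations covered by the free credits, for a caller-supplied cost."""
    if credits_per_generation <= 0:
        raise ValueError("credits_per_generation must be positive")
    return FREE_CREDITS // credits_per_generation


def requires_enterprise_license(
    annual_revenue_usd: int,
    is_api_reseller: bool = False,
    on_premises: bool = False,
) -> bool:
    """True when any enterprise condition listed in the table applies."""
    return (
        annual_revenue_usd > ENTERPRISE_REVENUE_THRESHOLD_USD
        or is_api_reseller
        or on_premises
    )


print(f"Free credits are worth ${credits_to_usd(FREE_CREDITS)}")
print(requires_enterprise_license(2_500_000))
print(requires_enterprise_license(400_000, on_premises=True))
```

`Decimal` is used instead of floats so that cent-level amounts stay exact. Actual eligibility is governed by the Stability AI license text, not by this simplification.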

## Comparison to competitors

Stable Audio 2.5 sits in an awkward part of the market. It is faster than the consumer music generators and is positioned for licensed enterprise work, but it does not generate vocals or full songs in the way the leading consumer systems do.

| Model | Vendor | Released | Vocals | Max length | Commercial license clarity |
| --- | --- | --- | --- | --- | --- |
| Stable Audio 2.5 | [Stability AI](/wiki/stability_ai) | September 2025 | No | 3 minutes | Trained on licensed data, commercial use under paid tiers |
| [Suno v5](/wiki/suno_v5) | Suno | 2025 | Yes | About 8 minutes per generation | Disputed, multiple major label lawsuits pending |
| [Udio](/wiki/udio) | Uncharted Labs | 2024 | Yes | Up to 15 minutes via extensions | Disputed, multiple major label lawsuits pending |
| [ElevenLabs Music](/wiki/elevenlabs_music) | ElevenLabs | 2025 | Yes | Up to 5 minutes | Trained on licensed data, commercial use under paid tiers |

Reviewers writing about the 2025 audio model landscape generally place Stable Audio in the instrumental and sound design category rather than the songwriting category. Coverage from outlets like Geeky Gadgets and aicompetence describes Stable Audio as the strongest choice for cinematic beds, sample packs, loops, and sound effects, and Suno or Udio as the better picks for vocal-led pop and hip hop.[^7][^8]

## Reception

Reception at launch focused on three things: the speed claim, the enterprise positioning, and the WPP partnership. VentureBeat called the eight-step generation pipeline a "breakthrough" that cut audio production time from weeks to minutes for brand teams that previously commissioned bespoke sound design. The Decoder framed the release as Stability AI choosing to compete on speed and licensing safety rather than vocal generation. Winbuzzer emphasized the inpainting workflow as the most useful day-to-day addition for sound designers, since it removes the all-or-nothing regeneration problem of earlier tools.[^2][^3][^12]

Independent coverage of Stable Audio's quality has tended to praise the model on instrumental fidelity at 44.1 kHz stereo while flagging the absence of vocals as a real limitation for anyone trying to use it for full songs. The model has not, as of mid-2026, been widely benchmarked against [Suno v5](/wiki/suno_v5) or [ElevenLabs Music](/wiki/elevenlabs_music) on listener preference tests, partly because the systems target different output categories.[^7][^8]

The WPP partnership is the part of the launch that says the most about the strategy. By going through Landor and amp, Stability AI gets distribution into hundreds of brands that already buy sound design as a service. To justify that focus, Stability AI cited Ipsos research in its announcement, stating that custom audio can make a brand eight times more memorable, that audio influences brand engagement by 86 percent, and that only 6 percent of creative uses a distinct sound identity. Brand sound design is a smaller market than consumer music apps, but it is one where licensing provenance and on-premises deployment are worth real money.[^1][^2]

## See also

- [Stable Audio](/wiki/stable_audio)
- [Stability AI](/wiki/stability_ai)
- [Stable Diffusion](/wiki/stable_diffusion)
- [Suno v5](/wiki/suno_v5)
- [Udio](/wiki/udio)
- [ElevenLabs Music](/wiki/elevenlabs_music)
- [Generative AI](/wiki/generative_ai)
- [Latent diffusion model](/wiki/latent_diffusion)

## References

[^1]: Stability AI. "Stability AI Introduces Stable Audio 2.5, the First Audio Model Built for Enterprise Sound Production at Scale." September 10, 2025. https://stability.ai/news-updates/stability-ai-introduces-stable-audio-25-the-first-audio-model-built-for-enterprise-sound-production-at-scale Accessed 2026-05-31.

[^2]: Michael Nuñez. "Stability AI's enterprise audio model cuts production time from weeks to minutes with 8-step generation breakthrough." VentureBeat, September 10, 2025. https://venturebeat.com/ai/stability-ais-enterprise-audio-model-cuts-production-time-from-weeks-to Accessed 2026-05-31.

[^3]: The Decoder. "Stability AI releases Stable Audio 2.5 for faster and more complex AI-generated music." September 2025. https://the-decoder.com/stability-ai-releases-stable-audio-2-5-for-faster-and-more-complex-ai-generated-music/ Accessed 2026-05-31.

[^4]: Stability AI. "Introducing Stable Audio 2.0." April 3, 2024. https://stability.ai/news-updates/stable-audio-2-0 Accessed 2026-05-31.

[^5]: Voicebot.ai. "Stability AI Releases Augmented Text-to-Music Engine Stable Audio 2 With Upload and Style Transfer Features." April 4, 2024. https://voicebot.ai/2024/04/04/stability-ai-releases-augmented-text-to-music-engine-stable-audio-2-with-upload-and-style-transfer-features/ Accessed 2026-05-31.

[^6]: Stability AI. "Stability AI and Arm Release Stable Audio Open Small, Enabling Real-World Deployment for On-Device Audio Generation." May 2025. https://stability.ai/news/stability-ai-and-arm-release-stable-audio-open-small-enabling-real-world-deployment-for-on-device-audio-control Accessed 2026-05-31.

[^7]: Geeky Gadgets. "AI music creators compared: Udio vs Suno vs Stable Audio vs Audio Shake." 2025. Accessed 2026-05-31.

[^8]: aicompetence.org. "AI Music Generation: Suno vs. Udio vs. Stable Audio." 2025. Accessed 2026-05-31.

[^9]: Zachary Novack et al. "Fast Text-to-Audio Generation with Adversarial Post-Training." arXiv:2505.08175, submitted May 13, 2025. https://arxiv.org/abs/2505.08175 Accessed 2026-05-31.

[^10]: Stability AI. "Stability AI License." https://stability.ai/license Accessed 2026-05-31.

[^11]: Stability AI. "Stability AI Developer Platform Pricing." https://platform.stability.ai/pricing Accessed 2026-05-31.

[^12]: Winbuzzer. "Stability AI Launches Stable Audio 2.5 with Enterprise-Grade Speed and Creative Control." September 11, 2025. https://winbuzzer.com/2025/09/11/stability-ai-launches-stable-audio-2-5-with-enterprise-grade-speed-and-creative-control-xcxwbn/ Accessed 2026-05-31.

[^13]: fal.ai blog. "Stable Audio 2.5 Now Available on fal." 2025. https://blog.fal.ai/stable-audio-2-5-now-available-on-fal/ Accessed 2026-05-31.
