Wan 2.5
Last reviewed
May 16, 2026
Sources
20 citations
Review status
Source-backed
Revision
v1 · 3,317 words
Wan 2.5 is a multimodal generative video model developed by Alibaba Cloud's Tongyi Lab and previewed at the company's Apsara 2025 conference in Hangzhou in late September 2025. The model belongs to the Wan family of visual foundation models, succeeding Wan 2.2 (July 2025) and the speech-driven variant Wan 2.2-S2V (August 2025). Wan 2.5 is notable for being the first model in the family to natively generate synchronized audio together with video, for extending single-clip duration from 5 seconds to 10 seconds, and for delivering native 1080p output at 24 frames per second.
Unlike its predecessors, Wan 2.5 was not released with open weights at launch. Alibaba moved the model into a closed commercial-preview phase delivered through Alibaba Cloud's Bailian platform (also marketed in English as Model Studio) and the public-facing Tongyi Wanxiang website. A formal commercial launch event for Wan2.5-Preview was held on November 11, 2025, and the following major version, Wan 2.6, followed on December 16, 2025, with multi-shot storytelling and reference-to-video features. Wan 2.5 sits in the same competitive bracket as OpenAI's Sora 2, Google's Veo 3, ByteDance's Seedance, and MiniMax's Hailuo AI, and is generally positioned by reviewers as a price-competitive challenger from China rather than a state-of-the-art leader on physics realism.
Alibaba's video-generation work sits inside Tongyi Lab, the research division behind the Tongyi family of foundation models that also produces the Qwen large language models and the Tongyi Wanxiang image generators. The Wan series began with Wan 2.1 in February 2025, a diffusion-transformer model that Alibaba released under the Apache 2.0 license. Wan 2.2 followed in July 2025, again open-sourced, and added a mixture-of-experts variant alongside a 5-billion-parameter lightweight version that fit on a single consumer GPU. The earlier Wan 2.1-VACE variant focused on video editing and creation tasks.
By the time Wan 2.5 was announced, the broader Wan series had been downloaded more than 6.9 million times across Hugging Face and ModelScope, according to Alibaba Cloud, and had spawned more than 170,000 derivative models trained by community users. That open footprint mattered for the way Wan 2.5 was received: when Alibaba kept Wan 2.5 weights closed, threads on the official GitHub repository (issue #184 on Wan-Video/Wan2.2, for example) filled up with users asking whether the model would ever ship with downloadable weights, and several community members complained that Alibaba had pivoted toward a more commercial posture for its frontier visual model.
The move was part of a broader pattern. Alibaba had begun reserving its top-tier Qwen and Wan releases for paid API access while still open-sourcing the prior generation. For Wan 2.5, that meant Wan 2.1 and Wan 2.2 stayed downloadable from Hugging Face under permissive licenses, while Wan 2.5 itself was accessible only through Alibaba Cloud's Bailian gateway and through resellers that licensed the model.
Wan2.5-Preview was unveiled at the Apsara 2025 conference, Alibaba Cloud's annual flagship developer event held in Hangzhou. Alibaba's own communications, the alizila.com newsroom, and conference coverage from Fintech News Hong Kong place the announcement around September 23 to 26, 2025. The preview release came with four model variants: Wan2.5-T2V-Preview (text-to-video), Wan2.5-I2V-Preview (image-to-video), Wan2.5-T2I-Preview (text-to-image), and Wan2.5-I2I-Preview (image editing). Together these four variants are pitched as a unified multimodal stack rather than four loosely linked products, since they share a backbone trained jointly across text, image, video, and audio.
Following the September preview, Alibaba ran a formal commercial launch event for Wan2.5-Preview on November 11, 2025, advertising the model as ready for enterprise use in advertising, film pre-production, and short-form content. Wider availability through additional platforms (Fal.ai, WaveSpeed AI, Higgsfield, Kie.ai, Pollo AI, and others) followed during October and November 2025.
Wan 2.6 launched on December 16, 2025, less than three months after Wan 2.5. The 2.6 release introduced a reference-to-video model called Wan2.6-R2V, multi-shot storytelling from a single prompt, and clip lengths up to 15 seconds. Because Wan 2.6 ships under the same API-only commercial model as Wan 2.5, Wan 2.5 has effectively functioned as an interim preview product rather than a long-lived flagship.
The headline addition in Wan 2.5 is native audio-visual synchronization. Earlier Wan models could produce silent video, with audio added in a separate pass; Wan 2.5 generates voice, sound effects, and background music in lockstep with the picture. The model can do lip-synced dialogue in Chinese and English, ambient audio that matches the on-screen environment, and music beds, all from a single text or image prompt. Alibaba's marketing for the model leans on the phrase "native multimodality" to describe this property.
Other capabilities are upgrades rather than fresh additions. Single-clip duration moves to 10 seconds, double the 5-second ceiling on Wan 2.2 and consistent with Sora 2 and Veo 3 short-form output. Native resolution is 1080p at 24 frames per second, with 4K output advertised as available in select preview endpoints. Camera control instructions, multilingual text rendering, and complex structural content (charts, architectural diagrams, typographic compositions) are all listed by Alibaba Cloud and by third-party reviewers as improved over the prior generation.
| Capability | Wan 2.5 specification |
|---|---|
| Text-to-video | Text-to-video at 1080p, up to 10 seconds, 24 fps |
| Image-to-video | Image-to-video from a single reference image, up to 10 seconds, 1080p |
| Native audio | Synchronized voice, sound effects, and music generated jointly with video |
| Lip-sync | Phoneme-level mouth movement for Chinese and English dialogue |
| Resolution | 480p, 720p, 1080p standard; 4K in select preview endpoints |
| Frame rate | 24 frames per second |
| Languages (prompt) | Chinese and English, with partial support for other languages |
| Text-to-image | Photorealistic and stylized outputs, with multilingual text rendering |
| Image editing | Instruction-based edits with multi-image reference and high consistency |
| Camera controls | Pan, zoom, focus pulls, and director-style cinematic instructions |
| Access model | Closed-weight API through Alibaba Cloud Bailian and Tongyi Wanxiang |
The audio side of Wan 2.5 has had the most attention from creators because it changes the production workflow. Before Wan 2.5, a typical pipeline involved generating silent video with one model, then routing the result through a text-to-speech engine and a sound-design pass. Wan 2.5 collapses those steps into one inference, which reduces latency and avoids the lip-sync mismatch that comes from gluing separately generated assets together. Reviewers from getimg.ai, Curious Refuge, and Pollo AI singled out audio as the most disruptive feature of the release.
Although Wan 2.5 and Wan 2.6 share an architecture family and a commercial-API distribution model, the two releases target different production needs. Wan 2.5 introduced native audio-visual generation and 10-second clips for the first time. Wan 2.6 holds onto those features and adds multi-shot narrative generation, longer 15-second outputs, and a reference-to-video pipeline (Wan2.6-R2V) that uses a user-supplied reference video to preserve a specific character's face and voice across newly generated scenes.
| Dimension | Wan 2.5-Preview | Wan 2.6 |
|---|---|---|
| Announcement date | September 23 to 26, 2025 (Apsara 2025) | December 16, 2025 |
| Maximum clip length | 10 seconds | Up to 15 seconds |
| Audio generation | Native synchronized audio | Enhanced audio with improved lip-sync |
| Reference-to-video | Not supported | Supported via Wan2.6-R2V |
| Multi-shot storytelling | Single-shot output | Multi-shot scenes from a single prompt |
| Image and text models | Wan2.5-T2I-Preview and Wan2.5-I2I-Preview | Comprehensive upgrades to all four existing models |
| Distribution | Closed API on Alibaba Cloud Bailian | Closed API on Alibaba Cloud Model Studio and Wan website |
| Positioning | First multimodal flagship in the Wan family | Commercial follow-up emphasizing consistency over multiple shots |
For practical use, Wan 2.6 fixes most of the rough edges that creators reported in Wan 2.5. The video-to-video editing platform Videoweb.ai compared the two models directly and noted that Wan 2.6 reduced face drift in image-to-video pipelines, held visual coherence across the full clip rather than degrading after 5 to 7 seconds, and handled multi-step prompt instructions more faithfully. The trade-off is that Wan 2.6 demands more from users on the prompt side; it expects explicit scene structuring to take advantage of the multi-shot system.
Wan 2.5's licensing has been a sore point with the Wan community. Wan 2.1 (February 2025) and Wan 2.2 (July 2025) both shipped under the Apache 2.0 license with full code and weights on GitHub and Hugging Face under the Wan-Video and Wan-AI organizations. Wan 2.5 broke that pattern. As of late 2025 and into 2026, the official Wan 2.5 weights have not been published to Hugging Face, the Wan-Video GitHub organization, or ModelScope, and Alibaba has not publicly committed to an open-source release for the 2.5 generation.
A few third-party repositories labelled as "Wan 2.5" (including community uploads of FP16 inference shards) appeared on Hugging Face during late 2025, but those are not endorsed Alibaba releases. Reviewers and community blogs (including Flowith and AI Central's open-source coverage) describe Wan 2.5 as Alibaba's first true "commercial-only" entry in the Wan family. Wan 2.6 has continued the same pattern, with no Alibaba-published open weights and access available only through paid API surfaces.
For users who want open weights from the Wan family, Wan 2.2 remains the most recent fully open release. That includes the Wan2.2-T2V-A14B mixture-of-experts variant, the Wan2.2-I2V-A14B image-to-video model, and the lightweight Wan2.2-TI2V-5B model that can run on a single 24 GB GPU. Wan 2.5 sits behind a paid API, with no offline option for self-hosted inference.
Alibaba has shared less about Wan 2.5's internals than it did for Wan 2.1 and Wan 2.2, both of which came with detailed technical reports. From the disclosed material, Wan 2.5 is built on a unified multimodal backbone that processes text, image, video, and audio inside the same network rather than chaining separate models. Alibaba's launch communications describe this design as "native multimodal architecture, deep alignment," and the model card discussions on partner platforms (WaveSpeed, Fal.ai, Higgsfield) emphasize a joint diffusion-transformer trunk that emits aligned audio and video tokens at inference time.
A second disclosed component is reinforcement learning from human feedback (RLHF) applied on top of paired multimodal datasets, which Alibaba credits with the model's improved instruction adherence and reduced visual artifacts. Beyond those points, parameter counts, training compute, and dataset composition for Wan 2.5 have not been published in the level of detail that Alibaba provided for prior open-weight Wan releases. The technical report that historically accompanied a Wan release has not been issued for Wan 2.5 at the time of writing.
Wan 2.5 is sold through Alibaba Cloud's Bailian platform (Model Studio in English) and the Tongyi Wanxiang consumer website. Bailian exposes the four preview endpoints, including wan2.5-t2v-preview, wan2.5-i2v-preview, wan2.5-t2i-preview, and wan2.5-i2i-preview, plus their commercial-launch counterparts. The API uses Alibaba Cloud's DashScope SDK conventions, with API key authentication and async polling for video jobs. Enterprise customers also have access to a managed deployment path via Alibaba Cloud accounts.
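Because the Bailian endpoints handle video jobs asynchronously, client code submits a task and then polls for completion. The sketch below shows that submit-then-poll pattern in Python; the status values, response fields, and the simulated poller are illustrative assumptions for this example, not Alibaba's documented DashScope schema.

```python
# Sketch of the submit-then-poll pattern an async video API like Bailian uses.
# Field names ("status", "video_url") and status strings are assumptions.
import time

def wait_for_video(poll, task_id, interval=5.0, timeout=600.0):
    """Poll `poll(task_id)` until the job finishes or the timeout expires.

    `poll` is any callable returning a dict with a "status" key and,
    on success, a "video_url" key. In a real client it would be an
    authenticated HTTP GET against the task-status endpoint.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = poll(task_id)
        status = result.get("status")
        if status == "SUCCEEDED":
            return result["video_url"]
        if status == "FAILED":
            raise RuntimeError(f"generation failed: {result.get('message')}")
        time.sleep(interval)  # job still pending or running
    raise TimeoutError(f"task {task_id} did not finish within {timeout}s")

# Simulated responses standing in for the remote task-status endpoint.
_responses = iter([
    {"status": "RUNNING"},
    {"status": "SUCCEEDED", "video_url": "https://example.com/clip.mp4"},
])

url = wait_for_video(lambda task_id: next(_responses), "task-123", interval=0.0)
print(url)
```

The point of the pattern is that a 10-second 1080p render can take minutes, so the API returns a task ID immediately and the client is responsible for polling (or webhooks, where a reseller offers them).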
Pricing varies by reseller, since Wan 2.5 is distributed through multiple partner platforms. The most widely cited official rate is from Alibaba Cloud DashScope. Other figures below are pulled from pricing pages of partner platforms that license Wan 2.5 directly from Alibaba.
| Platform | Wan 2.5 pricing (USD) | Notes |
|---|---|---|
| Alibaba Cloud DashScope | About $0.105 per second of video | Pay-for-success billing; failed generations not charged |
| Kie.ai pay-as-you-go | $0.06 per second at 720p, $0.10 per second at 1080p | Credits convert to per-second pricing |
| Higgsfield AI | 12 credits per 720p 5-second clip; 20 credits per 1080p 5-second clip | Bundled into $9 or $17.40 monthly plans |
| Wan-ai.co | $9.90 for 100 non-expiring credits, $99 for 1,250 credits | Resold consumer-grade access |
| Tongyi Wanxiang web platform | $1.50 for 30 credits up to $100 for 3,900 credits | Pro $5 per month (300 credits), Premium $20 per month (1,200 credits) |
Cost guides for Wan 2.5 consistently call the model the "budget king" of high-end video generation, citing pricing that runs about half that of Sora 2 and Veo 3 for comparable resolution and duration. As of late 2025, the Akool cost guide put Wan 2.5 at roughly $0.25 to $1.50 per generation versus $2 to $5 for Sora 2 or Veo 3 on the same workloads. Whether that gap holds depends on which reseller a customer goes through; the official Alibaba Cloud rate is closer to parity with Western providers, while resold consumer platforms tend to be cheaper.
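Since these rates are quoted per second, clip cost scales linearly with duration, and a quick calculation shows where the published rates land for a full 10-second clip. A minimal sketch using the figures cited in this article (not live pricing):

```python
# Back-of-the-envelope clip costs at the per-second rates quoted above.
# Rates are the figures cited in this article, not live pricing.
RATES_PER_SECOND = {
    "Alibaba Cloud DashScope (1080p)": 0.105,
    "Kie.ai (720p)": 0.06,
    "Kie.ai (1080p)": 0.10,
}

def clip_cost(rate_per_second, seconds=10):
    """Cost in USD for one clip of the given duration."""
    return round(rate_per_second * seconds, 3)

for name, rate in RATES_PER_SECOND.items():
    print(f"{name}: ${clip_cost(rate):.2f} per 10-second clip")
```

At the official DashScope rate, a maximum-length 10-second 1080p clip comes to about $1.05, which is the basis for the "roughly half of Sora 2" comparisons in the cost guides.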
No VolcEngine-hosted endpoint for Wan 2.5 has been confirmed by Alibaba. ByteDance's VolcEngine cloud carries ByteDance's own Seedance video model rather than Wan, so claims of VolcEngine pricing for Wan 2.5 should be treated with caution.
Wan 2.5 competes against the Western video flagships (Sora 2 and Veo 3) and against other Chinese providers (Seedance from ByteDance, Hailuo AI from MiniMax, and Kling from Kuaishou). Reviewers at goenhance.ai, Curious Refuge, and Management Works Media tend to score these models on three axes: physics and world simulation, cinematic camera control, and audio.
| Model | Provider | Native audio | Max clip length | Resolution | Open weights |
|---|---|---|---|---|---|
| Wan 2.5 | Alibaba | Yes, synchronized | 10 seconds | 1080p, 4K preview | No |
| Sora 2 | OpenAI | Yes | 12 seconds (Sora 2 Pro) | 1080p | No |
| Veo 3 | Google DeepMind | Yes | 8 seconds | 1080p | No |
| Seedance | ByteDance | Limited | 10 seconds | 1080p | Partial (Seedance 1.0 weights) |
| Hailuo AI | MiniMax | Yes (separate pass) | 10 seconds | 1080p | No |
On physics, Sora 2 generally leads. The model is positioned by OpenAI as a "world model" and reviewers consistently flag its handling of gravity, fluid dynamics, and object collisions as ahead of the field. Wan 2.5 is competitive on stationary or stylized scenes but tends to lose coherence in crowd shots or in scenes with multiple moving subjects interacting; the Curious Refuge and Apatero reviews both flag crowd scenes and object collisions as Wan 2.5's weak points.
On cinematic control, Veo 3 is widely considered the strongest, partly because Google's training corpus is heavy in film and broadcast material. Wan 2.5 includes director-style camera instructions and benefits from carrying over Wan 2.2's strong pan and tracking-shot support, but reviewers describe it as somewhere behind Veo 3 on lighting nuance and lens emulation.
On audio, Wan 2.5 and Veo 3 are roughly comparable. Both can do native dialogue, sound effects, and music in one inference pass. Sora 2 also does native audio. Hailuo and Seedance handle audio less elegantly, with separate model passes for sound. For multilingual dialogue, Wan 2.5 has an advantage in Chinese, which is the primary training language and the one Alibaba optimizes for.
Pricing is where Wan 2.5 most clearly separates from the competition. At roughly $0.10 per second of 1080p video on Alibaba Cloud DashScope, Wan 2.5 costs about half as much as Sora 2, which has made it the default choice for high-volume social-media producers and small studios that need many iterations rather than a few hero shots.
Reaction to Wan 2.5 has been generally positive on capability and pricing, mixed on stability, and frustrated on licensing. The getimg.ai review described Wan 2.5 as "not just another incremental update" and credited the native audio system with a genuinely disruptive workflow change. Curious Refuge's review called it "one of the most capable open or semi-open video generators available," though that framing leans heavily on the assumption that Wan 2.5 would eventually be open-sourced like Wan 2.2, which has not happened.
Multiple reviewers flagged the same set of weaknesses. Wan 2.5's physics handling for water, hands, and small object collisions is rated as "slightly off" by Curious Refuge. Pollo AI's hands-on review noted that complex multi-character scenes can lose visual coherence past about 7 seconds, which is one of the issues that Wan 2.6 explicitly targeted. The Apatero review of Wan 2.6 framed the 2.6 release as a polish pass on 2.5 rather than a leap forward, which implicitly says that Wan 2.5's failure modes were specific and addressable rather than fundamental.
The open-source dispute has been the most visible point of community pushback. On the Wan-Video GitHub repository, issue #184 ("Is WAN 2.5 going to be open source?") and issue #291 ("Wan 2.5 weights? Will be open-sourced?") have collected dozens of comments from users questioning Alibaba's commercial pivot. Issue #181 on the same repository, titled "Thank you Alibaba for deceiving and using the open source community," captured the most aggressive end of that reaction. The community has continued to download Wan 2.2 weights in large numbers (Wan 2.2 surpassed 1 million Hugging Face downloads within six days of its release, according to Alibaba's own retrospectives) while waiting to see whether the 2.5 or 2.6 weights will eventually become available.
Despite the licensing friction, Wan 2.5 has been adopted by partner platforms more quickly than any prior Wan model, in part because its commercial-API distribution model is friendlier to resellers than Apache 2.0 open weights. As of early 2026, Wan 2.5 is available through Fal.ai, WaveSpeed AI, Higgsfield, Kie.ai, Pollo AI, Atlas Cloud, Imagine.art, VideoMaker.me, Vadoo, FluxPro, and Monet Vision, in addition to the official Alibaba surfaces. That breadth of distribution is one reason Wan 2.5 became the default Chinese competitor cited in Western coverage of the Sora 2 release.