# Wan 2.5

> Source: https://aiwiki.ai/wiki/wan_2_5
> Updated: 2026-06-24
> Categories: AI Models, Chinese AI, Computer Vision, Generative AI, Video Generation
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

**Wan 2.5** is a natively multimodal [AI video generation](/wiki/ai_video_generation) model developed by [Alibaba](/wiki/alibaba) Cloud's Tongyi Lab and previewed at the company's Apsara 2025 conference in Hangzhou on September 24, 2025.[1][37] It is the first model in the [Wan](/wiki/wan) family of visual foundation models to natively generate synchronized audio together with video, producing dialogue, sound effects, and music in a single inference pass, and it extends single-clip duration from 5 seconds to 10 seconds while delivering native 1080p output at 24 frames per second.[9][37] Wan 2.5 succeeds Wan 2.2 (July 2025) and the speech-driven variant Wan2.2-S2V (August 2025).[4]

Unlike its predecessors, Wan 2.5 was not released with open weights at launch. Alibaba moved the model into a closed commercial-preview phase delivered through Alibaba Cloud's Bailian platform (also marketed in English as Model Studio) and the public-facing [Tongyi](/wiki/tongyi) Wanxiang website. A formal commercial launch event for Wan2.5-Preview was held on November 11, 2025.[2] The next major version, Wan 2.6, followed on December 16, 2025, with multi-shot storytelling and reference-to-video features,[3] and Wan 2.7 arrived in April 2026 with a planning-style Thinking Mode and first-and-last-frame control.[13] Wan 2.5 sits in the same competitive bracket as OpenAI's [Sora 2](/wiki/sora_2), Google's [Veo 3](/wiki/veo_3), ByteDance's [Seedance](/wiki/seedance), and MiniMax's [Hailuo AI](/wiki/hailuo), and is generally positioned by reviewers as a price-competitive challenger from China rather than a state-of-the-art leader on physics realism.[24]

## Background

Alibaba's video-generation work sits inside Tongyi Lab, the research division behind the Tongyi family of foundation models that also produces the Qwen large language models and the Tongyi Wanxiang image generators. The Wan series began with Wan 2.1 in February 2025, a diffusion-transformer model that Alibaba released under the Apache 2.0 license. Wan 2.2 followed in July 2025, again open-sourced, and added a mixture-of-experts variant alongside a 5-billion-parameter lightweight version that fit on a single consumer GPU.[26] The earlier [Wan 2.1-VACE](/wiki/wan_vace) variant focused on video editing and creation tasks. Between Wan 2.2 and Wan 2.5, Alibaba also shipped two specialized open-weight variants of the Wan 2.2 base: Wan2.2-S2V for speech-driven avatar video in late August 2025,[4] and Wan2.2-Animate for character animation and replacement in mid-September 2025.[5]

By the time Wan 2.5 was announced, the broader Wan series had been downloaded more than 6.9 million times across Hugging Face and ModelScope, according to Alibaba Cloud, and had spawned more than 170,000 derivative models trained by community users.[1] That open footprint mattered for the way Wan 2.5 was received: when Alibaba kept Wan 2.5 weights closed, threads on the official GitHub repository (issue #184 on Wan-Video/Wan2.2, for example) filled up with users asking whether the model would ever ship with downloadable weights, and several community members complained that Alibaba had pivoted toward a more commercial posture for its frontier visual model.[23]

The move was part of a broader pattern. Alibaba had begun reserving its top-tier Qwen and Wan releases for paid API access while still open-sourcing the prior generation. For Wan 2.5, that meant Wan 2.1 and Wan 2.2 stayed downloadable from Hugging Face under permissive licenses, while Wan 2.5 itself was accessible only through Alibaba Cloud's Bailian gateway and through resellers that licensed the model. The same split has continued into Wan 2.6 and Wan 2.7, with Alibaba indicating that the planned Wan 3.0 release (60 billion parameters, 4K output, expected mid-2026) is intended to return the flagship line to open weights under Apache 2.0.[36]

## When was Wan 2.5 released?

Wan2.5-Preview was unveiled at the Apsara 2025 conference, Alibaba Cloud's annual flagship developer event held in Hangzhou.[1] Alibaba Cloud's own roadmap announcement dates the reveal to September 24, 2025, the opening day of the conference,[37] and Alibaba's communications, the alizila.com newsroom, and conference coverage from Fintech News Hong Kong place the announcement window around September 23 to 26, 2025.[7][8] The preview release came with four model variants: Wan2.5-T2V-Preview (text-to-video), Wan2.5-I2V-Preview (image-to-video), Wan2.5-T2I-Preview (text-to-image), and Wan2.5-I2I-Preview (image editing).[10] Together these four variants are pitched as a unified multimodal stack rather than four loosely linked products, since they share a backbone trained jointly across text, image, video, and audio.[9][37]

Following the September preview, Alibaba ran a formal commercial launch event for Wan2.5-Preview on November 11, 2025, advertising the model as ready for enterprise use in advertising, film pre-production, and short-form content.[2] Wider availability through additional platforms (Fal.ai, WaveSpeed AI, Higgsfield, Kie.ai, Pollo AI, and others) followed during October and November 2025.

Wan 2.6 launched on December 16, 2025, less than three months after Wan 2.5.[3] The 2.6 release introduced a reference-to-video model called Wan2.6-R2V, multi-shot storytelling from a single prompt, and clip lengths up to 15 seconds.[12] Because Wan 2.6 arrived so soon and is sold under the same API-only commercial model on the same platforms, it quickly superseded its predecessor, and Wan 2.5 has effectively functioned as an interim preview product rather than a long-lived flagship.

Wan 2.7 followed in early April 2026, with the text-to-video model going live on April 3 and the full four-variant suite (text-to-video, image-to-video, reference-to-video, instruction-based video editing) announced on April 6.[13] The 2.7 release introduced a Thinking Mode reasoning stage, first-and-last-frame interpolation, video-to-video instruction editing, and an expanded prompt window.[32] As with 2.5 and 2.6, the headline Wan 2.7 endpoints are sold through Alibaba Cloud and third-party hosts rather than published as open weights.[31]

### Wan family release timeline

| Release | Date | Format | License | Highlight |
| --- | --- | --- | --- | --- |
| Wan 2.1 | February 2025 | Open weights | Apache 2.0 | Diffusion-transformer text-to-video and image-to-video |
| [Wan 2.1-VACE](/wiki/wan_vace) | Spring 2025 | Open weights | Apache 2.0 | Video editing and reference-driven generation |
| [Wan 2.1](/wiki/wan_2_1) updates | March to May 2025 | Open weights | Apache 2.0 | Iterative quality and resolution improvements |
| Wan 2.2 (A14B, TI2V-5B) | July 28, 2025 | Open weights | Apache 2.0 | First open-source MoE video model, 5B lightweight variant [26] |
| Wan2.2-S2V | August 26, 2025 | Open weights | Apache 2.0 | Speech-to-video digital human avatar generation [4] |
| Wan2.2-Animate | September 19, 2025 | Open weights | Apache 2.0 | Unified character animation and replacement [5] |
| Wan2.5-Preview | September 24, 2025 | Closed API | Commercial | Native audio, 10-second clips, 1080p, unified multimodal stack [1][37] |
| Wan2.5-Preview commercial launch | November 11, 2025 | Closed API | Commercial | Enterprise launch event in Hangzhou [2] |
| Wan 2.6 | December 16, 2025 | Closed API | Commercial | Reference-to-video, multi-shot stories, 15-second clips [3] |
| Wan 2.7 | April 3 to 6, 2026 | Closed API | Commercial | Thinking Mode, first-and-last-frame, video-to-video editing [13] |
| Wan 3.0 (pre-announced) | Expected mid-2026 | Open weights (pre-announced) | Apache 2.0 (pre-announced) | 60B parameters, 4K resolution, 30-second clips [36] |

## What can Wan 2.5 do?

The headline addition in Wan 2.5 is native audio-visual synchronization. Earlier Wan models produced silent video, so audio had to be added in a separate pass; Wan 2.5 generates voice, sound effects, and background music in lockstep with the picture.[19] In its own roadmap announcement, Alibaba Cloud described the upgrade by saying the new models "natively support high-fidelity audio generation for the video" through a "natively integrated multi-modal architecture, which is trained jointly on text, audio, and visual data."[37] The model can generate lip-synced dialogue in Chinese and English, ambient audio that matches the on-screen environment, and music beds, all from a single text or image prompt. Alibaba's marketing for the model leans on the phrase "native multimodality" to describe this property.[9]

Other capabilities are upgrades rather than fresh additions. Single-clip duration moves to 10 seconds, double the 5-second ceiling on Wan 2.2 and in the same range as the short-form output of Sora 2 and Veo 3; Alibaba's announcement framed this as "doubling the duration from 5 to 10 seconds" with enhanced visual quality.[37][10] Native resolution is 1080p at 24 frames per second, with 4K output advertised as available in select preview endpoints.[9] Camera control instructions, multilingual text rendering, and complex structural content (charts, architectural diagrams, typographic compositions) are all listed by Alibaba Cloud and by third-party reviewers as improved over the prior generation.[17]

### Wan 2.5 capabilities summary

| Capability | Wan 2.5 specification |
| --- | --- |
| Text-to-video | [Text-to-video](/wiki/text_to_video) at 1080p, up to 10 seconds, 24 fps |
| Image-to-video | [Image-to-video](/wiki/image_to_video) from a single reference image, up to 10 seconds, 1080p |
| Native audio | Synchronized voice, sound effects, and music generated jointly with video |
| Lip-sync | Phoneme-level mouth movement for Chinese and English dialogue |
| Resolution | 480p, 720p, 1080p standard; 4K in select preview endpoints |
| Frame rate | 24 frames per second |
| Languages (prompt) | Chinese and English, with partial support for other languages |
| Text-to-image | Photorealistic and stylized outputs, with multilingual text rendering |
| Image editing | Instruction-based edits with multi-image reference and high consistency |
| Camera controls | Pan, zoom, focus pulls, and director-style cinematic instructions |
| Access model | Closed-weight API through Alibaba Cloud Bailian and Tongyi Wanxiang |

The audio side of Wan 2.5 has had the most attention from creators because it changes the production workflow. Before Wan 2.5, a typical pipeline involved generating silent video with one model, then routing the result through a text-to-speech engine and a sound-design pass. Wan 2.5 collapses those steps into one inference, which reduces latency and avoids the lip-sync mismatch that comes from gluing separately generated assets together. Reviewers from getimg.ai, Curious Refuge, and Pollo AI singled out audio as the most disruptive feature of the release.[19][18][20]

## Specialized 2.2-series variants

Before the Wan 2.5 multimodal flagship arrived, Alibaba published two specialized open-weight models that extended the Wan 2.2 backbone into audio-driven and character-driven workflows. Both shipped under Apache 2.0 with full code on GitHub and weights on Hugging Face and ModelScope. They remain the most-used open-weight alternatives for users who want self-hosted, locally controllable video generation.

### Wan2.2-S2V (speech-to-video)

Wan2.2-S2V is an open-source speech-to-video model released by Alibaba on August 26, 2025, which Alibaba positions as a digital-human and AI-avatar tool.[4] The model takes a single still portrait and an audio clip (spoken dialogue, singing, or any other vocal recording) and produces a film-quality animated video in which the character in the photo lip-syncs and gestures in time with the audio.[28] It supports portrait, bust, and full-body framings, vertical and horizontal aspect ratios, and multi-character scenes where more than one figure in the input image is animated against a coherent backdrop.[4]

Technically, Wan2.2-S2V combines text-guided global motion control with audio-driven fine-grained local movements. Alibaba's research team built a dedicated large-scale audio-visual dataset for film and television production scenarios and trained the model with multi-resolution sampling so that the same checkpoint can output everything from short-form vertical clips for TikTok and Douyin to 16:9 broadcast format.[4] A central engineering trick in Wan2.2-S2V is compression of arbitrarily long histories of past frames into a compact latent representation, which is what allows the model to remain stable across the kind of long sustained takes that talking-head video typically requires.[28]

At launch, Wan2.2-S2V weights and inference code were available through the official Wan-Video GitHub organization, Hugging Face, and Alibaba Cloud's ModelScope. The release was framed by Alibaba as a continuation of the open-source posture established by Wan 2.1 and Wan 2.2, and it has been adopted as the de facto open implementation of audio-driven video for users who do not want to depend on closed APIs such as HeyGen or D-ID for digital-avatar workflows.

### Wan2.2-Animate

Wan2.2-Animate is a unified character-animation and character-replacement model released on September 19, 2025, just a few days before the Wan 2.5 announcement at Apsara.[5] The model is built on a 14-billion-parameter backbone derived from Wan 2.2 and supports two operating modes from the same checkpoint.[27] In animation mode, Wan2.2-Animate takes a character image and a reference video and animates the character to match the expressions and movements in the reference at 720p and 24 frames per second.[6] In replacement mode, the model takes an existing video and swaps the on-screen character for an animated version of the supplied reference image, while preserving the reference video's lighting, color grading, and camera moves so that the replacement does not look pasted in.[27]

Alibaba published the Wan2.2-Animate weights and inference code under Apache 2.0, similarly available on GitHub, Hugging Face, and ModelScope. The release post on the official Wan blog framed the model as a one-stop solution for high-fidelity character animation;[5] reviewers on 302.AI's deep dive and Atlas Cloud's Wan-2.2-Animate-Mix benchmark both highlighted the precision of facial-expression transfer and the smooth handoff between scenes when the model replaces moving characters. Wan2.2-Animate has become a popular open-weight comparison point for closed alternatives such as Runway Act-One and the character-animation features inside Sora 2 and Hailuo, partly because it can be fine-tuned with LoRA adapters on top of the public weights.

A brief summary of how the two specialized 2.2-series variants compare:

| Variant | Released | Input | Output | Best use case |
| --- | --- | --- | --- | --- |
| Wan2.2-S2V | August 26, 2025 | Portrait image plus audio clip | Talking, singing, or performing avatar video | Digital humans, dubbed avatars, vertical short-form |
| Wan2.2-Animate (animation mode) | September 19, 2025 | Character image plus reference video | 720p 24 fps animated character | Animation of static characters using a motion reference |
| Wan2.2-Animate (replacement mode) | September 19, 2025 | Reference video plus character image | Reference video with original character swapped out | Face and body replacement with preserved lighting and scene |

Both variants pre-date Wan 2.5 by a few weeks and are still actively maintained on the Wan-Video GitHub organization. They are the practical reason that, even after Alibaba pivoted to a closed-API distribution model for the 2.5 through 2.7 flagships, the open-weight Wan ecosystem has continued to grow rather than stall.

## How does Wan 2.5 differ from Wan 2.6 and Wan 2.7?

Wan 2.5, Wan 2.6, and Wan 2.7 share an architecture family and a commercial-API distribution model, but each release targets different production needs. Wan 2.5 introduced native audio-visual generation and 10-second clips for the first time.[9] Wan 2.6 holds onto those features and adds multi-shot narrative generation, longer 15-second outputs, and a reference-to-video pipeline (Wan2.6-R2V) that uses a user-supplied reference video to preserve a specific character's face and voice across newly generated scenes.[3] Wan 2.7 keeps the 15-second clip ceiling and the audio system, then layers on a Thinking Mode reasoning pass, first-and-last-frame interpolation that lets a user pin both endpoints of a clip, an instruction-based video editing endpoint, and an expanded prompt window for longer, more structured descriptions.[32]

| Dimension | Wan 2.5-Preview | Wan 2.6 | Wan 2.7 |
| --- | --- | --- | --- |
| Announcement date | September 24, 2025 (Apsara 2025) | December 16, 2025 | April 3 to 6, 2026 |
| Maximum clip length | 10 seconds | Up to 15 seconds | Up to 15 seconds |
| Audio generation | Native synchronized audio | Enhanced audio with improved lip-sync | Native synchronized audio with reference voice cloning |
| Reference-to-video | Not supported | Supported via Wan2.6-R2V | Supported with multi-reference consistency |
| First-and-last-frame | Not supported | Not supported | Supported (pin both endpoints) |
| Thinking Mode | Not supported | Not supported | Supported (planning stage before generation) |
| Instruction editing | Image editing only (I2I) | Image editing only (I2I) | Full instruction-based video editing |
| Multi-shot storytelling | Single-shot output | Multi-shot scenes from a single prompt | Multi-shot scenes with sharper continuity |
| Image and text models | Wan2.5-T2I-Preview and Wan2.5-I2I-Preview | Comprehensive upgrades to all four existing models | Refresh of text-to-image and image-editing variants |
| Distribution | Closed API on [Alibaba Cloud](/wiki/alibaba_cloud) Bailian | Closed API on Alibaba Cloud Model Studio and Wan website | Closed API on Alibaba Cloud, WaveSpeed, Fal.ai, OpenRouter, Together AI |
| Positioning | First multimodal flagship in the Wan family | Commercial follow-up emphasising consistency over multiple shots | First Wan release with an explicit reasoning stage |

For practical use, Wan 2.6 fixes most of the rough edges that creators reported in Wan 2.5. The video-to-video editing platform Videoweb.ai compared the two models directly and noted that Wan 2.6 reduced face drift in image-to-video pipelines, held visual coherence across the full clip rather than degrading after 5 to 7 seconds, and handled multi-step prompt instructions more faithfully.[14] The trade-off is that Wan 2.6 demands more from users on the prompt side; it expects explicit scene structuring to take advantage of the multi-shot system.

Wan 2.7 shifts the burden in the opposite direction. With Thinking Mode active, the model spends a small amount of additional inference time building an internal compositional plan, deciding how prompt elements should relate in space and time and what the narrative logic of the sequence should be before any pixels are generated.[32] Reviewers at Tellers, InVideo, and Cliprise have described this as the most significant architectural change in the 2.x line, on the grounds that it turns Wan from a single-pass diffusion model into a model that does something closer to chain-of-thought planning.[31][32][30] First-and-last-frame control is the second major Wan 2.7 addition: users supply two reference frames and the model interpolates the action between them, a workflow that ByteDance's Seedance and Kuaishou's Kling have offered for several releases.[29] Video-to-video instruction editing closes the gap with [Runway](/wiki/runway) Gen-3's editing surface and brings the Wan family into rough functional parity with the Western flagships.

## Is Wan 2.5 open source?

No. Wan 2.5's licensing has been a sore point with the Wan community. Wan 2.1 (February 2025) and Wan 2.2 (July 2025) both shipped under the Apache 2.0 license with full code and weights on GitHub and Hugging Face under the Wan-Video and Wan-AI organizations.[22] Wan 2.5 broke that pattern. As of late 2025 and into 2026, the official Wan 2.5 weights have not been published to Hugging Face, the Wan-Video GitHub organization, or ModelScope, and Alibaba has not publicly committed to an open-source release for the 2.5 generation.[23] Wan 2.6 and Wan 2.7 have continued that closed-API pattern: at launch, neither model shipped with downloadable weights, and Alibaba has framed both as commercial offerings sold through DashScope and partner platforms.[31]

A few third-party repositories labeled as "Wan 2.5" (including community uploads of FP16 inference shards) appeared on Hugging Face during late 2025, but those are not endorsed Alibaba releases. Reviewers and community blogs (including Flowith and AI Central's open-source coverage) describe Wan 2.5 as Alibaba's first true "commercial-only" entry in the Wan family.[36]

For users who want open weights from the Wan family, Wan 2.2 remains the most recent fully open flagship release. That includes the Wan2.2-T2V-A14B mixture-of-experts variant, the Wan2.2-I2V-A14B image-to-video model, and the lightweight Wan2.2-TI2V-5B model that can run on a single 24 GB GPU.[26] The two specialized 2.2-derived releases (Wan2.2-S2V for speech-to-video and Wan2.2-Animate for character animation and replacement) are also fully open under Apache 2.0.[4][5] Wan 2.5, Wan 2.6, and Wan 2.7 all sit behind paid APIs with no Alibaba-blessed offline option for self-hosted inference.

The community's expectation, based on Alibaba's communications around the Apsara 2026 keynote and the Wan 3.0 pre-announcement, is that the flagship line will revert to Apache 2.0 with the Wan 3.0 release expected in mid-2026. Wan 3.0 has been described in pre-launch material as a 60-billion-parameter model with 4K output and 30-second clip durations, and Alibaba has indicated that it intends to ship weights publicly.[36] Whether that pre-announcement holds remains a point of debate inside the community, but it is the strongest signal so far that the Wan 2.5 through 2.7 commercial-only stretch is not a permanent reorientation.

## How does Wan 2.5 work?

Alibaba has shared less about Wan 2.5's internals than it did for Wan 2.1 and Wan 2.2, both of which came with detailed technical reports. From the disclosed material, Wan 2.5 is built on a unified multimodal backbone that processes text, image, video, and audio inside the same network rather than chaining separate models. Alibaba's launch communications describe the design as a "natively integrated multi-modal architecture, which is trained jointly on text, audio, and visual data,"[37] and partner-platform model cards (WaveSpeed, Fal.ai, Higgsfield) echo this with a joint diffusion-transformer trunk that emits aligned audio and video tokens at inference time.[17] The marketing shorthand for the same idea is "native multimodal architecture, deep alignment."[9]

A second disclosed component is reinforcement learning from human feedback (RLHF) applied on top of paired multimodal datasets, which Alibaba credits with the model's improved instruction adherence and reduced visual artifacts.[37] Beyond those points, parameter counts, training compute, and dataset composition for Wan 2.5 have not been published in the level of detail that Alibaba provided for prior open-weight Wan releases. The technical report that historically accompanied a Wan release has not been issued for Wan 2.5 at the time of writing.

The Wan 2.6 and Wan 2.7 model cards on partner platforms describe both as continuations of the same unified multimodal backbone, with Wan 2.6 adding a longer temporal context window for multi-shot scene generation and Wan 2.7 adding an explicit Thinking Mode planning pass implemented as a separate reasoning head before the diffusion-transformer decoder.[13] None of the three closed releases has a published parameter count, but third-party teardowns and Alibaba's own framing suggest that Wan 2.5 through 2.7 sit between the open Wan 2.2 A14B (~27 billion total parameters, ~14 billion active per step) and the pre-announced Wan 3.0 (60 billion parameters).[36]

## How much does Wan 2.5 cost?

Wan 2.5 is sold through Alibaba Cloud's Bailian platform (Model Studio in English) and the Tongyi Wanxiang consumer website. Bailian exposes the four preview endpoints (wan2.5-t2v-preview, wan2.5-i2v-preview, wan2.5-t2i-preview, and wan2.5-i2i-preview) plus their commercial-launch counterparts.[10] The API follows Alibaba Cloud's DashScope SDK conventions, with API-key authentication and asynchronous polling for video jobs. Enterprise customers can also use a managed deployment path through their Alibaba Cloud accounts. Wan 2.6 and Wan 2.7 endpoints follow the same conventions, with names such as wan2.6-t2v, wan2.6-r2v, wan2.7-t2v, and wan2.7-vace surfaced on DashScope alongside the 2.5 endpoints.[13]
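The request pattern described above (authenticate with an API key, submit a video job, then poll it until it finishes) can be illustrated with a minimal sketch. It does not call DashScope: `FakeVideoClient`, `submit_job`, `get_status`, `wait_for_video`, the task-ID format, the placeholder video URL, and the `RUNNING`/`SUCCEEDED` status strings are all hypothetical stand-ins chosen for illustration. Only the model name `wan2.5-t2v-preview` comes from the endpoint list above, and real field names and status values should be taken from Alibaba Cloud's own API reference.

```python
import time
from dataclasses import dataclass, field


# Hypothetical stand-in for a video-generation API client. It is NOT the
# DashScope SDK; it only mimics the submit-then-poll shape of async video jobs.
@dataclass
class FakeVideoClient:
    api_key: str
    polls_until_done: int = 3              # status checks before a job "finishes"
    _jobs: dict = field(default_factory=dict)

    def submit_job(self, model: str, prompt: str) -> str:
        """Queue a job and return its (hypothetical) task ID."""
        if not self.api_key:
            raise PermissionError("missing API key")
        task_id = f"task-{len(self._jobs) + 1}"
        self._jobs[task_id] = {"model": model, "prompt": prompt, "checks": 0}
        return task_id

    def get_status(self, task_id: str) -> dict:
        """Return a status dict; the job reports SUCCEEDED after enough checks."""
        job = self._jobs[task_id]
        job["checks"] += 1
        if job["checks"] >= self.polls_until_done:
            # Placeholder URL, not a real result location.
            return {"status": "SUCCEEDED",
                    "video_url": f"https://example.com/{task_id}.mp4"}
        return {"status": "RUNNING"}


def wait_for_video(client, task_id: str, interval_s: float = 5.0,
                   timeout_s: float = 600.0) -> dict:
    """Poll until the job leaves the RUNNING state or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = client.get_status(task_id)
        if result["status"] != "RUNNING":
            return result
        time.sleep(interval_s)  # back off between checks
    raise TimeoutError(f"{task_id} did not finish within {timeout_s} s")


if __name__ == "__main__":
    client = FakeVideoClient(api_key="sk-placeholder")
    task = client.submit_job("wan2.5-t2v-preview", "A lantern festival at dusk")
    print(wait_for_video(client, task, interval_s=0.0))  # no delay for the demo
```

Bounding the loop with a deadline, rather than polling forever, keeps a stuck job from hanging a batch pipeline, and the sleep between checks avoids hammering the service with status requests.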

### Approximate pricing

Pricing varies by reseller, since Wan 2.5 is distributed through multiple partner platforms. The most widely cited official rate is from Alibaba Cloud DashScope. Other figures below are pulled from pricing pages of partner platforms that license Wan 2.5 directly from Alibaba.[16]

| Platform | Wan 2.5 pricing (USD) | Notes |
| --- | --- | --- |
| Alibaba Cloud DashScope | About $0.105 per second of video [16] | Pay-for-success billing; failed generations not charged |
| Kie.ai pay-as-you-go | $0.06 per second at 720p, $0.10 per second at 1080p [16] | Credits convert to per-second pricing |
| Higgsfield AI | 12 credits per 720p 5-second clip; 20 credits per 1080p 5-second clip [17] | Bundled into $9 or $17.40 monthly plans |
| Wan-ai.co | $9.90 for 100 non-expiring credits, $99 for 1,250 credits | Resold consumer-grade access |
| Tongyi Wanxiang web platform | $1.50 for 30 credits up to $100 for 3,900 credits | Pro $5 per month (300 credits), Premium $20 per month (1,200 credits) |
| OpenRouter (Wan 2.7) | About $0.10 per second of 1080p video [34] | Wan 2.7 image-to-video and text-to-video |
| WaveSpeed AI (Wan 2.7) | About $0.10 to $0.15 per second [13] | Wan 2.7 full-suite reseller |

Reviewers in the Wan 2.5 cost guides consistently call the model the "budget king" of high-end video generation, citing prices that run about half those of Sora 2 and Veo 3 for comparable resolution and duration.[15] As of late 2025, the Akool cost guide put Wan 2.5 at roughly $0.25 to $1.50 per generation versus $2 to $5 for Sora 2 or Veo 3 on the same workloads.[15] Whether that gap holds depends on which reseller a customer uses: the official Alibaba Cloud rate is closer to parity with Western providers, while resold consumer platforms tend to be cheaper.
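Because most of these prices are quoted per second of output, the cost of a clip is the per-second rate multiplied by its length. The sketch below hard-codes three rates from the pricing table (treating the DashScope figure as a 1080p rate, as the comparison section below does) and applies the pay-for-success rule noted for DashScope, under which failed generations are not charged. The function names and the list of success flags are illustrative assumptions rather than part of any provider's API, and the rates are a snapshot that will drift.

```python
# Illustrative per-second rates in USD, copied from the pricing table.
# These are a snapshot; resellers change prices, so do not rely on them.
RATES_PER_SECOND = {
    ("dashscope", "1080p"): 0.105,  # "about $0.105 per second", assumed 1080p
    ("kie.ai", "720p"): 0.06,
    ("kie.ai", "1080p"): 0.10,
}


def clip_cost(platform: str, resolution: str, seconds: float) -> float:
    """Cost of one successful clip: per-second rate times clip length."""
    return RATES_PER_SECOND[(platform, resolution)] * seconds


def batch_cost(platform: str, resolution: str, seconds: float,
               succeeded: list[bool]) -> float:
    """Bill for a batch under pay-for-success billing: failed attempts are free."""
    billable_clips = sum(succeeded)  # True counts as 1, False as 0
    return round(clip_cost(platform, resolution, seconds) * billable_clips, 4)


if __name__ == "__main__":
    print(round(clip_cost("dashscope", "1080p", 10), 2))
    print(batch_cost("dashscope", "1080p", 10, [True, False, True, True]))
```

At these rates a 10-second 1080p clip on DashScope comes to roughly $1.05, and a batch of four attempts in which one fails is billed for three clips, about $3.15.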

No VolcEngine-hosted endpoint for Wan 2.5 has been confirmed by Alibaba. ByteDance's VolcEngine cloud carries ByteDance's own Seedance video model rather than Wan, so claims of VolcEngine pricing for Wan 2.5 should be treated with caution.

## How does Wan 2.5 compare to Sora 2 and Veo 3?

Wan 2.5 competes against the Western video flagships (Sora 2 and Veo 3) and against other Chinese providers ([Seedance](/wiki/seedance) from ByteDance, [Hailuo AI](/wiki/hailuo) from MiniMax, and Kling from Kuaishou). Reviewers at goenhance.ai, Curious Refuge, and Management Works Media tend to score these on three axes: physics and world simulation, cinematic camera control, and audio.[25][18][24]

| Model | Provider | Native audio | Max clip length | Resolution | Open weights |
| --- | --- | --- | --- | --- | --- |
| Wan 2.5 | Alibaba | Yes, synchronized | 10 seconds | 1080p, 4K preview | No |
| Wan 2.6 | Alibaba | Yes, improved lip-sync | 15 seconds | 1080p | No |
| Wan 2.7 | Alibaba | Yes, with reference voice cloning | 15 seconds | 1080p | No |
| [Sora 2](/wiki/sora_2) | OpenAI | Yes | 12 seconds (Sora 2 Pro) | 1080p | No |
| [Veo 3](/wiki/veo_3) | Google DeepMind | Yes | 8 seconds | 1080p | No |
| [Seedance](/wiki/seedance) | ByteDance | Limited | 10 seconds | 1080p | Partial (Seedance 1.0 weights) |
| [Hailuo AI](/wiki/hailuo) | MiniMax | Yes (separate pass) | 10 seconds | 1080p | No |

On physics, Sora 2 generally leads. The model is positioned by OpenAI as a "world model" and reviewers consistently flag its handling of gravity, fluid dynamics, and object collisions as ahead of the field. Wan 2.5 is competitive on stationary or stylized scenes but tends to lose coherence in crowd shots or in scenes with multiple moving subjects interacting; the Curious Refuge and Apatero reviews both flag crowd scenes and object collisions as Wan 2.5's weak points.[18][21] Wan 2.7's Thinking Mode is the company's first explicit attempt to close that physics and continuity gap by adding a planning step that should, in theory, reduce the number of physically implausible compositions the diffusion stage has to clean up.[32]

On cinematic control, Veo 3 is widely considered the strongest, partly because Google's training corpus is heavy in film and broadcast material. Wan 2.5 includes director-style camera instructions and benefits from carrying over Wan 2.2's strong pan and tracking-shot support, but reviewers place it somewhat behind Veo 3 on lighting nuance and lens emulation.[25] Wan 2.7 picks up first-and-last-frame control, which is a direct response to the same feature in Seedance and Kling.[29]

On audio, Wan 2.5 and Veo 3 are roughly comparable: both generate dialogue, sound effects, and music natively in one inference pass, and Sora 2 also generates audio natively. Hailuo and Seedance handle audio less elegantly, with separate model passes for sound. For multilingual dialogue, Wan 2.5 has an advantage in Chinese, which is its primary training language and the one Alibaba optimizes for.

Pricing is where Wan 2.5 most clearly separates from the competition. Cost guides put it at roughly half the price of Sora 2 for comparable output,[15] with the official Alibaba Cloud DashScope rate at about $0.10 per second of 1080p video[16] and several resellers charging less. That pricing has made it the default choice for high-volume social-media producers and small studios that need many iterations rather than a few hero shots.

## How was Wan 2.5 received?

Reaction to Wan 2.5 has been generally positive on capability and pricing, mixed on stability, and frustrated on licensing. The getimg.ai review described Wan 2.5 as "not just another incremental update" and credited the native audio system with a genuinely disruptive workflow change.[19] Curious Refuge's review called it "one of the most capable open or semi-open video generators available," though that framing leans heavily on the assumption that Wan 2.5 would eventually be open-sourced like Wan 2.2, which has not happened.[18]

Multiple reviewers flagged the same set of weaknesses. Wan 2.5's physics handling for water, hands, and small object collisions is rated as "slightly off" by Curious Refuge.[18] Pollo AI's hands-on review noted that complex multi-character scenes can lose visual coherence past about 7 seconds, which is one of the issues that Wan 2.6 explicitly targeted.[20] The Apatero review of Wan 2.6 framed the 2.6 release as a polish pass on 2.5 rather than a leap forward, which implies that Wan 2.5's failure modes were specific and addressable rather than fundamental.[21]

Reception of Wan 2.7 has been more mixed. Cliprise framed the 2.7 suite as "the most complete open-source video production stack in existence,"[30] but several reviewers, including Tellers and Apatero, pushed back on the open-source framing and noted that the headline Wan 2.7 endpoints are closed-weight.[31] InVideo's hands-on tests of Thinking Mode reported visibly improved scene composition for prompts that contain multiple subjects and temporal sequencing, but also a longer wait time per generation.[32] MindStudio's comparison of Wan 2.7 against Seedance and Kling concluded that Wan 2.7 had closed most of the feature gap with the leading Chinese competitors while remaining the cheapest of the three on a per-second basis.[29]

The open-source dispute has been the most visible point of community pushback. On the Wan-Video GitHub repository, issue #184 ("Is WAN 2.5 going to be open source?") and issue #291 ("Wan 2.5 weights? Will be open-sourced?") have collected dozens of comments from users questioning Alibaba's commercial pivot.[23] Issue #181 on the same repository, titled "Thank you Alibaba for deceiving and using the open source community," captured the most aggressive end of that reaction. The community has continued to download Wan 2.2 weights in large numbers (according to Alibaba's own retrospectives, Wan 2.2 passed 1 million Hugging Face downloads within six days of its release) while waiting to see whether the 2.5, 2.6, or 2.7 weights will eventually become available. The Wan 3.0 pre-announcement, with its explicit return to Apache 2.0, has eased some of the friction but not resolved it.[36]

Despite the licensing friction, Wan 2.5 has been adopted by partner platforms more quickly than any prior Wan model, in part because its commercial-API distribution model is friendlier to resellers than Apache 2.0 open weights. As of early 2026, Wan 2.5 is available through Fal.ai, WaveSpeed AI, Higgsfield, Kie.ai, Pollo AI, Atlas Cloud, Imagine.art, VideoMaker.me, Vadoo, FluxPro, and Monet Vision, in addition to the official Alibaba surfaces. Wan 2.7 has expanded that footprint further, with OpenRouter, Together AI, Eachlabs, and Somake also listing 2.7 endpoints.[13] That breadth of distribution is one reason Wan 2.5 became the default Chinese competitor cited in Western coverage of the Sora 2 release.[24]

## See also

- [Wan](/wiki/wan)
- [Wan 2.1](/wiki/wan_2_1)
- [Wan 2.1-VACE](/wiki/wan_vace)
- [Alibaba Group](/wiki/alibaba)
- [Alibaba Cloud](/wiki/alibaba_cloud)
- [Tongyi](/wiki/tongyi)
- [AI video generation](/wiki/ai_video_generation)
- [Sora 2](/wiki/sora_2)
- [Veo 3](/wiki/veo_3)
- [Seedance](/wiki/seedance)
- [Hailuo AI](/wiki/hailuo)
- [Runway](/wiki/runway)
- [Text-to-video](/wiki/text_to_video)
- [Image-to-video](/wiki/image_to_video)

## References

1. Alibaba Cloud. "Alibaba Cloud's Apsara Conference 2025: Full Stack AI + Cloud Leads the Way to the Future of AI." alibabacloud.com/blog/602567 (September 24, 2025).
2. Alibaba Cloud. "Commercial Launch of Wan2.5-Preview." alibabacloud.com/en/events/wan2-5-preview-launch (November 11, 2025).
3. Alibaba Cloud. "Alibaba Unveils Wan2.6 Series Enabling Everyone to Star in Videos." alibabacloud.com/blog/alibaba-unveils-wan2-6-series-enabling-everyone-to-star-in-videos_602742 (December 16, 2025).
4. Alibaba Cloud. "Alibaba Introduces Open-Source Model for Digital Human Video Generation." alibabacloud.com/blog/alibaba-introduces-open-source-model-for-digital-human-video-generation_602493 (August 26, 2025).
5. Alibaba Wan. "Wan2.2-Animate: For Character Animation and Replacement." wan.video/blog/wan2.2-animate (September 19, 2025).
6. Alibaba Wan (Twitter / X). "We're officially launching Wan2.2-Animate, a unified model for high-fidelity character animation and replacement." x.com/Alibaba_Wan/status/1968921551392432175 (September 19, 2025).
7. Alizila. "Alibaba Cloud's Apsara Conference 2025: Full Stack AI + Cloud Leads the Way to the Future of AI." alizila.com (September 24, 2025).
8. Fintech News Hong Kong. "Alibaba Cloud Unveils New AI Models and Upgrades at Apsara 2025." fintechnews.hk/35698/fintechchina/alibaba-cloud-apsara-2025/.
9. Blue Lightning TV. "Alibaba Unveils Wan 2.5-Preview: Native Multimodality for Synchronized Text, Image, Video, and Audio." bluelightningtv.com/2025/09/26/.
10. wan2.video. "Wan 2.5 Preview: The Era of Multisensory Storytelling." wan2.video/wan2.5-preview.
11. Viddo. "Wan2.5: Ali Tongyi's New Multimodal Generative Model Officially Launched." viddo.ai/blog/wan2-5 (September 25, 2025).
12. WaveSpeed AI. "Unlocking Next-Gen Video Creation with Alibaba WAN 2.6 on WaveSpeedAI." wavespeed.ai/blogs (December 16, 2025).
13. WaveSpeed AI. "WAN 2.7: New Features, API Access & Upgrade Path." wavespeed.ai/blog/posts/wan-2-7-features-api-upgrade.
14. Videoweb.ai. "Wan 2.6 vs Wan 2.5: What's Really Improved in the New Release." videoweb.ai/blog/detail (December 2025).
15. Akool. "Wan 2.5 Cost Guide." akool.com/blog-posts/wan-2-5-cost-guide.
16. Evolink AI. "Wan API Pricing Guide: Wan 2.5, Wan 2.6 and Wan Image (2026)." evolink.ai/blog/wan-api-pricing-guide.
17. Higgsfield AI. "Introducing WAN 2.5 Review." higgsfield.ai/blog/Introducing-WAN-2.5-Review.
18. Curious Refuge. "Wan 2.5: An Honest AI Video Generator Review." curiousrefuge.com/blog/wan-25-ai-video-generator-review.
19. getimg.ai. "Wan 2.5 Review: Affordable AI Video with Native Audio Is Here." getimg.ai/blog/wan-2-5-video-generation-ai-model-review.
20. Pollo AI. "I Tested Wan 2.5 AI Video Model: It's Better Than Expected." pollo.ai/hub/wan-ai-2-5-review.
21. Apatero. "Wan 2.6 First Day Review: Better Than 2.5 But Not Revolutionary 2025." apatero.com/blog/wan-2-6-first-day-thoughts-impressions-review-2025.
22. Wan-Video GitHub. "Wan2.2: Open and Advanced Large-Scale Video Generative Models." github.com/Wan-Video/Wan2.2.
23. Wan-Video GitHub. Issue #184 "Is WAN 2.5 going to be open source?" and Issue #291 "Wan 2.5 weights? Will be open-sourced?" github.com/Wan-Video/Wan2.2.
24. Management Works Media. "From Friction to Film: Sora 2, Veo 3, and Wan 2.5." managementworksmedia.com.
25. goenhance.ai. "AI Video Showdown: Sora 2, Veo 3.1 and Wan 2.5." goenhance.ai/blog/ai-video-showdown-sora2-veo3-1-wan2-5.
26. ComfyUI Wiki. "WAN2.2 Open Source Version Released and ComfyUI Native Support in Day 0." comfyui-wiki.com/en/news/2025-07-28-wan2-2-open-source-release (July 28, 2025).
27. ComfyUI Wiki. "Alibaba Releases Wan-Animate Model: Unified Character Animation and Replacement Technology." comfyui-wiki.com/en/news/2025-09-19-wan22-animate (September 19, 2025).
28. ITBrief Asia. "Alibaba unveils Wan2.2-S2V, open-source speech-to-video tool." itbrief.asia/story/alibaba-unveils-wan2-2-s2v-open-source-speech-to-video-tool.
29. MindStudio. "What Is the Wan 2.7 AI Video Model? Features, Release Timeline, and Comparison to Seedance." mindstudio.ai/blog/wan-2-7-ai-video-model-features-release-timeline.
30. Cliprise. "Wan 2.7 Video Suite Is Here: Alibaba Just Shipped the Most Complete Open-Source Video Production Stack in Existence." cliprise.app/news/wan-2-7-video-release.
31. Tellers. "Wan 2.7 Has a Thinking Mode and Closed Weights." tellers.ai/blog/wan_2_7_thinking_mode_ai_video_generation_2026-04-15.mdx (April 15, 2026).
32. InVideo. "Wan 2.7 Complete Guide: Hype Or Best AI Video Model of 2026?" invideo.io/blog/wan-2-7-complete-guide.
33. ComputerTech. "WAN 2.7 Review 2026: Alibaba's Open-Source AI Video Model That Runs Free on Your Machine." computertech.co/wan-2-7-review.
34. OpenRouter. "Wan 2.7 API Pricing & Providers." openrouter.ai/alibaba/wan-2.7.
35. Flaq.ai. "Wan 2.7 API Insight 2026: Open-Source, or Platform-First?" flaq.ai/blog/detail/Is-Wan-2-7-Open-Source-API-Only-or-Platform-First-What-to-Expect-Next.
36. Flowith. "Why Wan's Open-Weight Architecture Will Democratize AI Film Production." flowith.io/blog/wan-2-6-3-0-open-weight-architecture-democratize-ai-film-production.
37. Alibaba Cloud. "Alibaba Cloud Unveils Strategic Roadmaps for the Next Generation AI Innovations." alibabacloud.com/blog/602560 (September 24, 2025).

