PixVerse
Last reviewed
Jun 3, 2026
Sources
20 citations
Review status
Source-backed
Revision
v1 · 1,739 words
PixVerse is an AI video generation product made by the Chinese startup AISphere (Chinese: 爱诗科技, romanized Aishi Technology or Aishi Tech). It lets users create short video clips from text prompts, still images, and existing footage, and is aimed at a broad consumer audience rather than professional editors. The company describes the product as a "Canva for video generation," meaning a tool that makes video creation accessible to people without editing skills.[1][2] PixVerse launched for global users in January 2024 and grew quickly on the back of viral effect templates and fast generation. AISphere also runs a domestic Chinese version branded PaiWo AI (拍我AI). In March 2026 the company raised a Series C of about 300 million US dollars and crossed a one billion dollar valuation, the largest single funding round recorded in Asia's AI video sector at the time.[3][4]
AISphere was founded in April 2023 by Wang Changhu (王长虎), who serves as chief executive. Wang holds a degree from the University of Science and Technology of China and was a senior researcher at Microsoft Research Asia before joining ByteDance, where he led the visual technology team and worked in the company's AI lab.[1][5] His background in computer vision shaped AISphere's focus on building its own video generation models rather than reselling third-party systems. Jaden Xie is named as a co-founder of the company.[2] The company is based in Beijing and in 2026 opened an international office in Singapore to support overseas expansion.[4][6]
Wang has publicly argued that video generation has commercial potential on the order of large language models, a view he restated at the 2025 Beijing Zhiyuan Conference.[7] AISphere positions itself as a consumer product company that also develops foundation models, with PixVerse serving the global market and PaiWo AI serving China.
PixVerse generates short clips, typically a few seconds per shot (single clips of up to 8 seconds in V2 and 10 seconds in the mobile app), from text prompts (a capability often called text to video) or from an uploaded image (image to video). The underlying models are built on a diffusion plus Transformer (DiT) architecture developed in house, with multimodal feature fusion used to improve how prompts map to motion and scene structure.[8][7]
The product has iterated rapidly. The original web version reached global users in January 2024, and AISphere has shipped numbered model upgrades roughly every few months since.
| Version or release | Date | Notable additions |
|---|---|---|
| Global launch | January 2024 | Public web release, text to video and image to video[5][1] |
| V2 | July 25, 2024 | DiT architecture, single clip up to 8 seconds, multi-clip up to 40 seconds, consistency across segments[8] |
| V3 | October 2024 | Quality upgrade; effect templates including the viral "Venom" transformation[5][2] |
| Mobile app | December 2024 | iOS and Android apps; 10-second clips[9] |
| V4 | February 2025 | Faster generation (high-quality clips in about 5 seconds), added sound effects and voice[10] |
| V4.5 | May 15, 2025 | Cinematic camera controls and a fusion system[9] |
| V5 | August 27, 2025 | Agent creation assistant; ranked first globally for image to video on Artificial Analysis[7][11] |
| V5.5 | December 1 to 2, 2025 | One-click multi-shot storyboarding with native audio (effects, dialogue, music)[12][11] |
| V5.6 | January 2026 | Improved visual stability, motion, and audio-visual alignment[13] |
| R1 | January 14, 2026 | Real-time interactive video generation up to 1080p[14] |
Reported version dates vary slightly between English and Chinese coverage, and some intermediate builds (for example a V3.5 beta in late 2024) were rolled out without a separate model release. The table lists the milestones that are confirmed across more than one source.
In January 2026 AISphere announced PixVerse R1, which it calls a real-time video generation model. R1 produces video up to 1080p that responds to user input as it plays, so a viewer can steer the direction of a clip while it is being generated rather than only editing a finished file. The company built R1 around an autoregressive framework for long sequences and an inference engine that cuts sampling to a few steps, and it released the model with API access for enterprise partners.[14]
PixVerse supports text to video and image to video as its core modes. Around these, the product has added a library of effect templates and editing tools that handle most of the work so users only supply a prompt or a photo. Generation is fast by the standards of the category: V4 and later can return short clips in roughly five seconds, and the system can produce a 1080p clip in about a minute.[7]
Later versions added native audio, so a clip can include sound effects, dialogue, voice, and background music generated together with the picture rather than added afterward. V5.5 introduced one-click multi-shot storyboarding, where the model lays out several connected shots and the matching audio from a single prompt, with camera moves and transitions designed automatically.[12] The V5 update added an "Agent" creation assistant that reads an uploaded image, infers what it shows, and assembles a 5 to 30 second clip without manual prompt engineering.[7] Other editing features include character or scene replacement, remixing of existing clips, and frame-based modification.[12]
PaiWo AI (拍我AI) is the domestic Chinese version of PixVerse, run by AISphere for users in mainland China. It launched on June 6, 2025, with both a web app and mobile apps, after the company decided the English name PixVerse was awkward for Chinese users and rebranded the local product.[15] PaiWo AI shares the same in-house Diffusion plus Transformer technology as PixVerse and tracks the same model versions, including the V5 and V5.5 releases, supporting text to video, image to video, effect templates, first-and-last-frame transitions, cinematic camera control, and multi-character narrative generation.[15][12] Splitting the product into separate global and domestic apps follows a common pattern among Chinese AI video firms, which must meet domestic content and regulatory requirements for the China market.
AISphere has raised money in a rapid series of rounds. Early backing came from Ant Group, which led an A2 round of more than 100 million yuan (about 13.8 million US dollars) reported in April 2024.[16] By December 2024 the company disclosed that A2 through A4 rounds together totaled nearly 300 million yuan (about 43 million US dollars), with investors including Ant Group, the Beijing Artificial Intelligence Industry Investment Fund, CAS Investment, and Lighthouse Capital.[17] A later A5 round accompanied PixVerse passing 15 million monthly active users and the plan to launch domestically.[18]
In September 2025 AISphere raised a Series B of more than 60 million US dollars led by Alibaba, with participation from Antler and the Beijing Artificial Intelligence Industry Investment Fund. It was the largest single round raised by a Chinese AI video company up to that point.[1] A B+ round of 100 million yuan followed in October 2025, backed by Fosun RZ Capital, Tongchuang Weiye, and Shunxi Fund.[6]
The Series C was announced on March 12, 2026, at about 300 million US dollars and lifted the company past a one billion dollar valuation, making it a unicorn. CDH Investments (鼎晖) led the round through several of its funds, with a syndicate of close to 20 institutions across Asia and beyond, including China Ruyi, 37 Interactive Entertainment, Yizhuang state capital, Guotai Junan Innovation Investment, Fosun RZ Capital, UOB Venture Management, OCBC's Lion X fund, 3W Fund, Antler, EnvisionX Capital, and iGlobe Partners. The round brought cumulative funding above 400 million US dollars and was described as the largest single AI video financing recorded in Asia.[3][4][19]
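The cumulative figure can be sanity-checked by adding up the rounds whose sizes are stated above. The following is a minimal sketch, not an official tally: the yuan-to-dollar rate is an assumed illustrative value, the variable and function names are hypothetical, "more than" and "about" figures are taken at their stated floor, and the A5 round is left out because no amount is given for it.

```python
# Round sizes as stated in the article, in millions of US dollars.
# ASSUMED_CNY_PER_USD is an illustrative assumption, not a figure from the article.
ASSUMED_CNY_PER_USD = 7.2

rounds_usd_m = {
    "A2 to A4 (about 43M USD)": 43.0,
    "Series B (more than 60M USD)": 60.0,
    "B+ (100M yuan, converted at the assumed rate)": 100.0 / ASSUMED_CNY_PER_USD,
    "Series C (about 300M USD)": 300.0,
}


def cumulative_funding(rounds):
    """Sum the disclosed round sizes; undisclosed rounds (such as A5) are excluded."""
    return sum(rounds.values())


total_usd_m = cumulative_funding(rounds_usd_m)
print(f"Approximate disclosed total: {total_usd_m:.0f}M USD")
```

Under these assumptions the disclosed rounds alone sum to a little over 400 million US dollars, which is consistent with the reported cumulative figure.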
On traction, AISphere reported that PixVerse surpassed 100 million users across about 175 countries, with 16 million monthly active users and annual recurring revenue above 40 million US dollars, after starting commercialization in November 2024.[6][3] The company says its models generated over 2 billion videos by early 2026.[2] Much of the early growth came from viral effect templates: a "Venom" transformation that turned a user's photo into the Marvel symbiote spread widely on TikTok from November 2024 and, by AISphere's account, drew billions of cumulative views.[2][1] In September 2025 PixVerse appeared on Andreessen Horowitz's ranking of the top 50 consumer AI mobile apps, at number 25.[6]
PixVerse operates in a crowded AI video market. Its closest Chinese rivals include Kling from Kuaishou, Hailuo from MiniMax, Vidu from Shengshu Technology, and Jimeng from ByteDance, the company where Wang Changhu previously worked. Internationally it competes with OpenAI's Sora and Google's Veo, among others.[2] AISphere's differentiation rests on consumer accessibility, fast generation, a deep template library, and competitive model rankings: on the third-party Artificial Analysis leaderboard, PixVerse V5 reached first place for image to video in August 2025, and later versions remained near the top of both image-to-video and text-to-video charts.[7][3] One complication for the company is that Alibaba, which led its Series B, also develops its own video models, so at least one of its major backers competes with it in the same market.[20]