Kling is an AI video generation model developed by Kuaishou Technology, a Chinese short-video platform company publicly traded on the Hong Kong Stock Exchange. First unveiled on June 10, 2024, Kling uses a diffusion-based transformer architecture combined with a proprietary 3D variational autoencoder (VAE) to generate high-quality videos from text prompts and images. Since its launch, the model has undergone rapid iteration, progressing from version 1.0 through 3.0 in under two years, and has become one of the leading AI video generation platforms globally. By June 2025, Kling AI had surpassed an annualized revenue run rate of over USD 100 million and served more than 60 million creators worldwide.
Kuaishou Technology (stock code: 1024.HK) is a Chinese publicly traded holding company headquartered in Haidian District, Beijing. The company was founded in 2011 by Su Hua and Cheng Yixiao. Kuaishou operates the second-largest short-video platform in the world, competing primarily with ByteDance's Douyin (the Chinese counterpart to TikTok). Outside China, the app is marketed under the names Kwai and Snack Video in various markets.
Kuaishou debuted on the Hong Kong Stock Exchange on February 5, 2021, raising approximately HK$41.3 billion (roughly USD 5.4 billion) in one of the most heavily oversubscribed IPOs in Hong Kong's history. Shares opened at HK$338 and closed the first day 160% above the IPO price of HK$115. At its peak valuation, the company was worth more than USD 160 billion, though its share price later declined sharply alongside a broader regulatory crackdown on Chinese technology companies.
For the full year of 2024, Kuaishou reported total revenue of RMB 126.9 billion, an increase of 11.8% year-over-year. The Kuaishou app averaged 399.4 million daily active users (DAU) and 709.7 million monthly active users (MAU) during 2024. In Q1 2025, DAU grew to 408 million and MAU reached 711.7 million.
Kuaishou has invested heavily in artificial intelligence research, producing several notable models beyond Kling, including KwaiYii (a 175-billion-parameter large language model), Kolors (an open-source text-to-image generation model), and KeTu (a text-to-image generation system). Kolors is built on latent diffusion and trained on billions of text-image pairs, supporting bilingual Chinese and English prompts. The Kolors model weights have been released under the Apache-2.0 license for academic research.
Kling is built on a Diffusion Transformer (DiT) architecture, a class of generative model that applies the diffusion process within a transformer framework rather than the U-Net backbone traditionally used in earlier diffusion models. Kuaishou enhanced this base architecture with several proprietary innovations.
At the core of Kling's encoding and decoding pipeline is a self-developed 3D VAE network. Unlike conventional 2D VAEs that process individual image frames independently, the 3D VAE performs synchronous spatiotemporal compression, encoding both spatial (within-frame) and temporal (across-frame) information simultaneously. This approach yields high reconstruction quality while maintaining an efficient balance between training performance and computational cost. The 3D VAE allows Kling to produce videos with consistent motion across frames, reducing flickering and temporal artifacts that plague simpler frame-by-frame generation approaches.
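Kuaishou has not published the 3D VAE's actual compression factors or network design. The shape arithmetic of joint spatiotemporal compression can still be illustrated with a toy NumPy sketch; the 4x temporal and 8x spatial factors below are assumed for illustration only, and mean pooling stands in for what is, in the real model, a trained encoder network.

```python
import numpy as np

def compress_3d(video, ft=4, fs=8):
    """Toy spatiotemporal downsampling showing the shape arithmetic of a
    3D VAE encoder. A real encoder is a learned network; block-mean pooling
    here only demonstrates the joint temporal + spatial reduction."""
    t, h, w, c = video.shape
    assert t % ft == 0 and h % fs == 0 and w % fs == 0
    blocks = video.reshape(t // ft, ft, h // fs, fs, w // fs, fs, c)
    return blocks.mean(axis=(1, 3, 5))  # average each (ft, fs, fs) block

video = np.random.rand(16, 64, 64, 3)   # 16 frames of 64x64 RGB
latent = compress_3d(video)             # shape (4, 8, 8, 3)
print(latent.shape)
```

A conventional 2D VAE at the same spatial factor would compress each frame independently, leaving the temporal axis untouched (16, 8, 8, 3): four times as many latent tokens, with no cross-frame information in the latent.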
Kling employs a computationally efficient full-attention mechanism that operates as a spatiotemporal modeling module. This 3D spatiotemporal joint attention mechanism integrates temporal and spatial information in a unified computation, enabling the model to capture local spatial features within individual video frames and temporal dynamic features across frames simultaneously. The result is a comprehensive understanding and reproduction of motion information in videos, including complex scenarios such as fast-moving objects, drastic scene transitions, and intricate human movements.
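Kuaishou has not released Kling's attention implementation, but the core idea of full 3D spatiotemporal attention can be sketched: flatten the (frames, height, width) latent grid into a single token sequence so that every patch attends to every other patch across both space and time in one computation. The single-head NumPy version below is a minimal illustration, not the production mechanism.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def full_spatiotemporal_attention(tokens):
    """Single-head self-attention over a flattened (T*H*W, d) sequence.
    Because space and time are flattened together, each token attends to
    every spatial position in every frame simultaneously."""
    n, d = tokens.shape
    rng = np.random.default_rng(0)
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(d))  # (n, n): every token vs. every token
    return attn @ v

T, H, W, d = 4, 6, 6, 32  # tiny latent video: 4 frames of a 6x6 grid
tokens = np.random.default_rng(1).standard_normal((T * H * W, d))
out = full_spatiotemporal_attention(tokens)
print(out.shape)  # (144, 32), same shape as the input sequence
```

The cost of this design is that attention is quadratic in T·H·W, which is why operating on a 3D-VAE-compressed latent (rather than raw pixels) matters for efficiency.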
While Kuaishou has not disclosed the full details of Kling's training dataset or parameter count, the company has stated that the model was trained to simulate the characteristics of the physical world. Kling demonstrates strong capabilities in modeling realistic physics, including gravity, fluid dynamics, cloth simulation, and object interactions. This suggests training on large-scale, diverse video datasets that capture a wide range of real-world motion patterns.
Kling has gone through rapid iteration since its initial release. The following table summarizes the major version releases.
| Version | Release Date | Key Features |
|---|---|---|
| Kling 1.0 | June 10, 2024 | Initial release; text-to-video and image-to-video; up to 2 minutes at 1080p, 30 fps; launched in KuaiYing app (China only) |
| Kling 1.5 | September 19, 2024 | Motion Brush for controlling up to 6 elements; end-frame support; 1080p in Professional mode; camera movement controls |
| Kling 1.6 | December 19, 2024 | 195% improvement in image-to-video over 1.5; better prompt adherence; more realistic human motion and facial expressions; Standard and Professional modes |
| Kling 2.0 | April 15, 2025 | Multimodal Visual Language (MVL) framework; multimodal video editing (add/remove/replace elements); videos up to 10 seconds at 1080p; 182% win ratio vs Google Veo 2 in internal testing |
| Kling 2.1 | May 26, 2025 | Standard (720p), Professional (1080p), and Master (1080p) tiers; improved character consistency; keyframing for start/end positions |
| Kling 2.5 Turbo | September 23, 2025 | Faster generation; multi-step prompt parsing; start and end frame control with 20+ presets; nearly 30% lower cost than 2.1; topped Artificial Analysis leaderboard |
| Kling O1 | December 1, 2025 | First unified multimodal video model; combines generation, editing, inpainting, style transfer in one engine; Multimodal Transformer architecture; 3 to 10 second videos |
| Kling 2.6 | December 3, 2025 | Simultaneous audio-visual generation; native dialogue, narration, sound effects, and ambient audio; supports Chinese and English voice generation; videos up to 10 seconds |
| Kling 3.0 | February 5, 2026 | Video 3.0, Video 3.0 Omni, Image 3.0, Image 3.0 Omni; native audio in 5+ languages; multi-shot storyboard; multi-character coreference; up to 15-second videos; 2K/4K image output |
Kling 1.0 was announced on June 10, 2024, and initially made available for public testing within KuaiYing, Kuaishou's video editing app for the Chinese market. The model could generate videos up to two minutes long at 1080p resolution and 30 frames per second, supporting multiple aspect ratios. At the time, these specifications were considered competitive with OpenAI's Sora, which had been announced in February 2024 but was not yet publicly available. Kling 1.0 supported text-to-video generation, transforming natural language descriptions into video clips that demonstrated realistic physics, complex human movements, and creative scene compositions.
Released on September 19, 2024, Kling 1.5 introduced several important creative control features. The Motion Brush tool allowed users to define the movement paths of up to six individual elements within a scene, including characters and objects, while a complementary static brush let users designate areas that should remain motionless. The update also added end-frame support, enabling users to upload two images as start and end frames and have the model generate the transitional content between them. Professional mode now delivered 1080p HD output with improved motion quality and text responsiveness. Users could also generate up to four videos simultaneously.
In late December 2024, a further update refined the end-frame feature in Kling 1.5 and introduced Kolors 1.5 (an improved image generation model), along with new voices and emotions for the Lip Sync feature.
Kling 1.6 launched on December 19, 2024, and represented a significant leap in image-to-video quality, with Kuaishou reporting a 195% improvement over Kling 1.5 in this category. The model demonstrated stronger prompt understanding, particularly for motion directives, camera angles, and sequential movements. Videos featuring human subjects showed markedly more realistic movements and lifelike facial expressions, with fluid transitions between actions. Visual quality improvements included more dynamic color rendering, more detailed aesthetics, more realistic lighting and shadows, and greater consistency in style and theme throughout a generated video. Kling 1.6 operated in two modes: Standard and Professional.
Announced on April 15, 2025, at the "From Vision to Screen" launch event in Beijing, Kling 2.0 marked a major generational upgrade. The model introduced the Multimodal Visual Language (MVL) interaction framework, which allowed users to combine text with multiple input types, including images, video clips, voice, and motion trajectories. A new multimodal video editing function enabled users to add, remove, or replace content elements in generated videos by providing images or text instructions. The Kolors 2.0 image model, released alongside it, supported over 60 types of stylized effect transcription.
In internal testing, Kuaishou reported that Kling 2.0 achieved a 182% win-loss ratio against Google Veo 2 and a 178% win-loss ratio against Runway Gen-4 in image-to-video evaluations. By this point, Kling AI had grown to over 22 million global users.
Released on May 26, 2025, Kling 2.1 expanded the product lineup into three tiers. Standard mode generated 720p video at 20 credits per 5 seconds. Professional mode produced 1080p output at 35 credits per 5 seconds. The premium Master edition offered superior dynamics and prompt adherence at 1080p, supporting both text-to-video and image-to-video generation with keyframing capabilities that let users specify start and end positions for subjects. Kling 2.1 improved character styling consistency and provided better control over action, motion, and camera framing compared to 2.0.
Launched on September 23, 2025, Kling 2.5 Turbo delivered significant improvements across multiple dimensions. The model could parse multi-step instructions and causal relationships rather than just single actions, a major upgrade in prompt understanding. Motion quality was enhanced for high-action scenes such as combat sequences, running with camera tracking, and complex choreography. A new start and end frame control feature shipped with over 20 viral presets for cinematic direction.
Kling 2.5 Turbo was also notably cheaper to use, with credit costs nearly 30% lower than its 2.1 predecessor (25 credits for a 1080p 5-second video, down from 35). In October 2025, the model reached the number one position on the Artificial Analysis Video Arena leaderboard in both the text-to-video and image-to-video categories. In blind professional evaluations, Kuaishou reported that 2.5 Turbo achieved a 285% win ratio against Seedance 1.0 mini, 212% against Veo 3 fast, and 160% against Seedance 1.0 in text-to-video comparisons.
Unveiled on December 1, 2025, Kling O1 was positioned as the industry's first unified multimodal video model. Rather than treating video generation and video editing as separate tasks, O1 consolidated reference-based video generation, text-to-video, start and end frame generation, video inpainting (content insertion and removal), video modification and transformation, style re-rendering, and shot extension into a single engine. The model was powered by a Multimodal Transformer with built-in multimodal comprehension and long-context support.
Kling O1 addressed the consistency challenge by introducing what Kuaishou described as "director-like memory," allowing the model to independently track multiple subjects even in complex group scenes. Users could input natural language editing commands such as "remove passersby," "transition day to dusk," or "swap the protagonist's attire," with O1 executing pixel-level semantic reconstruction. The accompanying O1 image model supported up to 10 reference images to guide new image creations. Video generation lengths ranged from 3 to 10 seconds.
Released on December 3, 2025 (just two days after Kling O1), Kling 2.6 introduced the milestone capability of simultaneous audio-visual generation. Rather than generating silent video first and adding audio in post-production, the model produced visuals, voiceovers, sound effects, and ambient sounds in a single pass. The system supported multiple audio types, including speech, dialogue, narration, singing, rap, ambient sounds, and mixed effects. Voice generation was available in both Chinese and English.
The model achieved tight coordination between voice rhythm, ambient sound, and visual motion through semantic alignment: characters' body language was matched to the tone of their speech, not merely synchronized with their lip movements. Audio quality reached what Kuaishou described as professional-grade production standards. Kling 2.6 generated videos up to 10 seconds long and was targeted at advertising, marketing, social media content creation, and e-commerce product showcases.
Kling 3.0 was officially launched on February 5, 2026, representing the most comprehensive update in the model's history. The release included four models: Video 3.0, Video 3.0 Omni, Image 3.0, and Image 3.0 Omni.
Video 3.0 introduced native audio generation supporting English, Chinese, Japanese, Korean, and Spanish, with multiple dialect and accent options. Video duration was extended to up to 15 seconds. The model featured intelligent multi-shot storytelling with dynamic camera angle adjustments and preserved text elements such as signage and logos with high accuracy, a feature particularly useful for advertising applications.
Video 3.0 Omni allowed creators to upload reference videos so the AI could extract and replicate a character's visual traits and voice characteristics across new scenes, enabling "digital twin" performance. A multi-shot storyboard feature let users specify the duration, shot size, perspective, narrative content, and camera movements for each shot. Multi-character coreference ensured that the model could maintain distinct identities for multiple characters throughout a scene.
Image 3.0 and Image 3.0 Omni introduced support for 2K and 4K ultra-high-definition output for professional use cases, with enhanced texture and lighting preservation.
By the time of the 3.0 launch, Kling AI had served over 60 million creators worldwide, with more than 600 million videos produced and over 30,000 enterprise clients.
Kling offers a broad set of features across its various model versions.
The core capability of Kling is transforming natural language text prompts into video clips. Users describe a scene, action, or concept in plain language, and the model generates a corresponding video. Later versions improved the model's ability to handle multi-step instructions, causal relationships, and complex compositions.
Kling can take a static image as input and animate it into a video sequence. This feature has been a particular strength of the platform, with each successive version showing marked improvements in how faithfully the generated video preserves the style, content, and details of the input image.
Introduced in Kling 1.5, the Motion Brush gives users fine-grained control over motion within a scene. Users can paint movement paths for up to six elements independently, while using the static brush to lock other areas in place.
Kling supports audio-driven lip synchronization, allowing users to upload voiceovers or songs and have character lip movements automatically match the audio. The feature works with real, 3D, and 2D human characters and supports clips up to 10 seconds.
Kling AI offers an AI-powered virtual try-on feature, available through both the web platform and API, enabling users to visualize how clothing and accessories would look on different subjects.
Starting with Kling 2.0 and expanded in Kling O1, users can edit generated videos through natural language commands. This includes adding, removing, or replacing objects; changing environmental conditions (such as time of day or weather); modifying character attire; and applying style transformations.
Beginning with Kling 2.6, the platform can generate synchronized audio alongside video, including dialogue, narration, environmental sounds, music, and sound effects. Kling 3.0 expanded this to support five languages.
Kling was initially available only in China through Kuaishou's KuaiYing video editing app when it launched in June 2024. On July 25, 2024, Kuaishou opened global access, making the beta version available worldwide through a dedicated web portal at KlingAI.com. International users could register with just an email address and receive 66 free daily credits for image and video generation.
Kuaishou initially launched subscription plans for mainland China users, followed by international subscription plans. The global web platform supports an English-language interface alongside the Chinese version. Kling AI's API is also available globally, serving over 10,000 corporate clients and developers across industries including content creation, advertising, film, animation, game production, and smart devices.
Kling AI operates on a credit-based pricing system. The platform offers both a free tier and several paid subscription plans.
| Plan | Monthly Price | Credits | Key Benefits |
|---|---|---|---|
| Free | $0 | 66 per day (non-rolling) | 720p resolution; watermarked output |
| Standard | $6.99 | 660 per month | 1080p resolution; watermark-free exports |
| Pro | $25.99 | 3,000 per month | 1080p resolution; priority generation |
| Premier | $64.99 | 8,000 per month | 1080p resolution; advanced features |
| Ultra | ~$180 | 26,000 per month | Full feature access; highest priority |
Annual subscription plans offer discounts of approximately 34% compared to monthly billing. Credits from paid plans roll over and remain valid for two years. Additional credits can be purchased starting at $5 for 330 credits.
Credit consumption varies by model version and output quality. For example, a 1080p, 5-second video costs approximately 25 credits with Kling 2.5 Turbo Pro, 35 credits with Kling 2.1 Pro, and 42 credits with Kling 2.6 Pro (due to the audio generation component). Using Kling O1 costs around 34 credits for the same duration.
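The per-clip figures above translate directly into plan budgeting. The sketch below uses only the credit costs and the Standard plan price quoted in this article; the dollars-per-clip figure is a derived estimate, not an official rate.

```python
# Per-clip credit costs for a 1080p, 5-second video, as quoted above.
CREDITS_PER_5S_1080P = {
    "Kling 2.5 Turbo Pro": 25,
    "Kling 2.1 Pro": 35,
    "Kling 2.6 Pro": 42,  # higher due to the audio generation component
    "Kling O1": 34,
}

def clips_per_plan(plan_credits, model):
    """How many 5-second 1080p clips a monthly credit allowance covers."""
    return plan_credits // CREDITS_PER_5S_1080P[model]

def usd_per_clip(plan_price_usd, plan_credits, model):
    """Approximate dollar cost per clip at a plan's effective credit price."""
    return plan_price_usd / plan_credits * CREDITS_PER_5S_1080P[model]

# Standard plan: $6.99 for 660 credits per month.
print(clips_per_plan(660, "Kling 2.5 Turbo Pro"))                 # 26 clips
print(round(usd_per_clip(6.99, 660, "Kling 2.5 Turbo Pro"), 2))   # ~$0.26
```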
For API users, Kuaishou offers pre-paid resource packages valid for 90 days. A single 10-second professional-quality video generated through the API costs roughly USD 1. Third-party platforms such as fal.ai and Atlas Cloud also provide access to Kling models with their own pricing structures.
Kling has consistently ranked among the top AI video generation models on independent benchmarks. The Artificial Analysis Video Arena, which uses an Elo rating system based on blind user comparisons, has been a key barometer for the field.
In October 2025, Kling 2.5 Turbo secured the number one position on both the text-to-video and image-to-video Artificial Analysis leaderboards. As of early 2026, Kling 3.0 1080p (Pro) leads the text-to-video leaderboard (without audio) with an Elo score of 1248. In the image-to-video category (without audio), Kling 2.5 Turbo 1080p holds the second position with an Elo score of 1297.
| Leaderboard Category | Top Model | Elo Score | Kling Ranking |
|---|---|---|---|
| Text-to-Video (no audio) | Kling 3.0 1080p (Pro) | 1248 | 1st |
| Text-to-Video (with audio) | SkyReels V4 | 1132 | Kling 3.0 Pro at 1097 (2nd) |
| Image-to-Video (no audio) | Grok Imagine Video | 1329 | Kling 2.5 Turbo at 1297 (2nd) |
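Elo scores like those in the table above imply head-to-head preference probabilities via the standard logistic formula. The sketch below uses the conventional Elo expectation with a 400-point scale; Artificial Analysis's exact rating methodology may differ in detail.

```python
def elo_expected_score(r_a, r_b):
    """Probability that model A is preferred over model B under standard
    Elo: E_A = 1 / (1 + 10 ** ((R_B - R_A) / 400))."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

# Grok Imagine Video (1329) vs Kling 2.5 Turbo (1297) in image-to-video:
p = elo_expected_score(1329, 1297)
print(round(p, 3))  # ~0.546 -- a 32-point gap is roughly a 55/45 split
```

This is why small Elo gaps near the top of the leaderboard correspond to nearly even blind-comparison outcomes.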
Kling operates in a rapidly evolving market for AI video generation. Its primary competitors include models from both established technology companies and specialized startups.
OpenAI Sora. OpenAI announced Sora in February 2024 and released it publicly in December 2024. Sora is recognized for its strong narrative coherence and realism but has faced criticism for slower availability and higher costs. Kling's early public access gave it a first-mover advantage among publicly available high-quality video generators.
Google Veo. Google's Veo series (Veo 2, Veo 3, and Veo 3.1) competes at the premium end of the market, with particular strengths in cinematic camera work and motion quality. Veo 3 introduced native audio generation, a capability Kling matched with its 2.6 release. Kuaishou reported favorable internal benchmark results against Veo 2 when Kling 2.0 was released.
Runway Gen-3 and Gen-4. Runway has been a pioneer in AI-assisted video editing and generation. Its Gen-3 Alpha and later Gen-4 models offer strong creative tools and style control, particularly appealing to professional video editors and VFX artists. However, Runway initially lacked native audio generation, which became a disadvantage as competitors added this capability.
Pika Labs. Pika focuses on accessible, user-friendly AI video generation and has iterated through multiple versions. While popular among casual creators, Pika's output quality has generally been ranked below Kling's in blind comparison tests.
MiniMax Hailuo. MiniMax's Hailuo video model emerged as a notable competitor, particularly in the Chinese market. Hailuo has earned recognition for strong text-to-video quality at competitive price points, positioning it as a "dark horse" in the field.
Other competitors include Luma Labs (Dream Machine), Stability AI (Stable Video Diffusion), ByteDance (Seedance), and xAI (Grok Imagine Video).
The competitive dynamics have driven down pricing across the industry, with the average cost per minute of AI-generated video dropping by approximately 65% between 2024 and 2025.
Kling AI reached several notable business milestones in its first year. By March 2025, its annualized revenue run rate had surpassed USD 100 million, and by April 2025 the platform had grown past 22 million global users.
Kuaishou has described Kling AI as ranking among the top large video generation models globally by both revenue growth and total revenue scale.
| Date | Event |
|---|---|
| 2011 | Kuaishou Technology founded by Su Hua and Cheng Yixiao in Beijing |
| February 5, 2021 | Kuaishou lists on the Hong Kong Stock Exchange (1024.HK) |
| June 10, 2024 | Kling 1.0 unveiled; testing begins in KuaiYing app (China) |
| July 2024 | Kolors open-source image model released; Kling web version launched |
| July 25, 2024 | Global beta testing opens for international users at KlingAI.com |
| September 19, 2024 | Kling 1.5 released with Motion Brush and end-frame features |
| December 19, 2024 | Kling 1.6 released with 195% improvement in image-to-video |
| March 2025 | Annualized revenue run rate surpasses USD 100 million |
| April 15, 2025 | Kling 2.0 launched globally with MVL framework |
| May 26, 2025 | Kling 2.1 released with Standard, Pro, and Master tiers |
| September 23, 2025 | Kling 2.5 Turbo released; reaches #1 on Artificial Analysis leaderboard |
| December 1, 2025 | Kling O1 unveiled as first unified multimodal video model |
| December 3, 2025 | Kling 2.6 released with simultaneous audio-visual generation |
| February 5, 2026 | Kling 3.0 series launched with multi-language audio, multi-shot storyboard, and 4K image output |