Kling 2.1 is a video generation model developed by Kuaishou Technology, the Beijing-based internet company behind China's second-largest short-video platform. Released on May 29, 2025, Kling 2.1 arrived approximately six weeks after Kling 2.0 and marked a significant refinement of Kuaishou's second-generation video AI architecture. The model introduced three distinct quality tiers (Standard, Pro, and Master) and pushed maximum output resolution to 1080p, building on the motion-quality and semantic-responsiveness improvements that Kling 2.0 had established in April 2025.
Kling 2.1 is part of a rapid iteration cycle that saw Kuaishou release more than twenty model updates in the first year after Kling's June 2024 debut. The 2.1 generation sits in the middle of the broader 2.x series, which continued through Kling 2.5 Turbo (September 2025) and Kling 2.6 (October 2025) before Kuaishou launched Kling 3.0 in February 2026. During the period when 2.1 was the current flagship, it competed directly with Google's Veo 3, OpenAI's Sora 2, and Runway Gen-4 in what had become a crowded field of commercial video generation services.
Kuaishou Technology (快手) was founded in 2011 as a GIF-sharing tool and grew into one of China's dominant short-video platforms, competing directly with ByteDance's Douyin (TikTok). The company went public on the Hong Kong Stock Exchange in February 2021. Kuaishou has invested heavily in AI research, and the Kling project grew from its internal AI team's work on generative models.
Kling (可灵, Keling in pinyin) debuted in June 2024, initially as a beta feature inside Kuaishou's KuaiYing video editing app before becoming a standalone platform at klingai.com. The model drew immediate international attention for producing remarkably fluid human motion, particularly in walking and running sequences that had previously been a weak point for competing tools. Early demonstrations of a man jogging on a treadmill with realistic foot and leg mechanics circulated widely on social media and established Kling as a serious competitor to Sora, which OpenAI had previewed but not yet widely deployed.
Kling 1.0 through 1.5 were released between June and late 2024, steadily improving prompt adherence, video length, and output quality. Kling 1.6, released in December 2024, achieved the top position in Artificial Analysis's Image-to-Video benchmark category with an Arena ELO score of 1,000, becoming the first Chinese model to lead a major international video generation leaderboard.
Kuaishou announced Kling 2.0 on April 15, 2025, at a launch event titled "From Vision to Screen" held in Beijing. The 2.0 generation introduced the Multi-modal Visual Language (MVL) framework, which allowed users to feed combinations of reference images, video clips, and text descriptions to communicate complex creative intent. Internal testing at launch showed win-loss ratios of 182% against Google Veo 2 and 178% against Runway Gen-4 in image-to-video quality evaluations. Kling 2.0 also introduced a Master tier alongside the standard tier, with the Master tier featuring a Multimodal Video Editing Function that could add, delete, or replace elements in existing video clips by processing image or text instructions alongside the source footage.
By the time Kling 2.0 launched globally, Kuaishou reported that Kling AI had accumulated more than 22 million users and was serving over 15,000 developers and businesses through its API.
Kling 2.0 was released globally on April 15, 2025, after a period when the most advanced features were available only through Kuaishou's domestic Chinese channels. The global rollout was a deliberate commercial move: Kling 1.6 had demonstrated strong international demand, and the API business was growing quickly.
The 2.0 generation's most discussed improvement was semantic responsiveness, the model's ability to translate abstract or compositionally complex prompts into coherent visual sequences. Earlier versions of Kling often produced plausible-looking motion while quietly ignoring parts of a prompt that required understanding spatial relationships or cause and effect. Kling 2.0 significantly narrowed this gap, and its image-to-video pipeline in particular drew favorable comparisons to Veo 2, which had been Google's strongest offering at the time.
The 2.0 Master tier added the Kolors 2.0 image generation model on the image side, with support for over 60 stylized effect variations and a new one-click restyling function for switching artistic styles. The image generation component, while distinct from the video pipeline, shared the same interface and credit system, making Kling 2.0 a fuller creative platform rather than a single-function video tool.
The April 2025 launch also established what would become a consistent pattern: Kuaishou published internal benchmark comparisons showing strong win-loss ratios against competitors but did not release the full methodology or raw evaluation data, making independent verification difficult.
Kling 2.1 launched on May 29, 2025, approximately six weeks after Kling 2.0. Kuaishou positioned it not as a major architectural overhaul but as a targeted quality and efficiency improvement, specifically addressing the texture detail, frame continuity, and generation speed gaps that users had identified in Kling 2.0.
The announcement coincided with Kling AI's first anniversary preparations and was framed around cost-effectiveness: the 2.1 Standard tier delivered quality comparable to the 2.0 Master tier at roughly one-third the credit cost. A benchmark analysis by 302.AI found that Kling 2.1 in high-quality mode "will be significantly better than Kling 2.0 in terms of details" and that it resolved frame-skipping issues present in 2.0 Master while improving overall picture continuity.
Kuaishou also reported at the anniversary milestone in June 2025 that Kling had reached an annualized revenue run rate exceeding $100 million, achieved in March 2025 (the tenth month after launch), with monthly subscription bookings exceeding RMB 100 million in both April and May 2025.
Kling 2.1 pairs a diffusion transformer (DiT) backbone with a proprietary 3D variational autoencoder (VAE). The 3D VAE compresses raw video into compact spatiotemporal latent tokens, reducing the computational cost of processing long sequences while preserving fine-grained temporal detail; the DiT backbone then processes both spatial relationships within a frame and temporal relationships across frames in a unified computation. Kuaishou describes this as a "3D spatiotemporal joint attention mechanism" that models complex movements and helps generated videos conform to physical constraints.
The practical effect of this architecture is that Kling 2.1 handles motion sequences involving consistent physical interactions, such as cloth folding under stress, water flowing around obstacles, or limb articulation during athletic movement, more reliably than models using purely spatial attention or frame-by-frame diffusion approaches.
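Kuaishou has not published implementation details beyond these descriptions, but the joint-attention idea can be illustrated. The sketch below is a minimal, assumption-laden rendering in PyTorch: latents from a 3D VAE with shape (frames, height, width, channels) are flattened into a single token sequence so that one attention pass couples motion across frames with layout within frames. It is not Kling's actual code.

```python
# Illustrative sketch only; Kuaishou has not released Kling's implementation.
# Joint spatiotemporal attention treats every (frame, row, column) latent as
# one token, so a single attention pass relates motion across frames to
# layout within frames, unlike factorized spatial-then-temporal designs.
import torch
import torch.nn as nn

class JointSpatiotemporalAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, height, width, channels), e.g. latents already
        # compressed spatially and temporally by a 3D VAE.
        b, t, h, w, c = x.shape
        tokens = x.reshape(b, t * h * w, c)         # one joint space-time sequence
        out, _ = self.attn(tokens, tokens, tokens)  # every token attends to all others
        return out.reshape(b, t, h, w, c)

latents = torch.randn(1, 8, 16, 16, 64)  # hypothetical latent grid
print(JointSpatiotemporalAttention(64)(latents).shape)  # torch.Size([1, 8, 16, 16, 64])
```

The cost of joint attention grows with the square of frames × spatial tokens, which is why the 3D VAE's aggressive compression matters: it keeps the flattened sequence short enough for this unified computation to remain tractable.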
Kling 2.1 uses a DeepSeek-powered prompt rewriting tool accessible within the klingai.com interface. When users activate it, the tool expands short or ambiguous prompts into longer, more structured descriptions before passing them to the video generation pipeline, which improves output quality for users who are not experienced at writing video generation prompts.
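The rewriter runs server-side and its configuration is not public, but a functionally similar expansion step can be sketched against DeepSeek's OpenAI-compatible API. The system prompt and model choice below are assumptions for illustration, not Kling's actual setup.

```python
# Hypothetical stand-in for the server-side prompt rewriter, using DeepSeek's
# public OpenAI-compatible endpoint. The system prompt is an assumption for
# illustration; Kuaishou has not published the one used on klingai.com.
from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_KEY", base_url="https://api.deepseek.com")

def expand_prompt(short_prompt: str) -> str:
    response = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": (
                "Rewrite the user's video idea as a detailed generation prompt: "
                "name the subject, action, setting, lighting, and one camera move."
            )},
            {"role": "user", "content": short_prompt},
        ],
    )
    return response.choices[0].message.content

print(expand_prompt("a fox running in snow"))
# e.g. "A red fox sprints across fresh powder snow in a birch forest at dawn,
# kicking up flakes, soft golden backlight, slow tracking shot from the side."
```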
Kling 2.1 Standard outputs at 720p. Kling 2.1 Pro and Kling 2.1 Master both output at up to 1080p at 30 frames per second. Aspect ratios include 16:9 widescreen, 9:16 vertical, and 1:1 square. Videos can be generated at 5 or 10 seconds in duration. Generating a 5-second video in Pro quality (1080p) takes under one minute on the standard queue. Free-tier users may wait considerably longer, up to 120 minutes under high load.
Kling 2.1 does not support audio generation natively. Audio was added in the later Kling 2.6 model. The 2.1 generation therefore outputs silent video clips, and any audio must be added in post-production.
Kling 2.1's physics simulation improved measurably over Kling 2.0. In comparative testing, 2.1 produced more natural hand and foot articulation in human subjects, with the palms and feet moving through realistic arcs during rotation sequences. Version 1.6 and 2.0 had both shown characteristic artifacts in extremity motion: twisted or rigid hand positions, foot placements that did not follow natural gait mechanics, and occasional "floating" subjects detached from simulated ground surfaces.
The 2.1 Master tier introduced more explicit modeling of environmental physics, including wind-driven cloth and hair movement, water splash dynamics, and gravity-consistent object trajectories. These were not generated through physical simulation in the traditional software sense but through learned representations from training data, producing visually convincing results in many cases while still sometimes failing on edge cases that fall outside the training distribution.
Kling 2.1 Master is the premium tier of the 2.1 generation, positioned for professional and enterprise use cases. It delivers 1080p output with what Kuaishou describes as "superior motion performance and enhanced semantic responsiveness."
The Master tier differs from the standard Pro tier in several measurable ways. It handles multi-character scenes more reliably, supporting complex compositions where multiple subjects with distinct identities interact within the same frame. It applies more sophisticated environmental physics, including realistic wind, water, and gravity effects on objects and surfaces. Its prompt adherence is stronger in compositionally complex scenarios, such as scenes requiring specific camera angles, precise lighting conditions, or detailed background elements.
In the 302.AI comparison study, Master-tier performance was described as delivering "highly realistic and cinematic" motion quality suitable for professional promotional videos, music video prototyping, and detailed multi-character storytelling, while the Standard and Pro tiers were better suited to social media content, quick concept storyboarding, and marketing clips where the highest level of cinematic detail is not required.
The Master tier is available both on the klingai.com platform and via API. On the platform, it requires a Premier or Ultra subscription to access. Via API, it is billed per generated video independent of subscription tier.
A benchmark finding that received attention in the creator community was the cost efficiency comparison: Kling 2.1's high-quality (Pro) mode delivered output comparable to 2.0 Master at approximately 33% of the credit cost, meaning the new generation offered a better quality-per-credit ratio across the board even before accounting for the Master tier itself.
Kling 2.1 supports text-to-video generation across all three tiers. Users write a natural-language prompt describing the scene, subject, action, and camera behavior, and the model generates a video matching those specifications. The model supports camera movement instructions within prompts, including terms like "slow dolly forward," "overhead crane shot," "handheld follow," and "static wide shot," allowing creators to specify cinematographic style without technical configuration.
The optional DeepSeek-powered prompt enhancement tool can be activated before submission, which rewrites the prompt to add compositional and physical detail that improves consistency between the user's intent and the generated output. The enhancement step adds a few seconds of latency but produces measurably better prompt adherence for short or ambiguous inputs.
Text-to-video in Kling 2.1 Pro generates 5-second clips in approximately 30 to 60 seconds, which was around 50% faster than the equivalent Kling 1.6 generation time. The speedup reflects architectural optimizations in the 2.1 generation's inference pipeline rather than a reduction in output quality.
Image-to-video is Kling AI's strongest capability and accounts for approximately 85% of its video creation volume, according to Kuaishou's internal data. Kling 2.1 accepts a static reference image and a text prompt describing the desired motion, then animates the image according to those instructions.
The frame-based generation approach anchors the first frame of the output video to the provided reference image, preserving the visual identity of subjects, their clothing, facial features, and environmental elements while applying motion consistent with the text description. This capability is particularly useful for product visualization, character animation from illustrations, and consistent character portrayal across multiple generated clips.
Kling 2.1 also supports a keyframing mode where users can specify both a start frame and an end frame, providing the model with the initial and final visual states and letting it generate the motion in between. This gives creators more deterministic control over subject position, camera framing, and scene composition than prompt-only generation allows.
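How this keyframing mode is exposed through a hosted API can be sketched with fal.ai's Python client (fal-client on PyPI, with FAL_KEY set in the environment). The endpoint identifier follows the fal-ai/kling-video/v2.1/{tier}/{mode} pattern described later in this article, but the argument names (image_url, tail_image_url, duration) are assumptions to be checked against the live schema.

```python
# Sketch of keyframed image-to-video through fal.ai's Python client.
# Argument names are assumptions modeled on fal's Kling endpoints; verify
# against the current schema before use.
import fal_client

result = fal_client.subscribe(
    "fal-ai/kling-video/v2.1/pro/image-to-video",
    arguments={
        "prompt": "the knight lowers his sword and walks toward the gate",
        "image_url": "https://example.com/start_frame.png",    # anchors frame 1
        "tail_image_url": "https://example.com/end_frame.png", # desired final state
        "duration": "5",                                       # seconds
    },
)
print(result["video"]["url"])  # assumed response shape
```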
The Motion Brush tool is available within klingai.com and allows users to draw motion paths on specific regions of the input image. A user can select a subject's arm and draw a horizontal arc, and the model will animate that arm along the drawn trajectory while keeping the rest of the scene consistent. The brush supports stroke widths up to 50 pixels and multiple overlapping motion regions with independently configured velocities.
Kling AI includes a separate Lip Sync feature, available as a standalone module distinct from the main 2.1 video generation pipeline. The Lip Sync feature takes an existing video clip (either AI-generated or real footage) and an audio track, then produces a new version of the clip where the character's lip movements synchronize with the provided speech.
The feature supports .mp4 and .mov input files up to 100MB and requires 720p or 1080p resolution with dimensions between 720 and 1920 pixels in each dimension. It works on both AI-generated videos and real-world footage containing human faces.
The Lip Sync model uses phoneme-level analysis of the audio track to drive jaw dynamics, lip rounding, tooth visibility, and the subtle cheek and tongue movements that occur during natural speech. According to Kling's documentation, the system models the full complexity of speech articulation rather than mapping audio amplitude to a simple open-close jaw motion, which produces more naturalistic results for complex phoneme sequences.
The system supports over 20 languages for lip-sync animation, with the strongest performance in Mandarin Chinese, reflecting the language distribution of Kuaishou's primary training corpus. English language lip sync is described as "good but slightly less precise" than Mandarin by independent testers. The Kling 3.0 generation later added native multilingual speech synthesis directly within video generation, reducing dependence on the separate Lip Sync module for common use cases.
On fal.ai, the Lip Sync feature is available as a standalone API endpoint under the model identifier fal-ai/kling-video/lipsync/audio-to-video.
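A call against that endpoint can be sketched with the same client; the field names (video_url, audio_url) are assumptions for illustration, and the input constraints described above (file size, resolution) are enforced server-side.

```python
# Sketch of a standalone Lip Sync request on fal.ai. Field names are
# assumptions; consult fal.ai's schema for the exact contract.
import fal_client

result = fal_client.subscribe(
    "fal-ai/kling-video/lipsync/audio-to-video",
    arguments={
        "video_url": "https://example.com/talking_head.mp4",  # .mp4/.mov, <=100MB
        "audio_url": "https://example.com/speech.wav",        # track to sync to
    },
)
print(result["video"]["url"])  # assumed response shape
```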
While Kling 2.1 itself supports single image-to-video generation, the broader Kling platform introduced multi-image reference capabilities in the Kling O1 model, which followed the 2.x series. O1 accepts up to seven reference images and merges elements across them into a coherent generated scene, allowing creators to specify character appearance from one image, costume from a second, environment from a third, and so on. This capability was not part of Kling 2.1 proper, but it represents the direction in which the 2.1 generation's image-reference framework was heading.
The primary consumer interface for Kling 2.1 is the klingai.com web platform, which offers both a browser-based generation interface and a mobile app for iOS and Android. The global version of the platform is accessible at app.klingai.com. Free account holders receive 66 daily credits with watermarked output at reduced resolution. Paid subscribers get access to higher resolution, faster queue priority, and the Pro and Master generation tiers.
The platform includes a video history browser, the Motion Brush tool, the DeepSeek prompt enhancer, and the standalone Lip Sync module within a unified interface.
Kuaishou provides a commercial API at the klingai.com developer portal. The API supports all three 2.1 tiers (Standard, Pro, Master) for both text-to-video and image-to-video generation. Requests are authenticated via API key and priced per-generation based on tier and duration, independent of subscription status. The API documentation at app.klingai.com/global/dev/document-api covers authentication, endpoint structure, webhook callbacks for asynchronous generation, and a quick-start guide.
The API uses an asynchronous task model: a generation request returns a task ID immediately, and the client polls the API or receives a webhook callback when the video is ready. This is consistent with the generation latency involved, which ranges from under 60 seconds for short Pro-tier clips to several minutes for longer or more computationally intensive requests.
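A minimal client for that task model might look like the following sketch. The endpoint paths, field names, and response shape are assumptions modeled on the documented pattern; the authoritative contract is the API documentation referenced above.

```python
# Illustrative async-task client for the official Kling API. Paths, fields,
# and response shapes are assumptions for this sketch, not a verified contract;
# see app.klingai.com/global/dev/document-api for the real definitions.
import time
import requests

BASE = "https://api.klingai.com"                      # assumed base URL
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}  # auth per the docs

def generate(prompt: str, tier: str = "pro", duration: int = 5) -> str:
    # Submit the generation task; the API returns a task id immediately.
    resp = requests.post(f"{BASE}/v1/videos/text2video", headers=HEADERS, json={
        "model_name": f"kling-v2-1-{tier}",  # hypothetical tier naming
        "prompt": prompt,
        "duration": duration,
    })
    resp.raise_for_status()
    task_id = resp.json()["data"]["task_id"]

    # Poll until done (a webhook callback would replace this loop).
    while True:
        data = requests.get(f"{BASE}/v1/videos/text2video/{task_id}",
                            headers=HEADERS).json()["data"]
        if data["task_status"] == "succeed":
            return data["task_result"]["videos"][0]["url"]
        if data["task_status"] == "failed":
            raise RuntimeError(data.get("task_status_msg", "generation failed"))
        time.sleep(5)

print(generate("slow dolly forward through a rain-soaked neon alley"))
```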
Kling 2.1 is available through several third-party API platforms that aggregate AI model access:
Fal.ai hosts Kling 2.1 Standard, Pro, and Master for both text-to-video and image-to-video, as well as the standalone Lip Sync model. Fal.ai's infrastructure is oriented toward developer use cases, with fast cold starts and a queue-based REST API plus official client libraries. Model identifiers follow the pattern fal-ai/kling-video/v2.1/{tier}/{mode}; a short sketch expanding this pattern appears after this list of platforms.
Replicate also provides Kling model access, though the hosted versions on Replicate have historically lagged behind the latest model releases by several weeks to months, meaning Replicate users may be running Kling 1.6 or 2.0 when 2.1 is the current version on the official API.
WaveSpeedAI, Pollo AI, and Kie.ai are among the third-party platforms offering Kling 2.1 API access with competitive per-generation pricing, sometimes below Kuaishou's official API rates.
AIML API and other model aggregator services list Kling AI endpoints that expose the same generation capabilities through a unified interface alongside other video generation models.
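As a concrete illustration of the identifier pattern mentioned above, a Standard-tier text-to-video call on fal.ai is a short script; the prompt and duration argument names are assumptions to be checked against the endpoint schema.

```python
# Expanding fal-ai/kling-video/v2.1/{tier}/{mode} for a Standard-tier
# text-to-video request. Argument names are assumptions for this sketch.
import fal_client

endpoint = "fal-ai/kling-video/v2.1/{tier}/{mode}".format(
    tier="standard", mode="text-to-video")

result = fal_client.subscribe(endpoint, arguments={
    "prompt": "handheld follow shot of a cyclist weaving through a market",
    "duration": "5",
})
print(result["video"]["url"])  # assumed response shape
```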
Kling AI uses a credit-based pricing system. Users purchase or receive monthly subscription credits and spend them per generation. Paid subscription plans include bonus credit amounts relative to the free tier.
| Plan | Monthly price | Annual billing (effective monthly) | Monthly credits |
|---|---|---|---|
| Free | $0 | $0 | ~2,000 (66/day) |
| Standard | $6.99 | ~$5.50/mo | 660 |
| Pro | $25.99 | ~$20.80/mo | 3,000 |
| Premier | $64.99 | ~$52/mo | 8,000 |
| Ultra | $127.99 | ~$102/mo | 26,000 |
Annual billing reduces the effective monthly cost by roughly 20% on the plans listed above. Monthly subscription credits expire at the end of each billing cycle and do not roll over. Separately purchased top-up credit packs remain valid for two years.
| Model tier | Duration | Credits | Approximate USD cost |
|---|---|---|---|
| Kling 2.1 Standard | 5 seconds | 20 | $0.13 |
| Kling 2.1 Standard | 10 seconds | 40 | $0.25 |
| Kling 2.1 Pro | 5 seconds | 35 | $0.23 |
| Kling 2.1 Pro | 10 seconds | 70 | $0.45 |
| Kling 2.1 Master | 5 seconds | ~64 | $0.80 |
The approximate USD costs above are calculated at the Pro plan credit rate. The Master tier is approximately 6.4 times more expensive per second than Standard, reflecting its higher computational requirements. The subsequent Kling 2.5 Turbo generation reduced the Pro-tier cost by roughly 30%, bringing 5-second 1080p generation down to 25 credits.
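The conversion behind the USD column is simple arithmetic once a per-credit rate is fixed; the sketch below assumes the annual-billing Pro rate (about $20.80 for 3,000 credits), which the table's Standard and Pro figures approximately track.

```python
# Reproducing the USD column from credits. Assumes the annual-billing Pro
# rate; the source's figures are approximate, so small deviations remain.
PLAN_MONTHLY_USD = 20.80  # Pro plan, annual billing, effective per month
PLAN_CREDITS = 3000       # monthly credit allowance on the Pro plan

def clip_cost_usd(credits: int) -> float:
    return round(credits * PLAN_MONTHLY_USD / PLAN_CREDITS, 2)

print(clip_cost_usd(20))  # 0.14 -- table lists ~$0.13 for 5s Standard
print(clip_cost_usd(70))  # 0.49 -- table lists ~$0.45 for 10s Pro
# The Master row (~64 credits, ~$0.80) does not fit this rate, consistent
# with Master being billed per generated video on the API (see above).
```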
Free-tier output includes a Kling watermark and cannot be used for commercial purposes. Paid tiers produce clean, commercially licensed output.
During the period when Kling 2.1 was the current flagship (May to September 2025), its main competitors were OpenAI's Sora 2, Google's Veo 3, and Runway Gen-4.
| Feature | Kling 2.1 Master | Sora 2 | Veo 3 |
|---|---|---|---|
| Max resolution | 1080p | 1080p | 1080p |
| Max duration | 10 seconds | ~12 seconds | 4 to 8 seconds |
| Native audio | No | Yes | Yes |
| Lip sync | Separate module | Integrated | Integrated |
| Physics quality | Strong | Very strong | Strong |
| Text-to-video | Yes | Yes | Yes |
| Image-to-video | Yes | Limited | Yes |
| Geographic access | Global | Limited (ChatGPT) | US/limited |
| Approximate 5-sec cost | ~$0.80 (Master) | ~$0.40 | ~$1.60 |
Kling 2.1's comparative advantages were its image-to-video pipeline (which remained the strongest in its generation at the time), its geographic accessibility to users outside the US, and its pricing relative to Veo 3 in particular. Kuaishou's internal win-loss testing with Kling 2.0 showed 182% against Veo 2 and 178% against Runway Gen-4 in the image-to-video category; Kling 2.1 built on this foundation with improved texture and frame continuity.
Sora 2's comparative advantages were audio integration (Kling 2.1 produced silent video), longer maximum duration, and stronger physics accuracy in highly complex motion scenarios. Sora 2 was available only through ChatGPT Plus, which limited its accessibility.
Veo 3 offered native audio with synchronized speech and sound effects, which Kling 2.1 lacked entirely. However, Veo 3 was restricted to US users at launch and generated shorter clips. Its per-second cost was also considerably higher than Kling 2.1 for comparable resolutions.
In the broader benchmark landscape, VidCraft AI's community comparisons and 302.AI's structured testing generally found Kling 2.1 to be the better choice for image-to-video work and for users outside the US, while Sora 2 and Veo 3 led for tasks requiring integrated audio or very-high-fidelity physics in short clips.
By February 2026, when Kling 3.0 launched with native audio in five languages, 4K support, and 15-second duration, the audio gap with Sora 2 and Veo 3 had closed substantially.
Kuaishou announced Kling 3.0 on February 4, 2026, with the model series going live on February 5. The launch followed a rapid succession of intermediate releases: Kling 2.5 Turbo (September 2025), Kling 2.6 with native audio (October 2025), and Kling O1 (a multimodal reasoning-integrated video model).
Kling 3.0 introduced several capabilities that addressed the most significant limitations of the 2.1 generation:
- **Native audio generation:** Video 3.0 and Video 3.0 Omni generate synchronized speech, sound effects, and ambient audio without requiring a separate Lip Sync step. The model supports five languages natively: Chinese, English, Japanese, Korean, and Spanish, with multiple accent options for each language.
- **Extended duration:** Kling 3.0 supports video generation up to 15 seconds, compared to Kling 2.1's 10-second maximum.
- **Higher resolution:** Image 3.0 and Image 3.0 Omni output at up to 4K. The video generation component offers improved sharpness relative to 2.1.
- **Multi-shot storyboarding:** The Video 3.0 Omni model includes a multi-shot storyboard feature allowing users to specify duration, shot size, perspective, narrative content, and camera movements for individual cuts within a single generation request, with up to six cuts per pass.
- **Character consistency:** Users can upload a reference video, and the model extracts visual traits and voice characteristics of a character, maintaining them across new scenes.
Kling 3.0 was initially exclusive to Ultra subscribers at launch. The official press release described it as ushering in "an era where everyone can be a director," reflecting the multi-shot storytelling capability that made professional cinematographic structure accessible without video editing expertise.
Kling 2.1 found adoption across several categories of creative and commercial production work.
Brands used Kling 2.1's image-to-video pipeline to animate product photography, turning still shots of products into short motion clips for social media advertising. The approach reduced production costs relative to traditional video shoots: a creator could take existing product images and generate multiple animation variants in minutes rather than organizing a full video production. Creative agencies working with brands including Coca-Cola and Nike used Kling AI during this period to prototype visual concepts and generate preliminary video assets.
Kuaishou's "Bring Your Vision to Screen" initiative, launched in April 2025 alongside the 2.0 generation and continuing through the 2.1 period, received more than 2,000 creative submissions from 60 countries. Winner videos were displayed on large public screens in Tokyo, Paris, Hong Kong, Shanghai, and Toronto.
Film production teams used Kling 2.1 for storyboard animation and visual pre-production. Directors and cinematographers could take concept sketches or location photographs and generate short animated clips to visualize camera movements, blocking, and lighting before committing to actual shooting schedules. This use case leveraged Kling 2.1's strong prompt adherence for camera movement instructions and its ability to maintain environmental consistency across a sequence of animated frames.
Kling AI's collaboration page on Screen Daily listed usage across production companies working on projects for Amazon Prime and other streaming platforms, with the tool described as enabling "more ambitious visuals than time or budget usually allow" in pre-production.
Game developers used Kling 2.1 to generate reference animations for character motion, concept video for pitch materials, and cinematic cutscene prototypes. The image-to-video pipeline was particularly useful for animating character concept art produced by illustrators, providing early motion reference before committing to full 3D animation production.
The largest volume of Kling 2.1 usage came from individual content creators on platforms such as TikTok, Instagram, and YouTube. Creators used the tool to animate still photographs, create short narrative video clips, and produce stylized video content from artwork or AI-generated images. The Standard and Pro tiers were well-suited to this use case given their cost and adequate quality for compressed social media delivery.
Educators and science communicators used Kling 2.1 to create short illustrative video clips for topics that are difficult to photograph or film, including historical reconstructions, biological processes, and abstract physical phenomena. The physics simulation quality of 2.1 made it more reliable than 1.x models for generating clips showing physical processes with some degree of accuracy.
Kling 2.1 received broadly positive reviews from AI video creators and independent evaluators at the time of its release. The combination of improved texture detail, lower credit costs relative to Kling 2.0 Master, and strong image-to-video performance positioned it well against the competition available in mid-2025.
Creators on Reddit and creator-focused review sites consistently identified Kling as producing the smoothest character animation and most realistic human motion in its generation. The 302.AI benchmark comparing 2.1 to 2.0 and 1.6 concluded that 2.1 high-quality mode delivered comparable quality to 2.0 Master at roughly one-third the cost, calling it "the recommended choice for most users."
Industry coverage from Analytics Vidhya and other technical publications noted that Kling 2.1 excelled at "recreating videos from reference frames" but described its lack of native audio as a significant gap compared to Veo 3, which launched with integrated audio generation.
By June 2025, Kling AI's annualized revenue run rate had exceeded $100 million, making it one of the fastest-growing AI products globally by revenue. The platform served over 10,000 enterprise API clients across advertising, film, animation, and game production sectors. By the time Kling 3.0 launched in February 2026, Kling AI had grown to over 60 million registered users globally and had generated over 600 million videos.
The model's reception among professional filmmakers was captured in Screen Daily coverage, where production companies described it as a tool for augmenting workflows "from creative agencies for brands including Coca-Cola and Nike, to games developers, indie creators and high-end productions for studios and streamers."
Several consistent limitations affected Kling 2.1 across independent reviews and community testing.
The absence of native audio was the most frequently cited limitation. Unlike Veo 3 and later Sora 2, Kling 2.1 produced silent video clips. The separate Lip Sync module partially addressed this for speech-driven content but required an additional generation step and a pre-existing video clip as input.
Complex scenes with multiple simultaneously moving subjects remained difficult. Five or more people interacting in a shared space, or scenes with many independently moving objects, tended to produce artifacts, subject simplification, or motion coherence failures. Independent testing found that only 30% to 40% of complex prompts produced usable output on the first attempt, with the remainder requiring multiple generation attempts to reach an acceptable result.
Physics hallucinations appeared in edge-case scenarios. In the 302.AI tilt-shift city test, a vehicle generated by the model was observed accelerating into a sidewalk and back onto the road, a physically impossible action that the model nonetheless generated with visual confidence. The model allocated computational attention across scene elements, and when the demand exceeded capacity, secondary elements degraded first, sometimes in physically incoherent ways.
Character consistency across multiple generation calls was not a native capability of Kling 2.1. Each generation was independent: using the same reference image as input would produce similar but not identical character appearance across different clips, making it difficult to maintain exact visual continuity across a series of clips intended to form a longer narrative sequence.
Censorship restrictions applied to Kling across all tiers and access methods. Operating under Chinese government regulatory requirements, the Kuaishou content moderation system blocked prompts related to political figures, protests, governmental criticism, and related sensitive topics. Specific terms including references to political leaders, the Tiananmen Square protests, Tibet, and LGBTQ content triggered error responses. These restrictions applied to users on both the domestic Chinese platform and the global klingai.com interface. The Cyberspace Administration of China (CAC) tests AI models developed in China for compliance with content guidelines that require responses to embody "core socialist values."
A security issue unrelated to model quality was reported in May 2025: malware campaigns exploited Kling AI's popularity by creating fraudulent websites and advertisements mimicking the klingai.com interface to distribute infostealer malware. Approximately 22 million potential victims were targeted according to security researchers. Kuaishou itself was not compromised; the attacks exploited brand recognition to direct users to third-party sites.
Upscaling, when applied to Kling 2.1 output, introduced hallucinated texture detail: surfaces that appeared plausible at a glance but did not hold up under close inspection, with fine detail that differed from the original generated content.