Veo 2 is a text-to-video and image-to-video generative AI model developed by Google DeepMind, announced on December 16, 2024 alongside an update to the Imagen 3 image generation model. It is the second major iteration of the Veo family and was positioned by Google as a direct competitor to OpenAI's Sora, which had reached general availability one week earlier. Veo 2 generates short cinematic video clips from text prompts and image conditioning, with claims of output at up to 4K resolution and clip durations into the multi-minute range, although consumer surfaces typically delivered eight-second 720p clips at 24 frames per second.
The model arrived first as a limited preview inside VideoFX in Google Labs, then expanded through a steady rollout: a private preview on Vertex AI in December 2024, integration into YouTube Shorts Dream Screen on February 13, 2025, general availability in Gemini Advanced on April 15, 2025, GA on Vertex AI and the Gemini API in April 2025, and integration into Google Vids in May 2025. Veo 2 was succeeded by Veo 3 on May 20, 2025 at Google I/O, but it remained available across most surfaces as a lower-cost option (sometimes branded "Veo 2 Fast"); Google later announced a planned retirement date of June 30, 2026.
| Attribute | Value |
|---|---|
| Type | Text-to-video, image-to-video |
| Developer | Google DeepMind |
| Released | December 16, 2024 (VideoFX preview) |
| GA on Vertex AI | April 2025 |
| Parent | Veo family |
| Predecessor | Veo 1 (May 2024) |
| Successor | Veo 3 (May 2025) |
| Latest variant | Veo 2 / Veo 2 Fast |
| Input modalities | Text, image (start frame) |
| Output | Video at up to 4K resolution, up to 2 minutes (technical), 8 seconds (consumer surfaces) |
| Frame rate | 24 fps |
| Architecture | Latent diffusion transformer |
| Watermarking | SynthID (invisible) |
| Pricing (Vertex AI) | $0.50 per second of generated video |
| Pricing (Gemini API) | $0.35 per second of generated video |
| Retirement deadline | June 30, 2026 (announced) |
The original Veo model, retrospectively called Veo 1, was unveiled by Sundar Pichai and Demis Hassabis at Google I/O on May 14, 2024. The keynote referenced artificial intelligence dozens of times over the course of about two hours, and Veo was positioned as Google's flagship answer to a category that had been defined for most of 2024 by OpenAI's Sora research preview from February 2024 and by Runway's Gen-2 and Gen-3 models. Google described Veo 1 as capable of generating 1080p clips that exceeded one minute in length, supporting text-to-video, image-to-video, and a basic storyboard mode for chaining prompts into longer sequences. Access in May 2024 was confined to a waitlist for VideoFX inside Google Labs, with select filmmakers receiving early invitations.
Veo 1 generated some attention but did not catch on as a viral consumer product. It was visibly behind the bar that Sora's research demos had set on prompt adherence and on the realism of human motion, and most of the publicly shared clips came from a handful of curated creators rather than from broader user traffic. The waitlist limited the volume of community-generated examples, which in turn limited the social-media presence of the model. Inside Google DeepMind, work on the next iteration began essentially immediately after I/O and continued through the summer and autumn of 2024.
By December 2024, the video generation category had begun to consolidate around a small number of well-funded players. OpenAI moved Sora out of research preview on December 9, 2024 and began enrolling ChatGPT Plus and Pro users with the variant marketed as "Sora Turbo." Sora's ability to produce up to 20-second clips at 1080p, with a clean web interface and integration into the existing ChatGPT subscription, was framed by much of the press as a meaningful advance over previous video tooling.
Google DeepMind responded one week later. On December 16, 2024, a blog post co-authored by Aäron van den Oord (a Google DeepMind research scientist known for WaveNet and VQ-VAE) and Elias Roman (senior director of product management at Google Labs) announced Veo 2, an Imagen 3 update, and a new tool called Whisk that combined Imagen 3 with Gemini's visual understanding to remix images. Coverage of the announcement framed the timing explicitly as a response to Sora's launch the prior week, and outlets including Fortune, TechCrunch, The Verge, and TechRadar emphasized the 4K resolution claim and the head-to-head benchmark numbers Google published.
Veo 2's rollout occurred in stages over roughly four months, expanding from an invite-only preview to consumer apps and developer APIs. The pace was unusually fast for a frontier video model, but it was not a single-day general availability launch.
| Date | Event | Surface |
|---|---|---|
| December 16, 2024 | Veo 2 announced; Imagen 3 update; Whisk introduced | Blog announcement |
| December 16, 2024 | Limited preview opens | VideoFX (Google Labs) |
| December 2024 | Private preview begins | Vertex AI |
| February 13, 2025 | Veo 2 added to Dream Screen | YouTube Shorts (US, Canada, Australia, New Zealand) |
| April 9, 2025 | API preview and pricing announced | Gemini API ($0.35/sec) |
| April 15, 2025 | Veo 2 begins rolling out to Gemini Advanced (now Google AI Pro) | Gemini app (web and mobile, global) |
| April 2025 | General availability | Vertex AI |
| May 6, 2025 | Veo 2 video generation in Vids | Google Workspace (selected editions) |
| May 20, 2025 | Veo 3 announced; Veo 2 remains available | Google I/O 2025 |
| June 30, 2026 | Planned retirement deadline announced | Vertex AI, Gemini API |
The Dream Screen integration in YouTube Shorts on February 13, 2025 was the first time Veo 2 reached anything close to a mass consumer audience. Initial availability was limited to four English-speaking countries, but it gave Veo 2 a meaningful presence inside a Google product that already saw over two billion logged-in monthly users.
The April 2025 rollout to Gemini Advanced placed Veo 2 inside the same paid consumer subscription that hosted Gemini chat. The price point was $19.99 per month (Google One AI Premium, later renamed Google AI Pro), and the implementation was deliberately conservative: eight-second clips at 720p in 16:9 landscape, returned as MP4 files. Google did not publish a formal monthly cap, only saying that users would be notified when they were close to the limit.
Veo 2's headline capabilities were framed by Google DeepMind in three categories: visual fidelity (resolution and detail), motion quality (physics, human motion, fluid dynamics), and creative control (camera language and prompt adherence).
Google's December 2024 announcement claimed Veo 2 could produce video at "resolutions up to 4K" with durations "extended to minutes in length." In practice, those numbers represented the technical ceiling of the underlying model rather than the experience that most users encountered. Across consumer surfaces, the delivered output was significantly more constrained:
| Surface | Resolution | Max duration | Format |
|---|---|---|---|
| VideoFX (Dec 2024) | 720p | ~8 seconds | MP4, 16:9 |
| YouTube Shorts Dream Screen | Optimized for mobile vertical | ~6 seconds (background) | Vertical |
| Gemini Advanced (April 2025) | 720p | 8 seconds | MP4, 16:9 |
| Google Vids | 720p | 8 seconds | MP4, 16:9 |
| Whisk Animate | 720p | 8 seconds | MP4 |
| Gemini API | 720p | up to 8 seconds | MP4 |
| Vertex AI (GA, April 2025) | up to 4K | up to ~120 seconds | MP4 |
The 4K and two-minute claims held only for enterprise customers using Vertex AI, and even there community testing in mid-2025 reported that native 4K output sometimes showed soft textures and unresolved fine detail, which suggested that the model was producing video at lower resolutions and upscaling for delivery in some configurations. The frame rate was 24 frames per second, the standard for cinematic content, across every surface.
The gap between marketed and delivered capabilities became one of the more frequently raised criticisms of Veo 2 in the months after launch. It was not unique to Google, however, since both OpenAI's Sora Turbo and Runway Gen-3 had similar gaps between their marketed and delivered output. The pattern reflected the cost of inference for high-resolution, long-duration video generation more than any deception.
Veo 2's most consistently praised capability was its ability to interpret cinematic terminology in prompts. The model understood lens descriptions ("18mm wide angle," "85mm portrait lens," "anamorphic"), depth of field specifications ("shallow depth of field," "deep focus"), camera movements ("low-angle tracking shot," "dolly in," "crane up," "orbit"), and lighting setups ("golden hour," "chiaroscuro," "neon-lit"). It also responded reasonably well to genre and style references such as "film noir," "documentary handheld," or "anime." The supported control parameters can be summarized as follows:
| Control category | Examples |
|---|---|
| Lens | 18mm wide angle, 35mm, 85mm portrait, fisheye, anamorphic, macro |
| Depth of field | Shallow, deep focus, rack focus |
| Camera movement | Dolly in/out, tracking shot, pan, tilt, crane up/down, orbit, static lock, Dutch angle, handheld |
| Shot type | Close-up, medium, wide, over-the-shoulder, low-angle, bird's eye, top-down |
| Lighting | Golden hour, blue hour, chiaroscuro, soft box, neon, backlight, silhouette |
| Style | Film noir, documentary, anime, photorealistic, claymation, cyberpunk |
| Conditioning | Text prompt, single image as start frame |
The model also supported image-to-video conditioning, in which a still image served as the first frame of the generated clip. The feature was available from launch on Vertex AI and rolled out progressively to other surfaces.
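The control vocabulary in the table above composes naturally into prompts. The sketch below is a hypothetical helper for assembling such prompts, not part of any Google SDK; the function name and argument set are our own illustration of how the control categories combine:

```python
# Illustrative sketch (not a Google API): composing a Veo 2-style text
# prompt from the control categories described above. All names here are
# hypothetical helpers, not part of any SDK.

def build_prompt(subject, action, *, lens=None, depth_of_field=None,
                 movement=None, shot=None, lighting=None, style=None):
    """Assemble a comma-separated cinematic prompt from optional controls."""
    parts = [f"{shot} of {subject} {action}" if shot else f"{subject} {action}"]
    for descriptor in (lens, depth_of_field, movement, lighting, style):
        if descriptor:
            parts.append(descriptor)
    return ", ".join(parts)

prompt = build_prompt(
    "a lighthouse keeper", "climbing a spiral staircase",
    lens="35mm", movement="low-angle tracking shot",
    lighting="golden hour", style="film noir",
)
print(prompt)
```

Keeping each control as a separate descriptor mirrors how reviewers reported the model parsing prompts: discrete cinematic terms rather than long free-form sentences.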
The most visible technical advance over Veo 1 was the model's understanding of physical interaction. Demos circulating in December 2024 included shots of liquid pouring into glassware with surface-tension behavior, fabric draping over moving figures with plausible inertia, and small particles such as sand or snow responding to wind. Google emphasized this as "improved understanding of real-world physics" rather than a separate physics simulator, which was technically accurate: the behavior was an emergent property of training rather than a hard-coded engine.
Human motion also improved measurably. Veo 1 had struggled with characters walking through complex environments and with the synchronization of body movements during interactions, and Veo 2 produced more natural gait, more coherent expressions, and fewer blatant anatomical errors. Hands and fingers, however, remained the most visible weak spot. Reviewers consistently flagged occasional extra fingers, awkward grip poses, or fingers that appeared to float over surfaces they should have been pressing against (a piano keys example was widely cited). Text rendering on signs, posters, and product packaging also remained unreliable, a limitation common to essentially all video generation models in this period.
Google conducted internal head-to-head comparisons using 1,003 prompts from the MovieGenBench dataset published by Meta earlier in 2024. Human evaluators rated 720p, eight-second clips produced by Veo 2 against output from Sora Turbo, Meta Movie Gen, Kling v1.5, and MiniMax's Hailuo model. Veo 2 was reported as receiving the highest preference scores in both "overall preference" and "prompt adherence" categories.
The most-cited number from these benchmarks was the head-to-head against Sora Turbo: in pairwise comparisons, Veo 2 was preferred 58.8% of the time, Sora Turbo 26.7% of the time, with 14.5% of judges expressing no preference. These were Google's own benchmarks using Google's own methodology, with prompts drawn from a Meta-authored dataset, and they were not independently replicated. Outlets such as Fortune and The Decoder reported the results approvingly, with headlines describing Veo 2 as having "trounced" the competition.
Google DeepMind has not published a full technical paper for Veo 2. The architecture is described in DeepMind's blog posts, in the Veo developer documentation, and in a technical report focused on Veo 3 that retroactively confirmed key details about the Veo 2 design. The general architecture is a latent diffusion transformer that uses a video autoencoder to compress raw frames into a lower-dimensional latent representation, tokenizes the latent into spatio-temporal patches, and runs a transformer network with self-attention to denoise the latent representation conditioned on a text prompt or image embedding.
Video generation directly in pixel space is computationally prohibitive at any meaningful resolution and length. Veo 2, like most contemporary video models, addresses this by encoding training video into a latent space using a learned autoencoder, then training the diffusion model to operate within that compressed space. This is the same general approach that Stability AI's Stable Video Diffusion, OpenAI's Sora, and earlier Google research projects such as Imagen Video and Lumiere had used.
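The economics of latent compression can be made concrete with round numbers. The compression factors below are hypothetical placeholders (Google has not disclosed Veo 2's actual autoencoder configuration); the point is the order-of-magnitude reduction before the transformer ever sees a token:

```python
# Toy illustration of the latent-space economics described above. The
# downsampling factors and channel counts are hypothetical round numbers,
# not Veo 2's actual (undisclosed) configuration.

T, H, W, C = 192, 720, 1280, 3          # 8 s of 24 fps 720p RGB video
pixels = T * H * W * C

# Assume an autoencoder that downsamples 8x spatially and 4x temporally
# into a 16-channel latent, then tokenizes 2x2x2 latent patches.
t, h, w, c = T // 4, H // 8, W // 8, 16
latent = t * h * w * c
tokens = (t // 2) * (h // 2) * (w // 2)

print(f"pixel values:  {pixels:,}")
print(f"latent values: {latent:,}  ({pixels // latent}x smaller)")
print(f"transformer tokens: {tokens:,}")
```

Even with these modest assumed factors, the diffusion model operates on tens of thousands of tokens rather than half a billion pixel values, which is what makes training and inference tractable.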
The forward training process adds Gaussian noise to clean latent representations across a series of timesteps. The model is trained to predict and remove that noise, conditioned on text embeddings (and, optionally, image embeddings). At inference time, the process runs in reverse: starting from random noise in the latent space, the model iteratively denoises toward a coherent video latent, then decodes that latent back to pixel frames.
The transformer backbone uses self-attention across both spatial dimensions (within a frame) and the temporal dimension (across frames), which gives the model the ability to maintain object identity over time. This is the major architectural difference from purely convolutional video diffusion models that came before, and it is why the prompt-adherence properties of Veo 2 are noticeably better than older diffusion-based video models.
Google reported that one of the key engineering investments for Veo 2 was the use of richer captions during training. Rather than associating short, single-line text descriptions with each training clip, the team used multi-sentence captions that described visual style, subject, action, camera behavior, lighting, and atmosphere. This allowed the model to learn nuanced associations between text descriptors and visual outcomes, which is why Veo 2 responded usefully to specialized cinematic vocabulary. Image conditioning worked through a separate pathway in which a CLIP-style image embedding was injected as part of the conditioning signal alongside the text embedding.
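One simple way a text embedding and a CLIP-style image embedding can be combined into a single conditioning signal is sequence-axis concatenation, so the denoiser attends to both. The shapes and the concatenation scheme below are assumptions for illustration only; Google has not published Veo 2's exact conditioning mechanism:

```python
# Hypothetical sketch of the dual conditioning pathway described above.
# Shapes and the concatenation scheme are illustrative assumptions, not
# Veo 2's actual (undisclosed) design.
import numpy as np

d_model = 512
text_tokens = np.random.randn(77, d_model)   # stand-in for a text-encoder output
image_embed = np.random.randn(1, d_model)    # stand-in for a CLIP-style embedding

# Concatenate along the sequence axis so the denoising transformer can
# attend jointly to text and image conditioning tokens.
conditioning = np.concatenate([text_tokens, image_embed], axis=0)
print(conditioning.shape)  # (78, 512)
```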
Google DeepMind has not publicly disclosed the full composition of Veo 2's training set. The most informative confirmation came in June 2025, when CNBC reported that Google had confirmed using a subset of YouTube videos to train its Veo and Gemini family models. Google declined to specify the size of the subset, but YouTube hosts on the order of 20 billion videos, so even a small percentage of the catalog would correspond to a training corpus much larger than what most other commercial video models have access to.
The disclosure caused significant pushback among YouTube creators, many of whom said they had not been informed and had no clear way to opt out of training. The standard YouTube terms of service grant Google a license to use uploaded content, but legal observers noted that the licensed uses described in the terms had historically been understood as referring to platform features rather than to training generative AI models that would compete commercially with the creators who supplied the data. Coverage in CNBC, PetaPixel, Slashdot, and Business Standard featured creators expressing concern about commercial competition without consent or compensation.
The controversy was foreshadowed in Marques Brownlee's December 2024 social-media commentary on Veo 2, in which he pointed out that the difference between Veo 2 and competing models could plausibly be characterized as the difference between using a small portion of YouTube data and "owning YouTube and just using all of it." Brownlee did not have specific information about Veo 2's training set when he made the comment, but the framing accurately predicted what later reporting confirmed for Veo 3 and, by inference, for Veo 2.
Every video generated by Veo 2 carries an invisible SynthID watermark embedded directly into the pixel data of every frame. SynthID was developed by Google DeepMind as a steganographic system that modifies the generated content in ways that are imperceptible to human viewers but detectable by automated tools using a corresponding model. The watermark is designed to survive common transformations such as resizing, cropping, JPEG and H.264 compression, and standard format conversions, although it is not designed to withstand sophisticated adversarial attacks.
The technology had been previously deployed for Google's image generation systems and for some text outputs from the Gemini family. Veo 2 represented one of the first large-scale deployments of SynthID for generated video. Google made detection tools available to selected partners and to journalists studying AI-generated media, but as of late 2025 a public-facing SynthID detector for end users was not generally available. This meant that even though the watermark was embedded reliably, ordinary social-media viewers had no practical way to verify that a clip they encountered was Veo 2 output without using Google-provided tooling.
Veo 2 outputs in the YouTube Shorts integration also carried YouTube's visible AI-content disclosure label, separate from the SynthID watermark. The combination of an invisible cryptographic watermark and a visible UI label was Google's response to broader concerns about deepfakes and misinformation in the run-up to a global cycle of elections in 2024 and 2025.
Veo 2 was eventually available across most of Google's consumer, creator, and enterprise products. The matrix of where it ran, what it cost, and what it could produce varied considerably by surface.
| Surface | Audience | Pricing | Output |
|---|---|---|---|
| VideoFX (Google Labs) | Limited preview, waitlist | Free with waitlist access | 720p, 8s |
| YouTube Shorts Dream Screen | Creators in US, CA, AU, NZ at launch (expanded later) | Free | Vertical background or standalone clip |
| Gemini Advanced / Google AI Pro | Consumer subscribers, $19.99/month | Bundled | 720p, 8s, monthly cap |
| Whisk Animate | Google AI Premium subscribers | Bundled in Whisk Animate | 720p, 8s |
| Google Vids | Workspace customers (selected editions) | Bundled in Vids | 720p, 8s, 16:9 |
| Gemini API | Developers via Google AI Studio | $0.35/sec | up to 720p, 8s |
| Vertex AI | Enterprise developers | $0.50/sec | up to 4K, up to 120s |
The split between $0.35 per second on the Gemini API and $0.50 per second on Vertex AI reflected the segmentation that Google has used historically: developer-friendly self-serve pricing on the Gemini API for AI Studio, and enterprise-grade pricing on Vertex AI that came with Google Cloud's compliance, IAM, regional deployment, and SLA guarantees. The 30% price difference was significant enough that some smaller commercial users routed video generation through the Gemini API rather than through Vertex AI.
Veo 2's pricing was published in three places: developer documentation for the Gemini API, the Vertex AI generative AI pricing page, and the consumer subscription pages for Google AI Pro and Google AI Ultra.
| Platform | Model / tier | Rate | Notes |
|---|---|---|---|
| Vertex AI | Veo 2 (veo-2.0-generate-001) | $0.50 per second of generated video | Charges only on successful generation; enterprise volume discounts negotiable |
| Gemini API | Veo 2 | $0.35 per second of generated video | No free tier; charges only on successful generation |
| Whisk Animate | Bundled | Included with Google AI Premium ($19.99/month) | Uses image-to-video conditioning |
| Google AI Pro consumer subscription | Bundled | $19.99/month | Veo 2 in Gemini app, monthly cap |
| Google AI Ultra consumer subscription | Bundled | $249.99/month | Higher Veo 2 access; later expanded to Veo 3 |
| Vids | Bundled | Included with eligible Workspace editions | No separate per-clip charge |
At $0.50 per second on Vertex AI, an eight-second clip cost $4.00 and a one-minute clip cost $30. Generating an hour of video cost $1,800. These prices were considerably higher than the per-second cost of competing models such as Kling and MiniMax's Hailuo, but the comparison was complicated by differences in delivered resolution, generation success rate, and quality. On the Gemini API, the same eight-second clip cost $2.80, which made the Gemini API the more economical surface for developers building applications, especially for short-form content where the 720p output was acceptable.
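The per-surface arithmetic above reduces to a one-line rate lookup. The rates are the published per-second prices; the helper function is our own illustration, not an SDK call:

```python
# Reproducing the per-surface cost arithmetic above. Rates are the
# published per-second prices; the helper name is ours, not an SDK call.
RATES = {"vertex_ai": 0.50, "gemini_api": 0.35}  # USD per generated second

def generation_cost(seconds, surface):
    """Cost in USD for a clip of the given length on the given surface."""
    return round(RATES[surface] * seconds, 2)

print(generation_cost(8, "vertex_ai"))      # 4.0   (one 8 s clip)
print(generation_cost(60, "vertex_ai"))     # 30.0  (one minute)
print(generation_cost(3600, "vertex_ai"))   # 1800.0 (one hour)
print(generation_cost(8, "gemini_api"))     # 2.8   (same clip, Gemini API)
```

Note that both platforms charged only on successful generation, so failed or filtered requests did not accrue these costs.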
For consumer subscribers, the bundled approach was much more cost-effective. Google AI Pro at $19.99 per month included an unspecified monthly cap of Veo 2 generations alongside the rest of the Gemini Advanced feature set. Estimates from creators experimenting with the feature suggested the practical cap was on the order of dozens to low hundreds of clips per month, which was sufficient for personal experimentation but not for high-volume commercial work.
Veo 2 received broadly positive critical reception, particularly relative to the bar that Sora Turbo had set the prior week. The press treated the December 16, 2024 announcement as a credible counter to Sora rather than as a vaporware response.
Marques Brownlee, in his December 17, 2024 social-media post, wrote that "Google's new video generation model is called Veo 2, and if these hand-picked examples are real, they look better than anything I've gotten out of SORA." He followed up with a YouTube Short in which he pointed out that Sora often gave away its AI nature through implausible motion and physics, while Veo 2 looked more grounded. Brownlee's caveat about "hand-picked examples" was widely echoed in subsequent coverage: until the model was available to a broader user base, it was difficult to evaluate how representative the demo reels were.
Fortune's coverage went further, headlining its piece "Google DeepMind's new Veo 2 AI video generator trounces OpenAI's Sora with 4K resolution." The Decoder reported on the MovieGenBench results in similar terms. TechRadar, TechCrunch, and SiliconANGLE covered the announcement as a meaningful technical advance. Tom's Guide called Veo 2 "one of the best AI video models I've ever seen."
More measured assessments came from outlets that focused on enterprise use. TechTarget's coverage was titled "Google's Veo 2 is technically advanced, but concerns remain," and emphasized open questions about training data, watermark robustness, and the gap between marketed 4K and the 720p that most users actually saw. The Verge's coverage of the announcement noted the resolution claim and the cinematography vocabulary as the headline features, but reserved judgment on quality until the model was more widely available.
When the Gemini Advanced rollout in April 2025 made Veo 2 accessible to paying subscribers, community reception was generally positive but with strong opinions about the eight-second cap. Reddit and Hacker News discussions raised the cap as the most frustrating constraint, with creators noting that interesting narrative ideas often required at least a few clips chained together. The lack of a longer-form mode in the Gemini app (in contrast to the multi-minute capability available on Vertex AI) was a frequent point of complaint.
By mid-2025, Veo 2 had been used in several visible production contexts beyond casual social-media experimentation.
Envato launched a feature called VideoGen on April 2, 2025, built directly on Veo 2 via Vertex AI, which gave Envato's creator user base access to text-to-video and image-to-video generation as part of their existing workflow. This was one of the first large third-party integrations of Veo 2 outside Google's own products.
Google Vids' Veo 2 integration in May 2025 brought Veo 2 generation into the Google Workspace presentation and explainer-video tool. Workspace customers could generate eight-second 720p clips inline in the Vids editor and insert them into longer videos without leaving the application. The integration was free for eligible Workspace editions, which meant that Veo 2 effectively reached the broader enterprise market via the Vids product without separate billing.
Independent creators and short-form filmmakers used Veo 2 for music videos, advertising concept work, social-media campaign assets, and short fiction. Several music artists released music videos that combined Veo 2 clips with traditional editing in early 2025. The aesthetic of the model, which tended toward photorealistic cinematography with a slight sheen of perfect lighting, became identifiable enough that some creators leaned into it deliberately.
The advertising sector experimented heavily, particularly for concept reels and storyboard equivalents. The eight-second clip length was actually well-matched to short-form social advertising placements (Instagram and TikTok engagement metrics frequently favored sub-ten-second clips), which made Veo 2 a practical tool for quickly producing draft ad creative.
Veo 2's documented limitations clustered around the same set of issues that affected the broader video generation category in late 2024 and early 2025.
The most visible was the eight-second clip length cap on consumer surfaces. The technical model could produce up to two minutes on Vertex AI, but that capability was not exposed to consumer users through any first-party Google product, and stitching multiple clips together without visible discontinuities was difficult in practice. Veo 3.1 later addressed this with explicit scene extension tooling, but Veo 2 did not have that feature.
Hands and fingers continued to produce visible artifacts in a meaningful percentage of generations. Anatomically incorrect grips, extra digits, and fingers that did not properly contact surfaces were common enough that creators routinely generated multiple variants and selected the best. Text rendering on objects in the scene (signs, books, screens) was unreliable, with text often appearing as scrambled or incorrectly spelled glyphs.
Physics, while improved over Veo 1, had clear failure modes. Complex fluid dynamics (especially water mixing, turbulent flows, or interactions between liquids and other objects), cloth and hair simulation in fast motion, and intricate mechanical interactions (gears meshing, articulated machinery) produced inconsistent results. The model occasionally generated objects that floated against gravity, lighting that did not match material properties, or motion blur that contradicted the apparent direction of movement.
Long scenes with multiple characters and dialogue cues did not work reliably because Veo 2 had no audio capability. Lip-sync was therefore impossible, which meant that any clip involving characters speaking required manual audio post-production. This limitation was the central feature gap that Veo 3 addressed in May 2025.
Geographic availability was uneven. Initial Gemini Advanced rollout in April 2025 was global, but the YouTube Shorts Dream Screen integration was confined to four countries at launch. Some content policies restricted generation in certain regions, particularly around image-to-video features that Google viewed as carrying higher misuse risk.
Content policies were enforced strictly. Veo 2 refused to generate content involving minors in any context, real public figures, sexually explicit material, graphic violence, or other categories defined by Google's Generative AI Prohibited Use Policy. Refusals were sometimes triggered by ambiguous prompts that did not actually violate policy, which was a common complaint among professional users.
The table below summarizes the practical differences between the two models, drawing on Google's December 2024 announcement and on subsequent developer documentation.
| Capability | Veo 1 (May 2024) | Veo 2 (December 2024) |
|---|---|---|
| Resolution claim | 1080p | Up to 4K |
| Resolution delivered (consumer) | 720p in VideoFX | 720p in VideoFX, Gemini, Vids |
| Maximum duration (technical) | 60+ seconds | ~120 seconds |
| Maximum duration (consumer) | ~6-8 seconds | 8 seconds |
| Frame rate | 24 fps | 24 fps |
| Physics understanding | Basic; visible artifacts on fluid dynamics | Notably improved fluid, cloth, light |
| Human motion | Limited; gait often unnatural | Improved gait, expressions, body movement |
| Hands and fingers | Frequent severe artifacts | Reduced but still common |
| Camera control | Moderate; limited cinematic vocabulary | Strong; lens, depth of field, movement, lighting all supported |
| Image-to-video | Yes | Yes, with improved adherence to start frame |
| Storyboard/scene chaining | Yes (basic) | Yes (improved) |
| Audio output | None | None |
| Watermarking | SynthID | SynthID |
| Initial availability | VideoFX waitlist | VideoFX waitlist, then YouTube Shorts, Vertex AI, Gemini Advanced, Vids |
| Primary surface for consumers | VideoFX (waitlist) | Gemini Advanced ($19.99/month) |
| Pricing on Vertex AI | Initial preview pricing | $0.50 per second of generated video |
Veo 3 was announced on May 20, 2025 at Google I/O 2025 and represented the next major iteration after Veo 2. The defining new capability was native audio generation: Veo 3 produced synchronized dialogue with lip-sync, sound effects matched to visible actions, and ambient or musical audio derived from the same prompt that drove the visuals. This was unprecedented at the time among major commercially deployed video models, and it changed the practical use cases for the family.
| Capability | Veo 2 | Veo 3 |
|---|---|---|
| Audio | None | Dialogue with lip-sync, SFX, ambient |
| Resolution | 720p delivered (4K technical) | 1080p delivered, 4K via upscale |
| Clip length | 8 sec (consumer), 120 sec (Vertex AI) | 8 sec |
| Physics realism | Improved over Veo 1 | Further improved |
| Pricing (Vertex AI) | $0.50/sec | $0.75/sec at launch (later $0.40/sec for video+audio) |
| Pricing (Gemini API) | $0.35/sec | $0.40/sec |
| Fast variant | Veo 2 Fast | Veo 3 Fast at $0.15/sec |
| Initial consumer access | Gemini Advanced (Pro tier) | Google AI Ultra ($249.99/month) |
Veo 2 was not deprecated when Veo 3 launched. Both models remained available in the Gemini API, on Vertex AI, and in consumer surfaces. Google AI Pro ($19.99/month) subscribers continued to have Veo 2 access by default, while Veo 3 was initially gated behind the higher-tier Google AI Ultra subscription. Over time, lower-cost Veo 3 access expanded to Pro subscribers, and a Veo 2 Fast variant was made available for cost-sensitive workflows. Google announced in late 2025 that both Veo 2 and Veo 3 would be retired by June 30, 2026, with Veo 3.1 as the recommended replacement.
The practical recommendation that emerged in the developer community was straightforward: use Veo 3 (or Veo 3.1) when the project required dialogue or sound design, and use Veo 2 when audio was unnecessary, when the lower per-second price mattered, or when the longer clip length on Vertex AI was useful. Veo 2 retained a meaningful niche specifically because not every video generation use case needed audio, and the Veo 2 Fast variant was priced low enough to be useful for high-volume background or b-roll generation.
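That community decision rule can be stated as a small function. The model identifiers and the branching order below are our own illustrative encoding of the recommendation, not an official selection API:

```python
# A sketch of the community decision rule described above, expressed as a
# function. Model IDs and branch order are illustrative assumptions.
def pick_veo_model(needs_audio, clip_seconds=8, cost_sensitive=False):
    """Choose a Veo variant per the rule of thumb described above."""
    if needs_audio:
        return "veo-3"        # dialogue or sound design requires native audio
    if clip_seconds > 8:
        return "veo-2"        # longer clips only via Vertex AI on Veo 2
    if cost_sensitive:
        return "veo-2-fast"   # high-volume background / b-roll generation
    return "veo-2"

print(pick_veo_model(needs_audio=True))                    # veo-3
print(pick_veo_model(needs_audio=False, clip_seconds=60))  # veo-2
print(pick_veo_model(False, cost_sensitive=True))          # veo-2-fast
```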
The table below compares Veo 2 with the major competing video generation models that were available in late 2024 and the first half of 2025. Specifications are drawn from the developers' own documentation and from independent reviews, which means that some figures (especially for delivered as opposed to advertised capabilities) reflect the reviewer's testing conditions rather than a fixed model specification.
| Feature | Veo 2 | Sora Turbo (OpenAI) | Sora 2 (OpenAI) | Runway Gen-3 Alpha | Kling v1.5 | Hailuo Minimax |
|---|---|---|---|---|---|---|
| Developer | Google DeepMind | OpenAI | OpenAI | Runway | Kuaishou | MiniMax |
| Released | Dec 16, 2024 | Dec 9, 2024 | Sept 30, 2025 | Jul 2024 | Sept 2024 | Aug 2024 |
| Resolution claim | Up to 4K | 1080p | 1080p | 1280x768 | 1080p | 1080p |
| Resolution delivered | 720p typical | 1080p | 1080p | 768p typical | 720p typical | 720p typical |
| Max clip length | 8 sec consumer / 120 sec API | 5-20 sec | up to 20 sec | 10 sec | 5-10 sec | 6 sec |
| Native audio | No | No | Yes | No | No (yes in v2.6+) | No |
| Image-to-video | Yes | Yes | Yes | Yes | Yes | Yes |
| Camera controls | Strong (lens, movement) | Limited | Limited | Advanced | Motion brush | Limited |
| Watermark | SynthID (invisible) | C2PA + visible | C2PA + visible | C2PA | Watermark on free tier | Watermark on free tier |
| Consumer pricing | Bundled in Google AI Pro ($19.99/mo) | ChatGPT Plus ($20/mo) | ChatGPT Plus | $12-95/mo | $10-67/mo | $5-45/mo |
| Developer pricing | $0.35-$0.50/sec | $0.10-$0.50/sec | $0.10-$0.50/sec | Credits | Credits | Credits |
The most direct competitor at launch was Sora Turbo, given the close timing of releases. Veo 2 had a clear edge on cinematography vocabulary and physics handling, while Sora Turbo had a longer maximum clip length (20 seconds versus 8 seconds in most consumer surfaces) and a more polished web experience inside ChatGPT. Runway Gen-3 had stronger interactive editing tools but smaller native resolution. Kling, deployed largely in the Chinese market, was significantly cheaper and had a strong following among non-English-language creators.