Veo 2 is a text-to-video and image-to-video generative AI model developed by Google DeepMind, announced on December 16, 2024 alongside an update to the Imagen 3 image generation model. It is the second major iteration of the Veo family and was positioned by Google as a direct competitor to OpenAI's Sora, which had reached general availability one week earlier. Veo 2 generates short cinematic video clips from text prompts and image conditioning, with claims of output at up to 4K resolution and clip durations into the multi-minute range, although consumer surfaces typically delivered eight-second 720p clips at 24 frames per second.
The model arrived first as a limited preview inside VideoFX in Google Labs, then expanded through a steady rollout: a private preview on Vertex AI in December 2024, integration into YouTube Shorts Dream Screen on February 13, 2025, general availability in Gemini Advanced on April 15, 2025, GA on Vertex AI and the Gemini API in April 2025, and integration into Google Vids in May 2025. Veo 2 was succeeded by Veo 3 on May 20, 2025 at Google I/O, but it remained available across most surfaces as a lower-cost option (sometimes branded "Veo 2 Fast"); Google later announced a planned retirement date of June 30, 2026.
| Attribute | Value |
|---|---|
| Type | Text-to-video, image-to-video |
| Developer | Google DeepMind |
| Released | December 16, 2024 (VideoFX preview) |
| GA on Vertex AI | April 2025 |
| Parent | Veo family |
| Predecessor | Veo 1 (May 2024) |
| Successor | Veo 3 (May 2025) |
| Latest variant | Veo 2 / Veo 2 Fast |
| Input modalities | Text, image (start frame) |
| Output | Video at up to 4K resolution, up to 2 minutes (technical), 8 seconds (consumer surfaces) |
| Frame rate | 24 fps |
| Architecture | Latent diffusion transformer |
| Watermarking | SynthID (invisible) |
| Pricing (Vertex AI) | $0.50 per second of generated video |
| Pricing (Gemini API) | $0.35 per second of generated video |
| Retirement deadline | June 30, 2026 (announced) |
The original Veo model, retrospectively called Veo 1, was unveiled by Sundar Pichai and Demis Hassabis at Google I/O on May 14, 2024. The keynote referenced artificial intelligence dozens of times over the course of about two hours, and Veo was positioned as Google's flagship answer to a category that had been defined for most of 2024 by OpenAI's Sora research preview from February 2024 and by Runway's Gen-2 and Gen-3 models. Google described Veo 1 as capable of generating 1080p clips that exceeded one minute in length, supporting text-to-video, image-to-video, and a basic storyboard mode for chaining prompts into longer sequences. Access in May 2024 was confined to a waitlist for VideoFX inside Google Labs, with select filmmakers receiving early invitations.
Veo 1 generated some attention but did not catch on as a viral consumer product. It was visibly behind the bar that Sora's research demos had set on prompt adherence and on the realism of human motion, and most of the publicly shared clips came from a handful of curated creators rather than from broader user traffic. The waitlist limited the volume of community-generated examples, which in turn limited the social-media presence of the model. Inside Google DeepMind, work on the next iteration began essentially immediately after I/O and continued through the summer and autumn of 2024.
By December 2024, the video generation category had begun to consolidate around a small number of well-funded players. OpenAI moved Sora out of research preview on December 9, 2024 and began enrolling ChatGPT Plus and Pro users with the variant marketed as "Sora Turbo." Sora's ability to produce up to 20-second clips at 1080p, with a clean web interface and integration into the existing ChatGPT subscription, was framed by much of the press as a meaningful advance over previous video tooling.
Google DeepMind responded one week later. On December 16, 2024, a blog post co-authored by Aäron van den Oord (a Google DeepMind research scientist known for WaveNet and VQ-VAE) and Elias Roman (senior director of product management at Google Labs) announced Veo 2, an Imagen 3 update, and a new tool called Whisk that combined Imagen 3 with Gemini's visual understanding to remix images. Coverage of the announcement framed the timing explicitly as a response to Sora's launch the prior week, and outlets including Fortune, TechCrunch, The Verge, and TechRadar emphasized the 4K resolution claim and the head-to-head benchmark numbers Google published.
Veo 2's rollout occurred in stages over roughly four months, expanding from an invite-only preview to consumer apps and developer APIs. The pace was unusually fast for a frontier video model, but it was not a single-day general availability launch.
| Date | Event | Surface |
|---|---|---|
| December 16, 2024 | Veo 2 announced; Imagen 3 update; Whisk introduced | Blog announcement |
| December 16, 2024 | Limited preview opens | VideoFX (Google Labs) |
| December 2024 | Private preview begins | Vertex AI |
| February 13, 2025 | Veo 2 added to Dream Screen | YouTube Shorts (US, Canada, Australia, New Zealand) |
| April 9, 2025 | API preview and pricing announced | Gemini API ($0.35/sec) |
| April 15, 2025 | Veo 2 begins rolling out to Gemini Advanced (now Google AI Pro) | Gemini app (web and mobile, global) |
| April 2025 | General availability | Vertex AI |
| May 6, 2025 | Veo 2 video generation in Vids | Google Workspace (selected editions) |
| May 20, 2025 | Veo 3 announced; Veo 2 remains available | Google I/O 2025 |
| June 30, 2026 | Planned retirement deadline announced | Vertex AI, Gemini API |
The Dream Screen integration in YouTube Shorts on February 13, 2025 was the first time Veo 2 reached anything close to a mass consumer audience. Initial availability was limited to four English-speaking countries, but it gave Veo 2 a meaningful presence inside a Google product that already saw over two billion logged-in monthly users.
The April 2025 rollout to Gemini Advanced placed Veo 2 inside the same paid consumer subscription that hosted Gemini chat. The price point was $19.99 per month (Google One AI Premium, later renamed Google AI Pro), and the implementation was deliberately conservative: eight-second clips at 720p in 16:9 landscape, returned as MP4 files. Google did not publish a formal monthly cap, only saying that users would be notified when they were close to the limit.
Veo 2's headline capabilities were framed by Google DeepMind in three categories: visual fidelity (resolution and detail), motion quality (physics, human motion, fluid dynamics), and creative control (camera language and prompt adherence).
Google's December 2024 announcement claimed Veo 2 could produce video at "resolutions up to 4K" with durations "extended to minutes in length." In practice, those numbers represented the technical ceiling of the underlying model rather than the experience that most users encountered. Across consumer surfaces, the delivered output was significantly more constrained:
| Surface | Resolution | Max duration | Format |
|---|---|---|---|
| VideoFX (Dec 2024) | 720p | ~8 seconds | MP4, 16:9 |
| YouTube Shorts Dream Screen | Optimized for mobile vertical | ~6 seconds (background) | Vertical |
| Gemini Advanced (April 2025) | 720p | 8 seconds | MP4, 16:9 |
| Google Vids | 720p | 8 seconds | MP4, 16:9 |
| Whisk Animate | 720p | 8 seconds | MP4 |
| Gemini API | 720p | up to 8 seconds | MP4 |
| Vertex AI (GA, April 2025) | up to 4K | up to ~120 seconds | MP4 |
The 4K and two-minute claims held only for enterprise customers using Vertex AI, and even there community testing in mid-2025 reported that native 4K output sometimes showed soft textures and unresolved fine detail, which suggested that the model was producing video at lower resolutions and upscaling for delivery in some configurations. The frame rate was 24 frames per second, the standard for cinematic content, across every surface.
The gap between marketed and delivered capabilities became one of the more frequently raised criticisms of Veo 2 in the months after launch. It was not unique to Google, however, since both OpenAI's Sora Turbo and Runway Gen-3 had similar gaps between their marketed and delivered output. The pattern reflected the cost of inference for high-resolution, long-duration video generation more than any deception.
Veo 2's most consistently praised capability was its ability to interpret cinematic terminology in prompts. The model understood lens descriptions ("18mm wide angle," "85mm portrait lens," "anamorphic"), depth of field specifications ("shallow depth of field," "deep focus"), camera movements ("low-angle tracking shot," "dolly in," "crane up," "orbit"), and lighting setups ("golden hour," "chiaroscuro," "neon-lit"). It also responded reasonably well to genre and style references such as "film noir," "documentary handheld," or "anime." The supported control parameters can be summarized as follows:
| Control category | Examples |
|---|---|
| Lens | 18mm wide angle, 35mm, 85mm portrait, fisheye, anamorphic, macro |
| Depth of field | Shallow, deep focus, rack focus |
| Camera movement | Dolly in/out, tracking shot, pan, tilt, crane up/down, orbit, static lock, Dutch angle, handheld |
| Shot type | Close-up, medium, wide, over-the-shoulder, low-angle, bird's eye, top-down |
| Lighting | Golden hour, blue hour, chiaroscuro, soft box, neon, backlight, silhouette |
| Style | Film noir, documentary, anime, photorealistic, claymation, cyberpunk |
| Conditioning | Text prompt, single image as start frame |
The model also supported image-to-video conditioning, in which a still image served as the first frame of the generated clip. The feature was available from launch on Vertex AI and rolled out progressively to other surfaces.
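The control vocabulary in the table above composes naturally into prompts. The sketch below is a hypothetical helper for assembling such prompts, not part of any Google SDK; the function name and argument set are our own illustration of how the control categories combine:

```python
# Illustrative sketch (not a Google API): composing a Veo 2-style text
# prompt from the control categories described above. All names here are
# hypothetical helpers, not part of any SDK.

def build_prompt(subject, action, *, lens=None, depth_of_field=None,
                 movement=None, shot=None, lighting=None, style=None):
    """Assemble a comma-separated cinematic prompt from optional controls."""
    parts = [f"{shot} of {subject} {action}" if shot else f"{subject} {action}"]
    for descriptor in (lens, depth_of_field, movement, lighting, style):
        if descriptor:
            parts.append(descriptor)
    return ", ".join(parts)

prompt = build_prompt(
    "a lighthouse keeper", "climbing a spiral staircase",
    lens="35mm", movement="low-angle tracking shot",
    lighting="golden hour", style="film noir",
)
print(prompt)
```

Keeping each control as a separate descriptor mirrors how reviewers reported the model parsing prompts: discrete cinematic terms rather than long free-form sentences.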
The most visible technical advance over Veo 1 was the model's understanding of physical interaction. Demos circulating in December 2024 included shots of liquid pouring into glassware with surface-tension behavior, fabric draping over moving figures with plausible inertia, and small particles such as sand or snow responding to wind. Google emphasized this as "improved understanding of real-world physics" rather than a separate physics simulator, which was technically accurate: the behavior was an emergent property of training rather than a hard-coded engine.
Human motion also improved measurably. Veo 1 had struggled with characters walking through complex environments and with the synchronization of body movements during interactions, and Veo 2 produced more natural gait, more coherent expressions, and fewer blatant anatomical errors. Hands and fingers, however, remained the most visible weak spot. Reviewers consistently flagged occasional extra fingers, awkward grip poses, or fingers that appeared to float over surfaces they should have been pressing against (a piano keys example was widely cited). Text rendering on signs, posters, and product packaging also remained unreliable, a limitation common to essentially all video generation models in this period.
Google conducted internal head-to-head comparisons using 1,003 prompts from the MovieGenBench dataset published by Meta earlier in 2024. Human evaluators rated 720p, eight-second clips produced by Veo 2 against output from Sora Turbo, Meta Movie Gen, Kling v1.5, and MiniMax's Hailuo model. Veo 2 was reported as receiving the highest preference scores in both "overall preference" and "prompt adherence" categories.
The most-cited number from these benchmarks was the head-to-head against Sora Turbo: in pairwise comparisons, Veo 2 was preferred 58.8% of the time, Sora Turbo 26.7% of the time, with 14.5% of judges expressing no preference. These were Google's own benchmarks using Google's own methodology, with prompts drawn from a Meta-authored dataset, and they were not independently replicated. Outlets such as Fortune and The Decoder reported the results approvingly, with headlines describing Veo 2 as having "trounced" the competition.
Google DeepMind has not published a full technical paper for Veo 2. The architecture is described in DeepMind's blog posts, in the Veo developer documentation, and in a technical report focused on Veo 3 that retroactively confirmed key details about the Veo 2 design. The general architecture is a latent diffusion transformer that uses a video autoencoder to compress raw frames into a lower-dimensional latent representation, tokenizes the latent into spatio-temporal patches, and runs a transformer network with self-attention to denoise the latent representation conditioned on a text prompt or image embedding.
Video generation directly in pixel space is computationally prohibitive at any meaningful resolution and length. Veo 2, like most contemporary video models, addresses this by encoding training video into a latent space using a learned autoencoder, then training the diffusion model to operate within that compressed space. This is the same general approach that Stability AI's Stable Video Diffusion, OpenAI's Sora, and earlier Google research projects such as Imagen Video and Lumiere had used.
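The economics of latent compression can be made concrete with round numbers. The compression factors below are hypothetical placeholders (Google has not disclosed Veo 2's actual autoencoder configuration); the point is the order-of-magnitude reduction before the transformer ever sees a token:

```python
# Toy illustration of the latent-space economics described above. The
# downsampling factors and channel counts are hypothetical round numbers,
# not Veo 2's actual (undisclosed) configuration.

T, H, W, C = 192, 720, 1280, 3          # 8 s of 24 fps 720p RGB video
pixels = T * H * W * C

# Assume an autoencoder that downsamples 8x spatially and 4x temporally
# into a 16-channel latent, then tokenizes 2x2x2 latent patches.
t, h, w, c = T // 4, H // 8, W // 8, 16
latent = t * h * w * c
tokens = (t // 2) * (h // 2) * (w // 2)

print(f"pixel values:  {pixels:,}")
print(f"latent values: {latent:,}  ({pixels // latent}x smaller)")
print(f"transformer tokens: {tokens:,}")
```

Even with these modest assumed factors, the diffusion model operates on tens of thousands of tokens rather than half a billion pixel values, which is what makes training and inference tractable.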
The forward training process adds Gaussian noise to clean latent representations across a series of timesteps. The model is trained to predict and remove that noise, conditioned on text embeddings (and, optionally, image embeddings). At inference time, the process runs in reverse: starting from random noise in the latent space, the model iteratively denoises toward a coherent video latent, then decodes that latent back to pixel frames.
The transformer backbone uses self-attention across both spatial dimensions (within a frame) and the temporal dimension (across frames), which gives the model the ability to maintain object identity over time. This is the major architectural difference from purely convolutional video diffusion models that came before, and it is why the prompt-adherence properties of Veo 2 are noticeably better than older diffusion-based video models.
Google reported that one of the key engineering investments for Veo 2 was the use of richer captions during training. Rather than associating short, single-line text descriptions with each training clip, the team used multi-sentence captions that described visual style, subject, action, camera behavior, lighting, and atmosphere. This allowed the model to learn nuanced associations between text descriptors and visual outcomes, which is why Veo 2 responded usefully to specialized cinematic vocabulary. Image conditioning worked through a separate pathway in which a CLIP-style image embedding was injected as part of the conditioning signal alongside the text embedding.
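One simple way a text embedding and a CLIP-style image embedding can be combined into a single conditioning signal is sequence-axis concatenation, so the denoiser attends to both. The shapes and the concatenation scheme below are assumptions for illustration only; Google has not published Veo 2's exact conditioning mechanism:

```python
# Hypothetical sketch of the dual conditioning pathway described above.
# Shapes and the concatenation scheme are illustrative assumptions, not
# Veo 2's actual (undisclosed) design.
import numpy as np

d_model = 512
text_tokens = np.random.randn(77, d_model)   # stand-in for a text-encoder output
image_embed = np.random.randn(1, d_model)    # stand-in for a CLIP-style embedding

# Concatenate along the sequence axis so the denoising transformer can
# attend jointly to text and image conditioning tokens.
conditioning = np.concatenate([text_tokens, image_embed], axis=0)
print(conditioning.shape)  # (78, 512)
```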
Google DeepMind has not publicly disclosed the full composition of Veo 2's training set. The most informative confirmation came in June 2025, when CNBC reported that Google had confirmed using a subset of YouTube videos to train its Veo and Gemini family models. Google declined to specify the size of the subset, but YouTube hosts on the order of 20 billion videos, so even a small percentage of the catalog would correspond to a training corpus much larger than what most other commercial video models have access to.
The disclosure caused significant pushback among YouTube creators, many of whom said they had not been informed and had no clear way to opt out of training. The standard YouTube terms of service grant Google a license to use uploaded content, but legal observers noted that the licensed uses described in the terms had historically been understood as referring to platform features rather than to training generative AI models that would compete commercially with the creators who supplied the data. Coverage in CNBC, PetaPixel, Slashdot, and Business Standard featured creators expressing concern about commercial competition without consent or compensation.
The controversy was foreshadowed in Marques Brownlee's December 2024 social-media commentary on Veo 2, in which he pointed out that the difference between Veo 2 and competing models could plausibly be characterized as the difference between using a small portion of YouTube data and "owning YouTube and just using all of it." Brownlee did not have specific information about Veo 2's training set when he made the comment, but the framing accurately predicted what later reporting confirmed for Veo 3 and, by inference, for Veo 2.
Every video generated by Veo 2 carries an invisible SynthID watermark embedded directly into the pixel data of every frame. SynthID was developed by Google DeepMind as a steganographic system that modifies the generated content in ways that are imperceptible to human viewers but detectable by automated tools using a corresponding model. The watermark is designed to survive common transformations such as resizing, cropping, JPEG and H.264 compression, and standard format conversions, although it is not designed to withstand sophisticated adversarial attacks.
The technology had been previously deployed for Google's image generation systems and for some text outputs from the Gemini family. Veo 2 represented one of the first large-scale deployments of SynthID for generated video. Google made detection tools available to selected partners and to journalists studying AI-generated media, but as of late 2025 a public-facing SynthID detector for end users was not generally available. This meant that even though the watermark was embedded reliably, ordinary social-media viewers had no practical way to verify that a clip they encountered was Veo 2 output without using Google-provided tooling.
Veo 2 outputs in the YouTube Shorts integration also carried YouTube's visible AI-content disclosure label, separate from the SynthID watermark. The combination of an invisible cryptographic watermark and a visible UI label was Google's response to broader concerns about deepfakes and misinformation in the run-up to a global cycle of elections in 2024 and 2025.
Veo 2 was eventually available across most of Google's consumer, creator, and enterprise products. The matrix of where it ran, what it cost, and what it could produce varied considerably by surface.
| Surface | Audience | Pricing | Output |
|---|---|---|---|
| VideoFX (Google Labs) | Limited preview, waitlist | Free with waitlist access | 720p, 8s |
| YouTube Shorts Dream Screen | Creators in US, CA, AU, NZ at launch (expanded later) | Free | Vertical background or standalone clip |
| Gemini Advanced / Google AI Pro | Consumer subscribers, $19.99/month | Bundled | 720p, 8s, monthly cap |
| Whisk Animate | Google AI Premium subscribers | Bundled in Whisk Animate | 720p, 8s |
| Google Vids | Workspace customers (selected editions) | Bundled in Vids | 720p, 8s, 16:9 |
| Gemini API | Developers via Google AI Studio | $0.35/sec | up to 720p, 8s |
| Vertex AI | Enterprise developers | $0.50/sec | up to 4K, up to 120s |
The split between $0.35 per second on the Gemini API and $0.50 per second on Vertex AI reflected the segmentation that Google has used historically: developer-friendly self-serve pricing on the Gemini API for AI Studio, and enterprise-grade pricing on Vertex AI that came with Google Cloud's compliance, IAM, regional deployment, and SLA guarantees. The 30% price difference was significant enough that some smaller commercial users routed video generation through the Gemini API rather than through Vertex AI.
Veo 2's pricing was published in three places: developer documentation for the Gemini API, the Vertex AI generative AI pricing page, and the consumer subscription pages for Google AI Pro and Google AI Ultra.
| Platform | Model / tier | Rate | Notes |
|---|---|---|---|
| Vertex AI | Veo 2 (veo-2.0-generate-001) | $0.50 per second of generated video | Charges only on successful generation; enterprise volume discounts negotiable |
| Gemini API | Veo 2 | $0.35 per second of generated video | No free tier; charges only on successful generation |
| Whisk Animate | Bundled | Included with Google AI Premium ($19.99/month) | Uses image-to-video conditioning |
| Google AI Pro consumer subscription | Bundled | $19.99/month | Veo 2 in Gemini app, monthly cap |
| Google AI Ultra consumer subscription | Bundled | $249.99/month | Higher Veo 2 access; later expanded to Veo 3 |
| Vids | Bundled | Included with eligible Workspace editions | No separate per-clip charge |
At $0.50 per second on Vertex AI, an eight-second clip cost $4.00 and a one-minute clip cost $30. Generating an hour of video cost $1,800. These prices were considerably higher than the per-second cost of competing models such as Kling and MiniMax's Hailuo, but the comparison was complicated by differences in delivered resolution, generation success rate, and quality. On the Gemini API, the same eight-second clip cost $2.80, which made the Gemini API the more economical surface for developers building applications, especially for short-form content where the 720p output was acceptable.
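The per-surface arithmetic above reduces to a one-line rate lookup. The rates are the published per-second prices; the helper function is our own illustration, not an SDK call:

```python
# Reproducing the per-surface cost arithmetic above. Rates are the
# published per-second prices; the helper name is ours, not an SDK call.
RATES = {"vertex_ai": 0.50, "gemini_api": 0.35}  # USD per generated second

def generation_cost(seconds, surface):
    """Cost in USD for a clip of the given length on the given surface."""
    return round(RATES[surface] * seconds, 2)

print(generation_cost(8, "vertex_ai"))      # 4.0   (one 8 s clip)
print(generation_cost(60, "vertex_ai"))     # 30.0  (one minute)
print(generation_cost(3600, "vertex_ai"))   # 1800.0 (one hour)
print(generation_cost(8, "gemini_api"))     # 2.8   (same clip, Gemini API)
```

Note that both platforms charged only on successful generation, so failed or filtered requests did not accrue these costs.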
For consumer subscribers, the bundled approach was much more cost-effective. Google AI Pro at $19.99 per month included an unspecified monthly cap of Veo 2 generations alongside the rest of the Gemini Advanced feature set. Estimates from creators experimenting with the feature suggested the practical cap was on the order of dozens to low hundreds of clips per month, which was sufficient for personal experimentation but not for high-volume commercial work.
Veo 2 received broadly positive critical reception, particularly relative to the bar that Sora Turbo had set the prior week. The press treated the December 16, 2024 announcement as a credible counter to Sora rather than as a vaporware response.
Marques Brownlee, in his December 17, 2024 social-media post, wrote that "Google's new video generation model is called Veo 2, and if these hand-picked examples are real, they look better than anything I've gotten out of SORA." He followed up with a YouTube Short in which he pointed out that Sora often gave away its AI nature through implausible motion and physics, while Veo 2 looked more grounded. Brownlee's caveat about "hand-picked examples" was widely echoed in subsequent coverage: until the model was available to a broader user base, it was difficult to evaluate how representative the demo reels were.
Fortune's coverage went further, headlining its piece "Google DeepMind's new Veo 2 AI video generator trounces OpenAI's Sora with 4K resolution." The Decoder reported on the MovieGenBench results in similar terms. TechRadar, TechCrunch, and SiliconANGLE covered the announcement as a meaningful technical advance. Tom's Guide called Veo 2 "one of the best AI video models I've ever seen."
More measured assessments came from outlets that focused on enterprise use. TechTarget's coverage was titled "Google's Veo 2 is technically advanced, but concerns remain," and emphasized open questions about training data, watermark robustness, and the gap between marketed 4K and the 720p that most users actually saw. The Verge's coverage of the announcement noted the resolution claim and the cinematography vocabulary as the headline features, but reserved judgment on quality until the model was more widely available.
When the Gemini Advanced rollout in April 2025 made Veo 2 accessible to paying subscribers, community reception was generally positive but with strong opinions about the eight-second cap. Reddit and Hacker News discussions raised the cap as the most frustrating constraint, with creators noting that interesting narrative ideas often required at least a few clips chained together. The lack of a longer-form mode in the Gemini app (in contrast to the multi-minute capability available on Vertex AI) was a frequent point of complaint.
By mid-2025, Veo 2 had been used in several visible production contexts beyond casual social-media experimentation.
Envato launched a feature called VideoGen on April 2, 2025, built directly on Veo 2 via Vertex AI, which gave Envato's creator user base access to text-to-video and image-to-video generation as part of their existing workflow. This was one of the first large third-party integrations of Veo 2 outside Google's own products.
Google Vids' Veo 2 integration in May 2025 brought Veo 2 generation into the Google Workspace presentation and explainer-video tool. Workspace customers could generate eight-second 720p clips inline in the Vids editor and insert them into longer videos without leaving the application. The integration was free for eligible Workspace editions, which meant that Veo 2 effectively reached the broader enterprise market via the Vids product without separate billing.
Independent creators and short-form filmmakers used Veo 2 for music videos, advertising concept work, social-media campaign assets, and short fiction. Several music artists released music videos that combined Veo 2 clips with traditional editing in early 2025. The aesthetic of the model, which tended toward photorealistic cinematography with a slight sheen of perfect lighting, became identifiable enough that some creators leaned into it deliberately.
The advertising sector experimented heavily, particularly for concept reels and storyboard equivalents. The eight-second clip length was actually well-matched to short-form social advertising placements (Instagram and TikTok engagement metrics frequently favored sub-ten-second clips), which made Veo 2 a practical tool for quickly producing draft ad creative.
Veo 2's documented limitations clustered around the same set of issues that affected the broader video generation category in late 2024 and early 2025.
The most visible was the eight-second clip length cap on consumer surfaces. The technical model could produce up to two minutes on Vertex AI, but that capability was not exposed to consumer users through any first-party Google product, and stitching multiple clips together without visible discontinuities was difficult in practice. Veo 3.1 later addressed this with explicit scene extension tooling, but Veo 2 did not have that feature.
Hands and fingers continued to produce visible artifacts in a meaningful percentage of generations. Anatomically incorrect grips, extra digits, and fingers that did not properly contact surfaces were common enough that creators routinely generated multiple variants and selected the best. Text rendering on objects in the scene (signs, books, screens) was unreliable, with text often appearing as scrambled or incorrectly spelled glyphs.
Physics, while improved over Veo 1, had clear failure modes. Complex fluid dynamics (especially water mixing, turbulent flows, or interactions between liquids and other objects), cloth and hair simulation in fast motion, and intricate mechanical interactions (gears meshing, articulated machinery) produced inconsistent results. The model occasionally generated objects that floated against gravity, lighting that did not match material properties, or motion blur that contradicted the apparent direction of movement.
Long scenes with multiple characters and dialogue cues did not work reliably because Veo 2 had no audio capability. Lip-sync was therefore impossible, which meant that any clip involving characters speaking required manual audio post-production. This limitation was the central feature gap that Veo 3 addressed in May 2025.
Geographic availability was uneven. Initial Gemini Advanced rollout in April 2025 was global, but the YouTube Shorts Dream Screen integration was confined to four countries at launch. Some content policies restricted generation in certain regions, particularly around image-to-video features that Google viewed as carrying higher misuse risk.
Content policies were enforced strictly. Veo 2 refused to generate content involving minors in any context, real public figures, sexually explicit material, graphic violence, or other categories defined by Google's Generative AI Prohibited Use Policy. Refusals were sometimes triggered by ambiguous prompts that did not actually violate policy, which was a common complaint among professional users.
The table below summarizes the practical differences between the two models, drawing on Google's December 2024 announcement and on subsequent developer documentation.
| Capability | Veo 1 (May 2024) | Veo 2 (December 2024) |
|---|---|---|
| Resolution claim | 1080p | Up to 4K |
| Resolution delivered (consumer) | 720p in VideoFX | 720p in VideoFX, Gemini, Vids |
| Maximum duration (technical) | 60+ seconds | ~120 seconds |
| Maximum duration (consumer) | ~6-8 seconds | 8 seconds |
| Frame rate | 24 fps | 24 fps |
| Physics understanding | Basic; visible artifacts on fluid dynamics | Notably improved fluid, cloth, light |
| Human motion | Limited; gait often unnatural | Improved gait, expressions, body movement |
| Hands and fingers | Frequent severe artifacts | Reduced but still common |
| Camera control | Moderate; limited cinematic vocabulary | Strong; lens, depth of field, movement, lighting all supported |
| Image-to-video | Yes | Yes, with improved adherence to start frame |
| Storyboard/scene chaining | Yes (basic) | Yes (improved) |
| Audio output | None | None |
| Watermarking | SynthID | SynthID |
| Initial availability | VideoFX waitlist | VideoFX waitlist, then YouTube Shorts, Vertex AI, Gemini Advanced, Vids |
| Primary surface for consumers | VideoFX (waitlist) | Gemini Advanced ($19.99/month) |
| Pricing on Vertex AI | Initial preview pricing | $0.50 per second of generated video |
Veo 3 was announced on May 20, 2025 at Google I/O 2025 and represented the next major iteration after Veo 2. The defining new capability was native audio generation: Veo 3 produced synchronized dialogue with lip-sync, sound effects matched to visible actions, and ambient or musical audio derived from the same prompt that drove the visuals. This was unprecedented at the time among major commercially deployed video models, and it changed the practical use cases for the family.
| Capability | Veo 2 | Veo 3 |
|---|---|---|
| Audio | None | Dialogue with lip-sync, SFX, ambient |
| Resolution | 720p delivered (4K technical) | 1080p delivered, 4K via upscale |
| Clip length | 8 sec (consumer), 120 sec (Vertex AI) | 8 sec |
| Physics realism | Improved over Veo 1 | Further improved |
| Pricing (Vertex AI) | $0.50/sec | $0.75/sec at launch (later $0.40/sec for video+audio) |
| Pricing (Gemini API) | $0.35/sec | $0.40/sec |
| Fast variant | Veo 2 Fast | Veo 3 Fast at $0.15/sec |
| Initial consumer access | Gemini Advanced (Pro tier) | Google AI Ultra ($249.99/month) |
Veo 2 was not deprecated when Veo 3 launched. Both models remained available in the Gemini API, on Vertex AI, and in consumer surfaces. Google AI Pro ($19.99/month) subscribers continued to have Veo 2 access by default, while Veo 3 was initially gated behind the higher-tier Google AI Ultra subscription. Over time, lower-cost Veo 3 access expanded to Pro subscribers, and a Veo 2 Fast variant was made available for cost-sensitive workflows. Google announced in late 2025 that both Veo 2 and Veo 3 would be retired by June 30, 2026, with Veo 3.1 as the recommended replacement.
The practical recommendation that emerged in the developer community was straightforward: use Veo 3 (or Veo 3.1) when the project required dialogue or sound design, and use Veo 2 when audio was unnecessary, when the lower per-second price mattered, or when the longer clip length on Vertex AI was useful. Veo 2 retained a meaningful niche specifically because not every video generation use case needed audio, and the Veo 2 Fast variant was priced low enough to be useful for high-volume background or b-roll generation.
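That community decision rule can be stated as a small function. The model identifiers and the branching order below are our own illustrative encoding of the recommendation, not an official selection API:

```python
# A sketch of the community decision rule described above, expressed as a
# function. Model IDs and branch order are illustrative assumptions.
def pick_veo_model(needs_audio, clip_seconds=8, cost_sensitive=False):
    """Choose a Veo variant per the rule of thumb described above."""
    if needs_audio:
        return "veo-3"        # dialogue or sound design requires native audio
    if clip_seconds > 8:
        return "veo-2"        # longer clips only via Vertex AI on Veo 2
    if cost_sensitive:
        return "veo-2-fast"   # high-volume background / b-roll generation
    return "veo-2"

print(pick_veo_model(needs_audio=True))                    # veo-3
print(pick_veo_model(needs_audio=False, clip_seconds=60))  # veo-2
print(pick_veo_model(False, cost_sensitive=True))          # veo-2-fast
```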
The table below compares Veo 2 with the major competing video generation models that were available in late 2024 and the first half of 2025. Specifications are drawn from the developers' own documentation and from independent reviews, which means that some figures (especially for delivered as opposed to advertised capabilities) reflect the reviewer's testing conditions rather than a fixed model specification.
| Feature | Veo 2 | Sora Turbo (OpenAI) | Sora 2 (OpenAI) | Runway Gen-3 Alpha | Kling v1.5 | Hailuo Minimax |
|---|---|---|---|---|---|---|
| Developer | Google DeepMind | OpenAI | OpenAI | Runway | Kuaishou | MiniMax |
| Released | Dec 16, 2024 | Dec 9, 2024 | Sept 30, 2025 | Jul 2024 | Sept 2024 | Aug 2024 |
| Resolution claim | Up to 4K | 1080p | 1080p | 1280x768 | 1080p | 1080p |
| Resolution delivered | 720p typical | 1080p | 1080p | 768p typical | 720p typical | 720p typical |
| Max clip length | 8 sec consumer / 120 sec API | 5-20 sec | up to 20 sec | 10 sec | 5-10 sec | 6 sec |
| Native audio | No | No | Yes | No | No (yes in v2.6+) | No |
| Image-to-video | Yes | Yes | Yes | Yes | Yes | Yes |
| Camera controls | Strong (lens, movement) | Limited | Limited | Advanced | Motion brush | Limited |
| Watermark | SynthID (invisible) | C2PA + visible | C2PA + visible | C2PA | Watermark on free tier | Watermark on free tier |
| Consumer pricing | Bundled in Google AI Pro ($19.99/mo) | ChatGPT Plus ($20/mo) | ChatGPT Plus | $12-95/mo | $10-67/mo | $5-45/mo |
| Developer pricing | $0.35-$0.50/sec | $0.10-$0.50/sec | $0.10-$0.50/sec | Credits | Credits | Credits |
The most direct competitor at launch was Sora Turbo, given the close timing of releases. Veo 2 had a clear edge on cinematography vocabulary and physics handling, while Sora Turbo had a longer maximum clip length (20 seconds versus 8 seconds in most consumer surfaces) and a more polished web experience inside ChatGPT. Runway Gen-3 had stronger interactive editing tools but smaller native resolution. Kling, deployed largely in the Chinese market, was significantly cheaper and had a strong following among non-English-language creators.