Veo 3.1
Last reviewed
Jun 3, 2026
Sources
6 citations
Review status
Source-backed
Revision
v1 · 1,221 words
Veo 3.1 is a video generation model released by Google DeepMind on October 15, 2025, as an incremental update to Veo 3. It keeps its predecessor's defining capability, native generation of synchronized audio alongside video, and adds richer sound, better prompt adherence, higher image-to-video fidelity, and a set of creative controls aimed at filmmakers and developers. The update was distributed simultaneously across Google's consumer and developer surfaces: the Flow filmmaking tool, the Gemini app, the Gemini API, and Vertex AI [1][2][3]. As with every model in the family, Veo 3.1 outputs carry a SynthID watermark [3].
Google's video work runs through three earlier releases. The original Veo arrived at Google I/O in May 2024, followed by Veo 2 in December 2024, which pushed resolution and motion quality. Veo 3, announced at I/O in May 2025, was the step change: it generated audio natively, including dialogue, sound effects, and ambient noise tied to the visuals. Google DeepMind chief executive Demis Hassabis framed the moment by saying the field was "emerging from the silent era of video generation" [4][5].
Veo 3 became the engine behind Flow, Google's web-based tool for stitching AI clips into longer sequences. By the time Veo 3.1 shipped, Google said Flow users had generated more than 275 million videos since the tool launched in May 2025 [1]. Veo 3.1 was positioned not as a clean-sheet successor but as a refinement of that same model, sharing its core audio architecture while improving fidelity and adding production controls.
The most consistent theme across Google's announcements is that audio was extended to capabilities that previously produced silent output. At launch Google said it was, "for the first time," bringing audio to "Ingredients to Video," "Frames to Video," and "Extend" [2]. Alongside that, the company described "stronger prompt adherence and improved audiovisual quality when turning images into videos" [2].
The feature set breaks down as follows:
| Feature | What it does |
|---|---|
| Ingredients to Video | Combines multiple reference images to control characters, objects, and style, synthesizing them so identity and background stay consistent across shots [2][3] |
| Frames to Video (first and last frame) | Generates a transition between a supplied start image and end image, now with audio [2] |
| Extend (scene extension) | Continues a clip into a longer sequence, generating each new segment from the final second of the previous one, allowing videos of a minute or more [2] |
| Insert | Adds an object or element into an existing scene, with the model handling shadows and lighting so it blends in [1][2] |
| Remove | Erases unwanted objects and reconstructs the background behind them; announced as coming after launch [1][2] |
Google also emphasized finer narrative direction: the ability to specify what should happen at particular moments inside a clip rather than only at the start [1]. On the developer side, the Gemini API version of the update foregrounded three improvements: enhanced Ingredients to Video that "intelligently synthesizes your inputs to preserve character identity and background details," native 9:16 vertical generation built for mobile rather than cropped from landscape, and sharper output at 1080p and 4K [3].
The object-removal tool was the main feature not available on day one. Google described it as forthcoming in Flow at the time of the announcement [1][2].
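The Extend feature described in the table chains segments to get past the short native clip length. As a rough illustration of the arithmetic only, the sketch below counts how many extensions a target runtime would need. The function name and all three parameters are hypothetical; in particular, the amount of new footage each extension adds is not stated in the sources cited here, so it is left as a caller-supplied assumption.

```python
import math


def extensions_needed(target_seconds, base_seconds, added_seconds_per_extension):
    """Count the extensions needed to reach a target runtime.

    All three arguments are assumptions supplied by the caller; the
    per-extension length in particular is not given in the cited sources.
    """
    if base_seconds <= 0 or added_seconds_per_extension <= 0:
        raise ValueError("clip lengths must be positive")
    if target_seconds <= base_seconds:
        return 0  # the base clip already covers the target
    remaining = target_seconds - base_seconds
    # Round up: a partial extension still costs one full generation.
    return math.ceil(remaining / added_seconds_per_extension)
```

For example, starting from an 8-second clip and assuming (hypothetically) 7 seconds of new footage per extension, `extensions_needed(60, 8, 7)` returns 8, since seven extensions would reach only 57 seconds.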
Veo 3.1 launched across four surfaces at once. For consumers and creators, it appeared in Flow and in the Gemini app; for developers and enterprises, in the Gemini API (where it carries the model identifier veo-3.1-generate-preview) and in Vertex AI [1][2][3]. The Gemini app rollout went to Google AI Pro and Ultra subscribers first as part of the October "Gemini Drop," with broader access following [6].
The Gemini API specification documents the technical envelope. Veo 3.1 produces clips of 4, 6, or 8 seconds; the 8-second length is required for 1080p or 4K output and whenever reference images are used. Resolution defaults to 720p, with 1080p and 4K available, and the supported aspect ratios are 16:9 (default) and 9:16. Audio is "always on," generated natively with the video [3].
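The constraints above can be expressed as a small pre-flight check. The following is a minimal sketch in plain Python, not the Gemini API's own validation: the `ClipRequest` class, its field names, and the `validate` function are hypothetical, and only the allowed values and the 8-second rule come from the specification summarized here.

```python
from dataclasses import dataclass

# Values taken from the constraints described above.
ALLOWED_DURATIONS = (4, 6, 8)
ALLOWED_RESOLUTIONS = ("720p", "1080p", "4k")
ALLOWED_ASPECT_RATIOS = ("16:9", "9:16")


@dataclass(frozen=True)
class ClipRequest:
    """Hypothetical container for one generation request."""
    duration_seconds: int
    resolution: str = "720p"        # documented default
    aspect_ratio: str = "16:9"      # documented default
    uses_reference_images: bool = False


def validate(request: ClipRequest) -> list[str]:
    """Return a list of human-readable problems; empty means valid."""
    problems = []
    if request.duration_seconds not in ALLOWED_DURATIONS:
        problems.append(f"duration must be one of {ALLOWED_DURATIONS}")
    if request.resolution not in ALLOWED_RESOLUTIONS:
        problems.append(f"resolution must be one of {ALLOWED_RESOLUTIONS}")
    if request.aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        problems.append(f"aspect ratio must be one of {ALLOWED_ASPECT_RATIOS}")
    # 1080p, 4K, and reference-image requests all require 8-second clips.
    needs_eight = (
        request.resolution in ("1080p", "4k") or request.uses_reference_images
    )
    if needs_eight and request.duration_seconds != 8:
        problems.append("this configuration requires an 8-second clip")
    return problems
```

A 6-second request at 1080p, for instance, would come back with a single problem (the 8-second requirement), while a 4-second request at the default 720p would pass.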
| Surface | Audience | Notes |
|---|---|---|
| Flow | Filmmakers, creators | Full feature set; object removal arrived later |
| Gemini app | Consumers | Rolled out to AI Pro and Ultra subscribers first [6] |
| Gemini API | Developers | Model IDs veo-3.1-generate-preview and veo-3.1-fast-generate-preview [3] |
| Vertex AI | Enterprises | Scene extension noted as coming after launch [2] |
Google stated that these capabilities, "along with our SynthID digital watermark, are available today in the Gemini API and Vertex AI for enterprises" [3]. SynthID embeds an imperceptible signal into the generated pixels that Google's detection tooling can later read, the same provenance mechanism used across earlier Veo models [3].
Alongside the standard model, Google introduced a faster, lower-cost companion, Veo 3.1 Fast, which carries the Gemini API identifier veo-3.1-fast-generate-preview [3]. In the developer announcement the company said Veo 3.1 Fast would be "available soon, offering a faster and more cost-effective option for video creation." Google describes the Fast tier as letting "developers create videos with sound while maintaining high quality and optimizing for speed and business use cases," and points to uses such as rapid A/B testing of creative concepts and apps that need to produce social media content quickly [3]. The Fast model retains native audio rather than dropping it for speed.
Several third-party API resellers and pricing trackers have published per-second and per-video figures for the Fast tier, but Google's framing at launch was qualitative: faster and cheaper than the standard model, aimed at high-volume and iterative workloads.
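In application code, the choice between the two tiers usually comes down to which model identifier is passed to the API. The sketch below shows one way to keep that choice in a single place. The `MODEL_IDS` mapping, the tier labels, and the `choose_model` helper are hypothetical; only the two identifier strings come from the Gemini API documentation cited above.

```python
# Identifier strings as cited above; the tier labels are illustrative.
MODEL_IDS = {
    "standard": "veo-3.1-generate-preview",
    "fast": "veo-3.1-fast-generate-preview",
}


def choose_model(tier: str = "standard") -> str:
    """Map a tier label to its Gemini API model identifier."""
    try:
        return MODEL_IDS[tier]
    except KeyError:
        raise ValueError(
            f"unknown tier {tier!r}; expected one of {sorted(MODEL_IDS)}"
        ) from None
```

An iterative workload such as A/B testing creative variants might call `choose_model("fast")` for drafts and switch to the standard tier for a final render.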
Coverage treated Veo 3.1 as an iterative but meaningful update rather than a generational leap. TechCrunch summarized it plainly, writing that "Veo 3.1 builds on May's Veo 3 release and generates more realistic clips and adheres to prompts better," and highlighted the editing additions and the extension of audio across Flow's features [1]. The release landed during a period of intense competition in generative video, following OpenAI's Sora work and a broader race among labs to pair convincing motion with synchronized sound.
The practical emphasis in commentary fell on continuity and control: Ingredients to Video for keeping a character or product consistent across shots, scene extension for getting past the short native clip length, and the move to bring audio to image-driven workflows. For developers, the addition of native vertical video and higher-resolution output in the Gemini API was read as a signal that Google was targeting production use rather than demos [3]. Google has continued to iterate on the model since release, including later updates to its vertical-video and reference-image features.