Runway Gen-4 is the fourth-generation video generation model developed by Runway, released on March 31, 2025. The model introduced a reference image conditioning system that allows characters, objects, and environments to remain visually consistent across separate video generations without any fine-tuning or retraining. Runway described Gen-4 as a step toward simulating real-world physics in synthetic video, with improved motion dynamics compared to its predecessor, Gen-3 Alpha.
Gen-4 can produce clips of 5 or 10 seconds at up to 720p resolution (with optional 4K upscaling on paid plans) at 24 frames per second. A faster companion model, Gen-4 Turbo, arrived days later in April 2025, offering approximately five times the generation speed at a lower credit cost. Developer API access for Gen-4 Turbo opened in April 2025, and Gen-4 Image became available via the API on May 16, 2025. An in-context video editing model called Aleph, closely related to the Gen-4 family, launched on July 25, 2025.
Runway was founded in 2018 and became a prominent name in AI video generation through a series of incrementally more capable models.
Gen-1 appeared in February 2023 as a video-to-video system. It applied the style or composition of an image or text prompt to the structure of an input video rather than generating video from scratch. The model was practical for style transfer tasks but could not produce novel content without a source clip to work from.
Gen-2, released in June 2023, opened up text-to-video and image-to-video generation. Users could describe a scene in text alone and receive a short synthetic clip. Gen-2 attracted significant attention as one of the first accessible commercial text-to-video products, though output length was short and motion often appeared floaty or inconsistent across frames.
Gen-3 Alpha launched in June 2024 and was the first model Runway trained on infrastructure purpose-built for large-scale multimodal training. It could produce 10-second clips from text, images, or video inputs and showed improved understanding of 3D spatial relationships. A speed-optimized variant, Gen-3 Alpha Turbo, followed in August 2024, offering faster generation at a lower credit cost. Gen-3 Alpha Turbo became the default workhorse for most Runway users through the second half of 2024, but the model still struggled to maintain consistent character appearance or object identity when the same subject needed to appear across multiple separately generated clips.
That cross-clip consistency gap was the central problem Gen-4 was designed to address.
Runway announced and released Gen-4 on March 31, 2025. The release was rolled out to individual and enterprise customers on the same day via the Runway web interface. TechCrunch covered the launch that day, describing the model as "one of the highest-fidelity AI-powered video generators yet" with a particular focus on its ability to keep characters and locations consistent across multiple separate generations.
The timing of the release coincided with a period of intense competition in AI video generation. OpenAI's Sora had been temporarily restricting new user signups due to capacity constraints, and Google had not yet publicly released Veo 3. Runway's launch positioned Gen-4 as the most accessible high-fidelity option for professional users at that moment.
The company, which is headquartered in New York, had by that point secured over $230 million in funding from investors including Nvidia and Google. At the time of Gen-4's release, Runway was reportedly targeting $300 million in annualized revenue for 2025 and pursuing a $4 billion valuation in a new funding round.
Gen-4 generates video clips in 5-second or 10-second durations at 24 frames per second. The default resolution is 1280 × 720 pixels (720p) in a 16:9 aspect ratio. Other aspect ratios are supported, including 9:16 (vertical), 1:1 (square), 4:3, 3:4, and 21:9 (ultrawide). Output files are delivered as MP4 (H.264) or GIF. Upscaling to 4K is available on paid plans.
Unlike pure text-to-video models, Gen-4 is primarily an image-to-video system. Users supply an image as a starting frame and then describe the motion, camera movement, or scene transformation they want. Text prompts support subject motion descriptions, explicit camera movement instructions (pan, zoom, tilt, tracking), style descriptors, and speed or temporal cues such as slow motion. The requirement for an image input distinguishes Gen-4 from systems like Sora, which can produce video from text alone, though many users find image conditioning gives more predictable control over the output.
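As an illustration of how those prompt components combine (the phrasing below is an invented example, not a documented syntax), a Gen-4 prompt typically layers subject motion, a camera instruction, a style descriptor, and a temporal cue:

```python
# Illustrative prompt construction for Gen-4 image-to-video generation.
# Each variable corresponds to one of the prompt categories described
# above; the specific wording is an example, not a documented grammar.
subject_motion = "the sailor lowers his binoculars and turns toward the bow"
camera_move = "slow tracking shot moving left to right"
style = "cinematic, 35mm film grain, shallow depth of field"
temporal_cue = "subtle slow motion"

prompt_text = ", ".join([subject_motion, camera_move, style, temporal_cue])
print(prompt_text)
```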
Runway described Gen-4 as representing "a significant milestone in the ability of visual generative models to simulate real-world physics." In practice, the model shows improved rendering of secondary motion effects: hair movement in wind, fabric draping and rippling, water surface dynamics, and environmental particle effects such as dust or smoke. These improvements over Gen-3 Alpha are noticeable in clips involving physical interaction between objects or characters and their environment.
That said, Gen-4 is not a physics simulator in any formal sense. It learns motion patterns from training data rather than from a physical model. Complex interactions, such as two people wrestling or liquid pouring with precise volumetric behavior, can produce artifacts. Fire and water effects in particular sometimes look artificial compared to live action footage. Intricate limb movements during high-speed action also remain a weak point, with occasional deformation in hands, feet, or faces under fast motion.
Gen-4 responds to camera movement language in text prompts with reasonable fidelity. Users can specify dolly moves, orbiting shots, Dutch angles, and tracking shots. The model supports multiple visual styles, including live-action realism, stop-motion, cel animation, VFX compositing styles, and cinematic film grain. These style parameters can be layered with reference images to produce consistent stylized output across a sequence of clips.
The reference image system is the capability that most differentiated Gen-4 from prior models at launch. Users can supply up to three reference images alongside a text prompt. The model extracts visual characteristics from those images, including facial identity, clothing details, body proportions, object shapes, surface textures, and environment style, and applies them as constraints on the generated output.
In practice this means a filmmaker can photograph or render a character once and then generate multiple clips of that same character in different locations, under different lighting, or performing different actions, without the character's appearance drifting between generations. Runway's research page described this as enabling "consistent characters across endless lighting conditions, locations and treatments" from a single reference image.
When multiple reference images are used, each image is assigned a label in the prompt (for example, "image_1", "image_2", "image_3") and the user's text describes how each reference should influence the output. One image might supply the character's face while another supplies the background environment or a specific prop. The labels allow the model to resolve which visual elements should come from which source.
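A sketch of what a labeled multi-reference request might look like is shown below. The field names (`reference_images`, `tag`, `uri`) are assumptions for illustration; only the `image_1`/`image_2`/`image_3` labeling convention comes from the description above.

```python
import json

# Hypothetical request payload illustrating the labeled-reference workflow.
# Field names are illustrative assumptions, not documented API fields.
payload = {
    "reference_images": [
        {"tag": "image_1", "uri": "https://example.com/lead_actor.png"},
        {"tag": "image_2", "uri": "https://example.com/diner_interior.png"},
        {"tag": "image_3", "uri": "https://example.com/vintage_camera.png"},
    ],
    # The text prompt resolves which visual elements come from which source.
    "prompt_text": (
        "image_1 sits in a booth inside image_2, examining image_3, "
        "warm tungsten lighting, slow push-in"
    ),
}
print(json.dumps(payload, indent=2))
```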
The reference system does not require fine-tuning or any model retraining. Conditioning happens at inference time, which makes the workflow practical for production pipelines where subjects change from project to project. Earlier approaches to character consistency in video generation typically required per-subject fine-tuning passes that could take hours and were cost-prohibitive at scale.
Beyond character identity, the reference image conditioning system can transfer artistic styles. Uploading a frame from a film, a painting, or a graphic novel page as a reference, and then describing desired content in the text prompt, causes Gen-4 to produce output that adopts the visual language of the reference: its color palette, lighting approach, texture rendering, and compositional tendencies. This makes Gen-4 useful for producing stylistically coherent series of clips without manually specifying every visual parameter in text.
Gen-4 Turbo was released in April 2025, a few days after the standard Gen-4 model. It is a speed-optimized variant designed for rapid iteration and higher-volume production workflows.
The key difference between Gen-4 and Gen-4 Turbo is generation time. Gen-4 Turbo produces a 10-second clip in approximately 30 seconds, which is roughly five times faster than the standard Gen-4 model. This speed advantage makes Turbo practical for creative iteration, where a producer wants to try many different prompt variations quickly before committing to a final shot.
The cost difference is also significant. Gen-4 Turbo consumes 5 credits per second of video (50 credits for a full 10-second clip), compared to 12 credits per second for standard Gen-4. On Runway's Pro plan, which includes 2,250 monthly credits, a user can generate approximately 45 full 10-second clips using Gen-4 Turbo compared to around 18 clips with standard Gen-4.
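Those clip counts follow directly from the per-second rates; a quick check of the arithmetic, using the figures quoted above:

```python
# Credits consumed by a full 10-second clip at Runway's published rates.
GEN4_CREDITS_PER_SEC = 12
TURBO_CREDITS_PER_SEC = 5
CLIP_SECONDS = 10
PRO_MONTHLY_CREDITS = 2250

gen4_clip = GEN4_CREDITS_PER_SEC * CLIP_SECONDS    # 120 credits per clip
turbo_clip = TURBO_CREDITS_PER_SEC * CLIP_SECONDS  # 50 credits per clip

print(PRO_MONTHLY_CREDITS // turbo_clip)  # 45 Turbo clips per month
print(PRO_MONTHLY_CREDITS // gen4_clip)   # 18 standard clips per month
```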
Quality-wise, Gen-4 Turbo is slightly below the standard model in motion smoothness and fine detail, particularly in complex scenes. For most social media and marketing applications the difference is minimal, but for outputs intended for large screens or professional broadcast contexts the standard model remains the better option.
Gen-4 Turbo became available on the Runway API in April 2025. Runway noted at the time that it offered "the same price, scale and reliability of Gen-3 Alpha Turbo but with the state-of-the-art" quality of the Gen-4 generation.
Aleph is an in-context video editing model released by Runway on July 25, 2025. While Gen-4 and Gen-4 Turbo generate new video from image and text inputs, Aleph works on existing footage and transforms it based on text instructions.
Runway described Aleph as "a state-of-the-art in-context video model" capable of performing "a wide range of edits on an input video." Supported operations include adding objects to a scene, removing objects, transforming the appearance of objects already present, changing lighting and time-of-day, altering the visual style of footage, and generating alternate camera angles from a single shot.
The camera angle generation capability is technically interesting. Aleph can infer a scene's spatial geometry from a single video and then synthesize views that appear to come from a different camera position, such as a reverse angle or an aerial perspective, without any additional footage. This is achieved through depth estimation and spatial reconstruction rather than explicit 3D modeling.
When objects are added or removed, Aleph adjusts shadows, reflections, and lighting to match the scene's existing conditions automatically. The model maintains temporal consistency across the full edited clip rather than processing frames independently, which avoids the flickering artifacts that affected earlier video editing approaches.
Aleph is available to all Runway paid users and is priced at 15 credits per second of edited video via the API.
Runway's developer API gave programmatic access to Gen-4 Turbo in April 2025. The API allows applications to submit image and text inputs and receive generated video as output, enabling integration of Runway video generation into third-party products, platforms, and workflows without going through the Runway web interface.
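A minimal sketch of that request flow, assuming the official `runwayml` Python SDK and its asynchronous task interface; the method names, the `gen4_turbo` model identifier, and the status strings shown should be verified against current Runway documentation, and the image URL is a placeholder:

```python
import time

from runwayml import RunwayML  # official SDK; reads the RUNWAYML_API_SECRET env var

client = RunwayML()

# Submit an image-to-video task: a starting frame plus a motion prompt.
task = client.image_to_video.create(
    model="gen4_turbo",
    prompt_image="https://example.com/starting_frame.png",  # placeholder URL
    prompt_text="slow dolly-in as the lighthouse beam sweeps through fog",
    ratio="1280:720",
    duration=10,
)

# Generation is asynchronous: poll the task until it reaches a terminal state.
result = client.tasks.retrieve(task.id)
while result.status not in ("SUCCEEDED", "FAILED"):
    time.sleep(5)
    result = client.tasks.retrieve(task.id)

if result.status == "SUCCEEDED":
    print(result.output)  # URL(s) of the generated MP4
```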
Gen-4 Image, a related model capable of generating still images using the same reference conditioning system as Gen-4 video, became available on the Runway API on May 16, 2025. A faster variant, Gen-4 Image Turbo, followed on August 19, 2025, delivering 93.3 percent of standard Gen-4 Image quality at roughly 25 to 40 percent of the cost.
Credits purchased through the developer portal cost $0.01 each. Volume discounts apply for larger purchases (for example, 275,000 credits for $1,250). API billing is pay-as-you-go, meaning developers are not required to subscribe to a monthly plan and pay only for the credits they consume.
Runway also released an MCP (Model Context Protocol) server integration in June 2025, allowing the Runway API to be invoked directly from Claude and other AI assistant environments that support MCP.
Runway operates on a subscription credit model. As of 2025, the platform offers four consumer-facing tiers plus enterprise custom pricing.
| Plan | Monthly cost (annual billing) | Monthly credits | Gen-4 seconds | Gen-4 Turbo seconds |
|---|---|---|---|---|
| Free | $0 | 125 (one-time) | No Gen-4 access | No Gen-4 access |
| Standard | $12/user | 625 | ~52 seconds | ~125 seconds |
| Pro | $28/user | 2,250 | ~187 seconds | ~450 seconds |
| Unlimited | $76/user | 2,250 + unlimited explore | ~187 seconds (priority) | ~450 seconds (priority) |
| Enterprise | Custom | Custom | Custom | Custom |
The Unlimited plan's "unlimited explore" mode queues generations at a lower priority than paid-credit jobs, which can extend wait times during peak usage. The 2,250 monthly credits on the Unlimited plan can still be used for priority generations at the same rate as the Pro plan.
Credits do not roll over between billing periods. Gen-4 consumes 12 credits per second, Gen-4 Turbo consumes 5 credits per second, and Aleph consumes 15 credits per second via the API. 4K upscaling is available on Standard plans and above.
Via the developer API, credits cost $0.01 each and can be purchased in any quantity. A 10-second Gen-4 generation costs $1.20 in API credits; the same clip via Gen-4 Turbo costs $0.50.
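In dollar terms, the per-clip cost at the base rate follows directly (a minimal sketch using the per-second rates quoted above; volume discounts are not modeled):

```python
def clip_cost_usd(seconds: int, credits_per_second: int) -> float:
    """Dollar cost of one generation at the base rate of $0.01 per credit."""
    return seconds * credits_per_second / 100  # 1 credit = $0.01

print(clip_cost_usd(10, 12))  # standard Gen-4: 1.2
print(clip_cost_usd(10, 5))   # Gen-4 Turbo: 0.5
print(clip_cost_usd(10, 15))  # Aleph edit via API: 1.5
```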
Over the course of 2025, the primary competitors to Gen-4 in high-fidelity AI video generation came to be OpenAI's Sora 2, Google's Veo 3, and Kuaishou's Kling 2.1.
| Feature | Runway Gen-4 | Sora 2 | Veo 3 | Kling 2.1 |
|---|---|---|---|---|
| Max clip length | 10 seconds | 10-25 seconds | 8 seconds | 10 seconds |
| Output resolution | 720p (4K upscale) | Up to 1080p | Up to 1080p | Up to 1080p |
| Native audio | No | Partial | Yes | Yes |
| Reference image conditioning | Yes (up to 3 images) | Yes (Cameos) | Limited | Limited |
| Character consistency | Strong | Strong | Moderate | Moderate |
| Physics realism | Good | Very good | Good | Very good |
| Generation speed | ~30s (Turbo) | 2-5 minutes | 1-3 minutes | Varies |
| Text-to-video (no image input) | Limited | Yes | Yes | Yes |
| Starting price | $12/month | $20/month (ChatGPT Plus) | $249.99/month (AI Ultra) | Free tier available |
| API access | Yes | Yes | Yes (Vertex AI) | Yes |
Runway Gen-4's clearest strengths in this comparison are character consistency across multiple generations and generation speed via the Turbo variant. Sora 2 shows superior physics simulation, particularly for water, cloth, and rigid body dynamics. Veo 3 has native audio generation, meaning dialogue, sound effects, and ambient sound are produced along with the video rather than requiring separate audio work in post-production. Kling 2.1 offers competitive physics and is often considered the strongest image-to-video model for motion realism.
Runway Gen-4's lack of native audio was a notable gap relative to Veo 3 and Kling at the time of its release. Users who needed synchronized sound had to source audio separately and sync it in editing software. This gap was partially addressed when Runway added third-party audio tools, and eventually native audio generation with the later Gen-4.5 release in December 2025.
Pricing favors Runway Gen-4 for many production workflows. Veo 3 requires a Google AI Ultra subscription at $249.99 per month, which is substantially more expensive than Runway's Pro tier at $28 per month. Sora 2 is bundled with ChatGPT Plus at $20 per month but with generation limits that can be restrictive for production use. Runway's credit-based model gives more predictable cost control for studios managing per-project budgets.
Gen-4 attracted adoption across several professional production contexts in 2025.
Pre-visualization (previz) is one of the most common applications. Directors and production designers use Gen-4 to quickly mock up shot ideas before committing to physical setups or expensive CGI. A reference image of a character combined with a text description of a camera move can produce a rough previz clip in under a minute, which is fast enough to show to a client or investor during a pitch meeting.
Advertising and branded content production benefit from the character consistency feature. A brand can photograph a human talent or product once, then generate multiple stylistically distinct clips of the same subject in different settings without scheduling additional shoots. This compresses the production timeline for campaigns requiring a large volume of visual variations.
Independent filmmaking has been a significant adoption driver. Low-budget productions have used Gen-4 to generate VFX shots, environment extensions, and concept shots that would otherwise require expensive CGI vendors. Runway hosted an AI Film Festival at Lincoln Center in 2025 that drew industry attention, showing short films produced primarily with AI generation tools including Gen-4.
Social media content creation, particularly for TikTok and Instagram Reels, benefits from Gen-4 Turbo's speed. Creators producing high-volume daily content can generate B-roll and stylized clips at a pace that matches platform publishing cadences.
Coverage at launch was largely positive about the character consistency capability, which reviewers recognized as a genuine step beyond Gen-3 Alpha. TechCrunch described the model as "impressive" in its March 31, 2025 report on the release. Workflow-oriented reviews praised the fact that consistency worked without any per-subject training, noting that previous methods required hours of fine-tuning that made consistent characters impractical for most users.
Reviewers also noted that Gen-4 set a new standard for cinematic output quality among accessible consumer-facing tools, with one analysis calling it the "benchmark for cinematic AI video quality in 2025" and recommending it first for any production workflow where quality is the primary concern.
Critical perspectives focused on several recurring issues. The 10-second clip limit is a genuine constraint for narrative filmmaking, which often requires longer continuous shots. Stitching many 10-second clips together introduces consistency and pacing challenges that require additional editing work. The absence of native audio was noted as a competitive disadvantage relative to Veo 3. The credit cost for standard Gen-4 (12 credits per second) was described as expensive for exploratory or high-volume use, which is why many users shifted to Gen-4 Turbo.
Some users on Reddit and other communities reported that the model, while more consistent than Gen-3, still produces unexpected character appearance changes between clips when reference image conditioning is not applied carefully. Getting reliable results requires attention to reference image quality (clear, well-lit shots with unobstructed features perform better) and specific prompt phrasing.
The broader context of AI-generated video attracted criticism that applied to Runway alongside other companies in the space. Runway has declined to disclose the sources of its training data, citing competitive concerns. This has kept questions about the use of copyrighted film and video content in training data unresolved. A number of visual artists and filmmakers have raised concerns about the impact of generative video tools on employment in traditional production roles, and Runway's position as a company that also sponsors film festivals and markets to filmmakers as collaborators rather than competitors has been a point of tension.
Several limitations of Gen-4 were clear at launch and remained through its production lifetime.
Clip length tops out at 10 seconds. Longer sequences require chaining multiple clips together, which demands additional editing work and can produce consistency issues at clip boundaries despite the reference image system.
Gen-4 does not support text-only video generation in the conventional sense. An input image is required to anchor the generation. Users who want a scene for which they cannot easily find or create a reference image face an extra preparatory step.
Audio is entirely absent from native Gen-4 output. All sound design, dialogue, and music must be added in post-production. Competitors Veo 3 and Kling produce synchronized audio natively, which gives them an advantage in workflows where time-to-completion with audio is important.
Physics simulation, while improved over Gen-3, falls behind Sora 2 and Kling in direct comparisons involving fire, fluid dynamics, and complex rigid body interactions. Fast-motion sequences involving hands, fingers, or feet can produce deformation artifacts.
The credit system means costs scale with usage in ways that can be hard to forecast for variable workloads. A production that requires many exploratory iterations to land on the right output can consume credits quickly. Monthly credits expire if unused, which penalizes users in months with lighter workloads.
Repeatability is limited: identical prompts and reference images do not guarantee identical outputs. This is inherent to diffusion-based generation but can frustrate users who need to reproduce a specific result.