Runway Act-Two
Last reviewed
May 16, 2026
Sources
28 citations
Review status
Source-backed
Revision
v1 · 3,985 words
Runway Act-Two is a generative motion capture and character animation model developed by Runway, publicly introduced on July 14, 2025. The model accepts a driving performance video and a separate reference image (or short clip) of a character, and returns an animation in which the character reproduces the head, face, body, and hand motion of the original performer. Act-Two extends the Act-One model that Runway released in October 2024, which had been limited to facial expression transfer. The new model added full body and hand tracking and improved generation quality, while keeping the original workflow: no rigging, no markers, no studio capture setup.
Act-Two launched first for Runway Enterprise customers and Creative Partners, and rolled out to all paid Runway plans within a few days of the announcement. The model was added to the Runway API on July 21, 2025, at a rate of 5 credits per second of generated video, with a minimum charge of 15 credits per request. Industry coverage at launch focused on the model's potential to compete with traditional optical motion capture in pre-visualization, indie animation, and certain post-production tasks, with several Hollywood publications describing it as a meaningful step toward democratized performance capture.
Act-Two belongs to a small category of generative models that drive a character animation from a recorded human performance rather than from a text prompt. The model takes two inputs. The first is a driving performance video showing a person acting, speaking, or moving. The second is a character reference, typically an image, but optionally a short video, of the target character that should be animated. The model then synthesizes a new video in which the target character moves and emotes according to the driving performance.
Unlike traditional motion capture, Act-Two does not require markers, suits, depth sensors, or a calibrated camera rig. The driving video can be recorded on a smartphone or webcam. The model handles the entire transfer pipeline internally: it analyzes head pose, facial micro-expressions, body posture, hand gestures, and (in cases where a video reference is supplied) environmental motion, and applies them to the reference character with what Runway describes as preservation of identity and style.
The model is positioned as a production aid rather than a consumer toy. The pricing, the API rollout, and the early access pattern (Enterprise first, then paid plans, then API) reflect a strategy aimed at studios, advertising agencies, game cinematic teams, and independent filmmakers who would otherwise pay for live action mocap or hire animators.
Runway is a New York based generative AI company founded in 2018 by Cristobal Valenzuela, Alejandro Matamala, and Anastasis Germanidis. The company built its early reputation on browser based creative tools and on its co-authorship of the latent diffusion paper that underpinned Stable Diffusion. From 2022 onward, Runway focused on text to video and image to video generation, releasing Gen-1 in February 2023, Gen-2 in March 2023, Gen-3 Alpha in June 2024, and Runway Gen-4 in March 2025.
Act-One was the company's first motion driven animation feature. It was released on October 25, 2024 and built on top of the Gen-3 Alpha model. Act-One generated character performances from a single driving video and a character image, but the transfer was limited to facial movement: eye lines, micro-expressions, lip motion, and head turns. Body posture and hand movement were inferred but not directly captured from the input, which restricted Act-One to dialogue heavy shots, head and shoulders compositions, and similarly framed work. Runway's announcement post described Act-One as a way to capture performance without rigging or motion capture equipment, and emphasized that the same input video could drive characters in radically different visual styles.
The Act-One release attracted attention from filmmakers and creative directors because the workflow was an order of magnitude simpler than studio mocap. VentureBeat called the feature a game changer, and several industry observers noted that the consistency of facial expression transfer made it usable for short narrative work rather than only for novelty clips. The model also drew skepticism, particularly around the limits imposed by face only tracking and the absence of explicit body and hand control.
Act-Two was developed over the following nine months and was released on July 14, 2025. Runway framed Act-Two as a next generation motion capture model and emphasized two main improvements over Act-One: full body and hand tracking, and higher overall generation quality. The announcement also flagged automatic environmental motion when the character reference is an image rather than a video, meaning the surrounding scene gains natural background motion (foliage, lighting variation, camera drift) during generation rather than remaining static.
Act-Two performs end to end performance transfer in a single pass. The model expects a driving performance video and a character reference, and outputs a generated clip at 24 frames per second. Runway's help documentation describes a typical workflow using well under 30 seconds of driving footage, with a 3 second minimum per request that establishes a 15 credit floor. Output is 1080p at a 16:9 aspect ratio, with durations of up to 10 seconds per generation depending on the input length.
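These figures can be restated as a short pre-check for pipeline scripts that validate driving footage before upload. The sketch below only encodes the documented numbers; the constant and function names are illustrative rather than part of any Runway SDK, and the assumption that output length simply tracks the input up to the 10 second cap is an inference from the paragraph above.

```python
# Illustrative constants restating the documented Act-Two figures.
# None of these names come from a Runway SDK.
MIN_DRIVING_SECONDS = 3.0        # minimum driving clip length per request
RECOMMENDED_MAX_SECONDS = 30.0   # help docs suggest staying well under ~30 s
MAX_OUTPUT_SECONDS = 10.0        # up to 10 s of 1080p, 16:9, 24 fps output per generation
OUTPUT_FPS = 24

def check_driving_clip(duration_s: float) -> float:
    """Validate a driving clip length and return the expected output length."""
    if duration_s < MIN_DRIVING_SECONDS:
        raise ValueError(f"driving clip must be at least {MIN_DRIVING_SECONDS} s")
    if duration_s > RECOMMENDED_MAX_SECONDS:
        print("note: help docs recommend splitting clips over ~30 s into shorter passes")
    # Assumption: output duration follows the input up to the 10 s per-generation cap.
    return min(duration_s, MAX_OUTPUT_SECONDS)

print(check_driving_clip(7.5))   # 7.5
print(check_driving_clip(12.0))  # 10.0
```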
The model handles several distinct tracking surfaces at once.
| Capability | Description |
|---|---|
| Head tracking | Pitch, yaw, and roll of the head, plus position relative to the body and the camera. |
| Facial expression transfer | Micro-expressions, eye direction, eyelid behavior, brow movement, and lip motion synchronized to speech in the driving clip. |
| Body pose | Full body posture, shoulder and torso orientation, and lower body posture where visible in the input. |
| Hand tracking | Gesture, finger position, and grip approximations, supporting expressive hand work that Act-One could not produce. |
| Environmental motion | When the character reference is a still image, Act-Two adds natural background motion to the generated clip rather than leaving the surroundings frozen. |
| Multi-character dialogue | A separate workflow generates multiple characters speaking in turn within a single scene, with the help documentation recommending that dialogue clips stay under roughly 30 seconds. |
Runway has stated that the model preserves the identity and visual style of the character reference. A reference drawn in a stylized illustration register produces output that retains the original line work, color, and proportions, while a photorealistic reference produces a photorealistic clip. The driving performance contributes only motion and timing, not appearance.
The model is designed for short clips. Longer continuous shots require chunking, in which the user breaks a performance into multiple shorter passes and stitches the results together. Runway's help center recommends keeping individual dialogue passes under about 30 seconds and provides workflow notes on managing continuity between consecutive shots.
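The chunking workflow can be sketched as a simple splitting step, shown below with ffmpeg cutting a long driving performance into shorter passes that are each submitted separately. The 10 second default chunk length is an assumption tied to the per-generation output limit, the file names and helper are hypothetical, and stitching the generated results back together (and smoothing the seams) is left to the editor.

```python
import math
import subprocess

def split_driving_video(path: str, total_seconds: float, chunk_seconds: float = 10.0):
    """Split a long driving performance into shorter passes with ffmpeg.

    chunk_seconds defaults to 10 s here as an assumption based on the
    per-generation output limit; Runway's help center separately recommends
    keeping dialogue passes under roughly 30 s.
    """
    chunk_paths = []
    n_chunks = math.ceil(total_seconds / chunk_seconds)
    for i in range(n_chunks):
        start = i * chunk_seconds
        out_path = f"driving_chunk_{i:02d}.mp4"
        # Stream-copy a slice of the source clip; re-encoding is not needed
        # just to cut driving footage into passes (cuts snap to keyframes).
        subprocess.run(
            ["ffmpeg", "-y", "-ss", str(start), "-t", str(chunk_seconds),
             "-i", path, "-c", "copy", out_path],
            check=True,
        )
        chunk_paths.append(out_path)
    return chunk_paths

# Each chunk is then submitted as its own Act-Two request, and the generated
# clips are stitched together in an editor, managing continuity at the seams.
```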
Act-Two's improvements over Act-One are concentrated in three areas: tracking coverage, motion fidelity, and consistency under demanding scene conditions.
| Capability | Act-One (October 2024) | Act-Two (July 2025) |
|---|---|---|
| Underlying base | Gen-3 Alpha and Turbo | Independent model on Runway platform |
| Face tracking | Yes | Yes, with improved micro-expression fidelity |
| Body tracking | Limited or implied | Full body, explicit |
| Hand tracking | None | Yes |
| Environmental motion | Static background by default | Automatic when reference is an image |
| Maximum clip length | Bounded by underlying Gen-3 model | Up to 10 seconds at 1080p per generation |
| Output resolution | 720p typical | 1080p at 16:9 |
| API access | Not directly exposed | Available from July 21, 2025 |
| Pricing | Bundled with Gen-3 usage | 5 credits per second, 15 credit minimum |
Act-One remains usable for face only workflows, particularly dialogue close ups where body work is not needed and the older facial model produces acceptable results at lower cost. For most new production work involving body motion or hand gestures, Act-Two has become the default.
Act-Two is available through the Runway web app and through the Runway developer API. The web app exposes Act-Two as a workflow on paid plans, including Standard, Pro, Unlimited, and Enterprise. The API was opened on July 21, 2025 and uses the same credit based billing as the rest of the Runway API surface.
Act-Two consumes 5 credits per second of generated video, with a 3 second minimum per request that translates to a 15 credit floor. Credits in the developer portal cost $0.01 each, so a single 3 second generation costs $0.15 and a 10 second generation costs $0.50. By the Runway API pricing reference, this places Act-Two at the same per second rate as Gen-4 Turbo and Gen-3 Alpha Turbo, and below Gen-4.5 at 12 credits per second, Aleph at 15 credits per second, and Veo 3 access through Runway at 40 credits per second.
| Model | Credits per second | Approximate API cost for 10 seconds |
|---|---|---|
| Act-Two | 5 | $0.50 |
| Gen-3 Alpha Turbo | 5 | $0.50 |
| Gen-4 Turbo | 5 | $0.50 |
| Gen-4.5 | 12 | $1.20 |
| Aleph | 15 | $1.50 |
| Veo 3 via Runway | 40 | $4.00 |
The subscription plans give users a monthly credit allowance that can be spent on Act-Two alongside other Runway models. Standard at $12 per user per month includes 625 credits, Pro at $28 per user per month includes 2,250 credits, and Unlimited at $76 per user per month includes 2,250 credits plus an unlimited explore mode with lower priority queueing. Enterprise pricing is custom and includes the early access window that Act-Two and other new models passed through at launch.
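The pricing above reduces to simple arithmetic. The sketch below works through the per-request API cost and how far each plan's monthly allowance stretches if spent entirely on Act-Two; all figures come from the pricing described above, and the helper names are illustrative rather than part of any Runway tooling.

```python
CREDITS_PER_SECOND = 5       # Act-Two API rate
MIN_CREDITS = 15             # floor implied by the 3 second minimum
USD_PER_CREDIT = 0.01        # developer portal credit price

def act_two_cost(seconds: float) -> float:
    """Approximate USD cost of one Act-Two API request."""
    credits = max(MIN_CREDITS, seconds * CREDITS_PER_SECOND)
    return credits * USD_PER_CREDIT

print(act_two_cost(3))    # 0.15
print(act_two_cost(10))   # 0.50

# Monthly plan allowances converted to seconds of Act-Two output:
for plan, credits in {"Standard": 625, "Pro": 2250, "Unlimited": 2250}.items():
    print(plan, credits / CREDITS_PER_SECOND, "seconds")
# Standard 125.0, Pro 450.0, Unlimited 450.0 (plus the unlimited explore mode)
```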
The API has the same input and output structure as other Runway endpoints. Developers submit a driving video URL and a character reference URL (image or video), along with optional parameters that control duration, resolution, and aspect ratio. The job completes asynchronously and returns a generated MP4 file when ready. The asynchronous pattern is similar to that used by Gen-4 and Aleph and supports queue management for batch workloads.
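A minimal sketch of that submit-then-poll flow appears below. The base URL, endpoint paths, field names, and status values are placeholders invented for illustration and are not Runway's actual API surface (the official API reference defines the real ones); only the overall pattern, submitting a driving video URL and a character reference URL and polling asynchronously for an MP4 result, comes from the description above.

```python
import time
import requests

API_BASE = "https://api.example.invalid"   # placeholder, not the real Runway base URL
API_KEY = "YOUR_API_KEY"

def submit_act_two_job(driving_video_url: str, character_ref_url: str) -> str:
    """Submit an Act-Two style job and return a task id (payload fields are placeholders)."""
    resp = requests.post(
        f"{API_BASE}/act_two_tasks",                   # hypothetical endpoint path
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "driving_video": driving_video_url,        # performance to transfer
            "character_reference": character_ref_url,  # image or video of the target character
            # optional parameters per the description above (names are placeholders)
            "duration": 10,
            "resolution": "1080p",
            "aspect_ratio": "16:9",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["id"]

def wait_for_result(task_id: str, poll_seconds: float = 5.0) -> str:
    """Poll the task until it finishes and return the URL of the generated MP4."""
    while True:
        resp = requests.get(
            f"{API_BASE}/act_two_tasks/{task_id}",     # hypothetical endpoint path
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=30,
        )
        resp.raise_for_status()
        task = resp.json()
        if task["status"] == "SUCCEEDED":              # status values are placeholders
            return task["output_url"]
        if task["status"] == "FAILED":
            raise RuntimeError(task.get("error", "generation failed"))
        time.sleep(poll_seconds)
```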
Act-Two arrived during a period of accelerating studio engagement with Runway. By July 2025 the company had signed custom model partnerships with Lionsgate (September 2024) and AMC Networks (June 2025), and had ongoing exploratory work with several other Hollywood and streaming companies. Lionsgate executives told The Hollywood Reporter that filmmakers across the studio's slate were already using Runway tooling for pre-visualization and post-production tasks.
The practical applications for Act-Two in production cluster around a handful of categories. Pre-visualization is the most obvious. Directors, production designers, and storyboard artists use Act-Two to mock up performance ideas before committing a scene to live action shoots or to traditional animation pipelines. A director can record a smartphone clip of themselves or a reference performer acting out a scene, attach a concept image of the character, and produce a draft of the shot in minutes rather than days. The result is precise enough to communicate intent to an animation team or a producer without spending the production budget that a finished shot would require.
Independent animation projects represent another visible use case. The Runway AI Film Festival, an annual showcase that Runway has run since 2023, received about 6,000 submissions in its 2025 edition. Several of the finalist films at the 2025 festival, held at Alice Tully Hall at Lincoln Center on June 5, 2025, used Runway tooling for performance driven sequences. After Act-Two's release later in July, No Film School and other industry publications described the model as a serious option for independent narrative work that needs character animation but cannot afford a full mocap stage.
Game cinematic and VFX teams have shown early interest as well. The full hand and body tracking in Act-Two opens up workflows for in engine cinematics where a director can drive an existing rigged character with a smartphone performance and use the result as a reference for keyframe animators, or in some pipelines as direct input. The output is video rather than skeletal animation data, which means Act-Two does not replace traditional mocap for engine integration. It does, however, replace the reference video that animators or mocap performers would otherwise need to capture in a shoot.
Advertising production is a fourth application. Brand teams use Act-Two to generate stylized character spots without scheduling actor shoots or commissioning full CGI work. The Coca-Cola Christmas campaigns of 2024 and 2025 relied on Runway tooling alongside other generative video models, with character driven sequences benefiting from the consistency between performance and target character that Act-Two and Gen-4 References provide together.
Industry coverage from VP-Land, No Film School, and Restart Reality treated the launch as a significant step in the broader story of generative AI in film and television. The Lionsgate angle was repeatedly raised: with Lionsgate's library now feeding a custom Runway model and Act-Two providing a generative motion capture tool, the studio had a usable end to end stack for low cost previz and concept work.
Act-Two competes on different axes against two different categories: traditional optical motion capture and other AI driven character animation systems.
Traditional motion capture, as practiced at studios such as Industrial Light and Magic, Weta FX, and dedicated mocap houses, uses a calibrated array of cameras, retro-reflective markers or markerless depth sensors, and post-processed skeletal solving to produce high precision motion data. The output is a stream of joint angles and position data that can drive a rigged character model in a 3D engine or animation package. Traditional mocap is precise, repeatable, and produces motion data that can be edited frame by frame. It is also expensive, requires a dedicated stage, demands trained performers and operators, and has a long iteration loop.
Act-Two makes the opposite trade. It accepts video from any consumer camera, returns generated video rather than skeletal data, and produces results in minutes. The output cannot be directly edited as motion data because it is pixel based, which means that downstream changes typically require regenerating the clip rather than tweaking individual joints. The motion fidelity is high enough for many uses, but it does not match the precision of traditional mocap when VFX teams review the output frame by frame.
| Attribute | Traditional optical mocap | Runway Act-Two |
|---|---|---|
| Capture hardware | Multi-camera stage, markers or depth sensors | Any consumer camera or smartphone |
| Output | Skeletal motion data | Rendered video |
| Iteration time | Hours to days | Minutes |
| Cost per minute | Hundreds to thousands of dollars | About $3 per minute ($0.50 per 10 seconds) via API |
| Editability after capture | Frame by frame on skeleton | Re-generate the clip |
| Precision | Sub-millimeter joint tracking | Approximate, perceptual |
| Typical use | Feature film VFX, AAA games | Previz, indie animation, concept |
Within the generative video category, Act-Two competes most directly with Hedra Character, Synthesia, and a handful of smaller services such as LivePortrait. The competitive picture is shaped by what each service is built to do rather than by raw benchmarks. Hedra's Character-3 model, released in early 2025, focuses on audio driven talking head animation from a single image, with strong lip sync accuracy across multiple languages. Synthesia is an enterprise platform for AI avatar video generation, optimized for corporate training, internal communication, and similar fixed format content, with extensive language support and a library of pre-built avatars.
| System | Primary input | Primary output | Best fit |
|---|---|---|---|
| Runway Act-Two | Driving performance video plus character reference | Generated video with full body, face, and hand transfer | Cinematic performance capture, previz, indie animation |
| Hedra Character | Image, text, and audio | Animated talking head with synchronized lip motion | Audio driven avatar dialogue, social content |
| Synthesia | Script plus avatar selection | Avatar driven corporate video | Training, internal communication, multi-language enterprise video |
| Live action mocap | Marker or markerless capture on a stage | Skeletal motion data | High precision VFX, AAA games |
For a film team that needs a character to deliver a performance with specific gestures and timing, Act-Two is the only system in this list that captures both face and body from a single driving performance and applies them to an arbitrary character. For an enterprise team that needs a generic spokesperson delivering a script in twenty languages, Synthesia remains the better fit. For an audio driven music video where the dialogue is the focus, Hedra is often the preferred tool. The three systems sit in adjacent rather than overlapping product categories most of the time, although the boundaries have begun to blur as each platform adds features the others have shipped.
Coverage of Act-Two at launch was largely positive. AlternativeTo described the release as Runway adding advanced motion capture and full body tracking to its lineup. No Film School framed it in terms of its potential to disrupt the mocap industry, observing that motion capture had historically been a substantial expense for film, advertising, and games. VP-Land called the release a meaningful update to Runway's character animation stack and singled out the body and hand tracking as the headline change relative to Act-One.
Film industry trade press emphasized the studio context. The Hollywood Reporter coverage of the Lionsgate partnership had already framed Runway as a Hollywood aligned vendor rather than a disruptor, and the Act-Two coverage extended that framing. Variety and Deadline both noted that the release fit a pattern: a steady cadence of feature drops between major model releases (Gen-4 in March 2025, Act-Two and Aleph in July 2025, Gen-4 Image Turbo in August 2025) designed to keep professional users on the platform while engaging studio decision makers.
The enthusiast and creator response on X and YouTube was more mixed. Many creators described Act-Two as the first AI tool that gave them usable body and hand work without a mocap stage. Others noted that the model struggled with specific edge cases: fast camera moves in the driving video, occluded hands, or driving performances where the framing did not match the target character's body proportions. Creators posted side by side comparisons showing clearer micro-expression transfer and better body posture in Act-Two, alongside cases where it introduced unwanted body motion on characters that should have remained largely static.
Criticism around AI in film production extended to Act-Two even though the model itself does not directly displace performers. The 2023 Hollywood strikes had centered in part on AI's role in production, and SAG-AFTRA agreements established disclosure requirements for digital replicas of named performers and bargaining requirements for synthetic performers. Critics raised concerns that tools like Act-Two could shift work away from mocap performers, motion editors, and traditional animators. Runway has consistently positioned its tools as augmenting rather than replacing human work.
On safety and consent, Runway's stated policy is to use automated systems and internal human review to detect content that violates its usage policy, including non-consensual likeness use. Runway also embeds invisible watermarks in generated content. Critics have argued that watermarking and moderation are necessary but not sufficient steps for a tool with the performance transfer capabilities of Act-Two.
Act-Two has several limitations that were apparent at launch and that Runway's help documentation and independent reviewers have catalogued.
Clip length is bounded. Each generation tops out at about 10 seconds at 1080p, and the help center recommends keeping dialogue passes under roughly 30 seconds in total. Longer continuous performances require chunking and stitching, which introduces continuity work at the seams between consecutive clips.
The driving performance must be reasonably well framed and lit. Strong occlusion of hands, extreme camera angles, or heavy motion blur in the driving video degrades the output. Multi character driving footage is supported through a dedicated dialogue workflow, but solo character work remains the default and is the most reliable.
The model produces video output rather than skeletal animation data. For pipelines that need editable motion on a 3D rig, Act-Two cannot directly substitute for traditional mocap. It can serve as a reference for animators or as a draft for keyframe work, but the rendered nature of the output is a structural limit.
Fast moving hands, complex finger interactions with objects, and tight close ups on hands remain weak points. The model has improved on Act-One in these categories, but artifacts persist, especially in the same broad categories (hand articulation, finger object contact) that affect every major commercial video model in 2025.
Generation is not exactly repeatable. Identical inputs produce different outputs across runs, which is inherent to diffusion based generation. For applications that require a specific performance reproduced frame for frame, this matters; for previz, exploration, and most narrative work, it does not.
Runway has not published a detailed technical paper for Act-Two. Architecture, training data, and parameter count remain undisclosed. The company's public statements describe its models as transformer based diffusion systems trained on curated internal datasets. This opacity is consistent with Runway's stance on other recent models and reflects ongoing copyright litigation against several generative AI companies, including Runway, over training data sourcing.
Act-Two sits alongside Runway Gen-4 and Gen-4 Turbo (image to video), Aleph (in context video editing), and Gen-4 Image (still image generation with reference conditioning). Each model targets a different input type: text prompts, images, existing video, or a recorded performance. Many production pipelines combine multiple Runway models in a single shot.