Sora is a generative AI video model developed by OpenAI that creates videos from text prompts, images, and existing video clips. First previewed as a research demonstration on February 15, 2024, it launched publicly on December 9, 2024, as part of OpenAI's "12 Days of Shipmas" event. A major successor, Sora 2, followed on September 30, 2025, with native audio generation and a dedicated social iOS app. The model uses a diffusion transformer architecture that operates on "spacetime patches" of video latent codes, enabling it to generate videos of varying resolutions, durations, and aspect ratios.
On February 15, 2024, OpenAI published a technical report titled "Video Generation Models as World Simulators" alongside several sample clips. These included an SUV driving down a mountain road, an animated "short fluffy monster" standing next to a candle, two people walking through snowy Tokyo, and synthetic historical footage of the California gold rush [1]. At this stage, Sora was not available to the general public. OpenAI instead granted access to a small group of red teamers, who probed the system for safety weaknesses, and to visual artists, designers, and filmmakers in over 60 countries, who provided creative feedback [2].
The February 2024 preview was widely compared to a "GPT-1 moment" for video, marking the first time that behaviors like object permanence seemed to emerge naturally from scaling up pre-training compute for video generation [3].
OpenAI released Sora to the public on December 9, 2024, during its "12 Days of OpenAI" (informally called "12 Days of Shipmas") event, a 12-day live-stream series that ran from December 5 onward, unveiling a new product or feature each weekday [4]. The version launched publicly was called Sora Turbo, a significantly faster and more capable iteration of the model shown in February.
Sora Turbo brought several improvements over the research preview:

- resolutions up to 1080p and durations up to 20 seconds;
- widescreen, vertical, and square aspect ratios;
- a Storyboard tool for sequencing multiple prompts along a timeline;
- a community feed for browsing and sharing generations [5].
At launch, Sora was available to ChatGPT Plus and Pro subscribers in most regions where ChatGPT operates. The United Kingdom, Switzerland, and the European Economic Area were excluded from the initial rollout [6].
OpenAI announced Sora 2 on September 30, 2025, alongside a dedicated iOS app and plans for an Android version (which arrived about two months later) [3]. Sora 2 represented a large step forward in several areas:

- native audio generation, including synchronized dialogue, sound effects, and ambient sound;
- more accurate physics simulation;
- a Characters ("cameo") feature for inserting a consenting person's likeness into generated scenes;
- video extensions and clips up to 25 seconds long [3].
The Sora app also functions as a social platform, allowing users to create content, remix others' generations, and browse a customizable video feed [3].
As of March 13, 2026, Sora 1 is no longer available in the United States, and the app defaults to Sora 2 for US users [8]. Starting January 10, 2026, OpenAI removed free-tier access to video and image generation in Sora, restricting it to Plus and Pro subscribers only [9].
The standalone Sora app experienced declining engagement once its initial launch buzz faded. In January 2026, app installs fell 45% month-over-month to 1.2 million, and consumer spending dropped 32% over the same period. On the US App Store, Sora fell out of the Top 100 free apps [10].
In response, OpenAI has signaled plans to integrate Sora's video generation capabilities directly into ChatGPT, similar to how DALL-E image generation was previously embedded in the chat interface. The Information reported this plan on March 11, 2026, noting that the move aims to reach a broader user base and push toward OpenAI's goal of 1 billion weekly active users [11]. Under this integration, a user could ask ChatGPT to write a script and then immediately generate a video trailer based on the output, all within the same conversation. The standalone Sora app is expected to continue operating alongside the ChatGPT integration. However, the computational cost of video generation is significantly higher than text generation, and industry observers have noted that this integration could increase OpenAI's operating expenses [11].
Several incremental features were added throughout late 2025 and early 2026.
On December 11, 2025, The Walt Disney Company announced a landmark $1 billion equity investment in OpenAI, accompanied by a three-year licensing agreement that made Disney the first major content licensing partner on the Sora platform [23]. Under the deal, Sora users gained access to more than 200 animated, masked, and creature characters from Disney, Pixar, Marvel, and Star Wars, including costumes, props, vehicles, and iconic environments. Available characters include Mickey Mouse, Minnie Mouse, Lilo, Stitch, Ariel, Belle, Cinderella, Black Panther, Darth Vader, and Yoda, among many others [24].
Beyond the Sora licensing, Disney also became a major enterprise customer of OpenAI, using its APIs to build internal tools and experiences for Disney+, and deploying ChatGPT for its employees. A selection of fan-inspired Sora short-form videos became available to stream on Disney+. The character licensing on Sora and ChatGPT Images was expected to go live in early 2026 [23][24].
Sora is a diffusion model built on a transformer backbone, a design sometimes called a "Diffusion Transformer" or "DiT." The core pipeline consists of three stages: a video compressor (encoder), the transformer-based denoiser, and a video decompressor (decoder) [12].
Sora uses a spatiotemporal autoencoder trained from scratch to compress raw video into a lower-dimensional latent space. This compression reduces both spatial resolution and temporal length, meaning a one-minute video becomes a much shorter sequence of latent frames. The compression step is what enables Sora to handle long-duration video generation without an unmanageable number of tokens [1].
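To make the scale concrete, the sketch below computes the latent shape for a one-minute 1080p clip. OpenAI has not disclosed Sora's actual compression ratios, so the 8× spatial and 4× temporal factors here are illustrative assumptions only.

```python
# Illustrative only: OpenAI has not published Sora's compression ratios.
# Assumed factors: 8x spatial downsampling, 4x temporal downsampling.

def latent_shape(frames: int, height: int, width: int,
                 spatial: int = 8, temporal: int = 4) -> tuple:
    """Shape (T, H, W) of the latent video after autoencoder compression."""
    return (frames // temporal, height // spatial, width // spatial)

# One minute of 1080p video at 30 fps: 1800 frames of 1920x1080 pixels.
print(latent_shape(1800, 1080, 1920))  # (450, 135, 240)
```

Under these assumed factors, the frame count drops by 4× and each frame's spatial grid by 64×, which is what keeps the downstream token sequence tractable.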
The compressed video is then decomposed into "spacetime patches," three-dimensional chunks that span portions of both the spatial frame and the temporal sequence. These patches serve as the equivalent of tokens in a large language model: the transformer processes them as a sequence. Because patches can be extracted from videos of any resolution, duration, or aspect ratio, the same architecture handles a wide variety of input and output formats without requiring fixed dimensions [1][12].
OpenAI's technical report draws an explicit analogy: just as text tokens represent word fragments that can be assembled into any sentence, spacetime patches represent "visual phrases" that can be assembled into any video [1].
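A minimal sketch of the patchify step, using NumPy with assumed patch dimensions (the report does not disclose Sora's actual patch sizes):

```python
import numpy as np

def patchify(latent: np.ndarray, pt: int = 2, ph: int = 4, pw: int = 4) -> np.ndarray:
    """Cut a latent video (T, H, W, C) into non-overlapping spacetime patches
    of shape (pt, ph, pw, C) and flatten them into a token sequence.
    Patch sizes here are illustrative assumptions, not Sora's real values."""
    T, H, W, C = latent.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    x = latent.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)    # bring patch-grid indices first
    return x.reshape(-1, pt * ph * pw * C)  # (num_patches, patch_dim)

tokens = patchify(np.zeros((8, 64, 64, 16)))
print(tokens.shape)  # (1024, 512): 4*16*16 patches, each 2*4*4*16 values
```

Because the number of patches simply tracks the input's dimensions, the same transformer can consume a vertical phone clip and a widescreen shot without architectural changes.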
Video generation begins with a latent representation filled with random noise. Over many denoising steps, the transformer predicts and removes noise to reveal the final video. The model is trained to predict the original "clean" patches from their noisy versions, conditioned on a text prompt that has been processed by a text encoder (similar to those used in DALL-E 3). The result is decoded back into pixel space by the video decompressor [12].
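The loop below is a deliberately simplified Euler-style sampler showing the overall shape of this process; the denoiser interface, noise schedule, and step rule are generic diffusion conventions, not OpenAI's published implementation.

```python
import torch

@torch.no_grad()
def sample_video(denoiser, text_emb, shape, steps: int = 50) -> torch.Tensor:
    """Generic diffusion sampling sketch (not OpenAI's actual sampler).
    `denoiser(x, sigma, cond)` is assumed to predict the noise in x at
    noise level sigma, conditioned on the prompt's text embedding."""
    sigmas = torch.linspace(1.0, 0.0, steps + 1)   # noise level, 1 -> 0
    x = torch.randn(shape) * sigmas[0]             # start from pure noise
    for i in range(steps):
        eps = denoiser(x, sigmas[i], text_emb)     # predicted noise
        x = x + (sigmas[i + 1] - sigmas[i]) * eps  # step toward the clean video
    return x  # latent video; the decoder maps it back to pixel space
```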
According to OpenAI's system card, Sora was trained on a combination of three data sources [2]:
| Data Source | Description |
|---|---|
| Publicly available data | Collected from industry-standard machine learning datasets and web crawls |
| Proprietary partnership data | Licensed content from partners such as Shutterstock and Pond5 |
| Human feedback data | Input from AI trainers, red teamers, and employees |
Before training, all datasets go through a filtering process that removes explicit, violent, or otherwise sensitive material, extending the filtering methods developed for DALL-E 2 and DALL-E 3 [2].
Sora's capabilities have expanded across its versions. The table below compares the original Sora (Turbo) release and Sora 2.
| Feature | Sora Turbo (Dec 2024) | Sora 2 (Sep 2025) |
|---|---|---|
| Maximum resolution | 1080p | 1080p |
| Maximum duration | 20 seconds | 25 seconds |
| Aspect ratios | Widescreen, vertical, square | Widescreen, vertical, square |
| Audio generation | No | Yes (dialogue, SFX, ambient) |
| Characters feature | No | Yes |
| Input types | Text, image, video | Text, image, video |
| Storyboard tool | Yes | Yes |
| Social feed | Yes | Yes (expanded) |
| Video extensions | No | Yes |
| Licensed character library | No | Yes (Disney, Pixar, Marvel, Star Wars) |
Beyond raw specifications, Sora demonstrates several emergent properties [1]:

- 3D consistency: camera movement produces coherent parallax, with people and objects moving plausibly through three-dimensional space;
- long-range coherence and object permanence: subjects can persist even when temporarily occluded or out of frame;
- interaction with the world: actions can leave lasting traces, such as a painter's brush strokes remaining on a canvas;
- simulation of digital worlds: the model can render video-game environments such as Minecraft while controlling the player.
Despite its capabilities, Sora has known shortcomings that OpenAI has publicly acknowledged.
The original Sora model frequently failed to simulate complex physical dynamics correctly. A cookie might show no bite mark after a character takes a bite; a glass might not shatter when dropped [1]. While Sora 2 improved physics accuracy, errors can still occur in scenes involving multiple interacting forces. Users have found that combining several physical actions in a single prompt (such as pouring water while stirring a spoon) increases the likelihood of artifacts [13].
Sora sometimes confuses spatial directions, mixing up left and right or failing to follow precise positional descriptions in the prompt. This limitation affects tasks requiring exact object placement or character orientation within a scene [1].
Early versions of Sora produced particularly poor results for gymnastics, generating "strange shape-shifting humans that vault through the air and sometimes land on three legs or an extra head" [14]. OpenAI highlighted improved gymnastics rendering as a Sora 2 benchmark, but complex human motion remains an area where errors can appear.
Maintaining narrative and visual consistency across longer video durations is challenging. While Sora 2 improved multi-shot controllability, subtle inconsistencies in character appearance, clothing, or background details can still emerge over extended sequences.
Sora-generated videos include a visible, moving digital watermark to signal AI-generated content. However, within a week of Sora 2's release, third-party programs appeared that could remove the watermark, undermining this safety measure [25].
Sora is bundled with OpenAI's ChatGPT subscription tiers rather than sold as a standalone product. The pricing structure has evolved over time. As of early 2026, the tiers are as follows [9][15]:
| Feature | ChatGPT Plus ($20/month) | ChatGPT Pro ($200/month) |
|---|---|---|
| Monthly credits | 1,000 | 10,000 |
| Priority videos (approx.) | ~50 | ~500 |
| Maximum resolution | 720p | 1080p |
| Maximum video length | 5 seconds | 20 seconds |
| Watermarks | Yes | No |
| Relaxed mode (unlimited) | No | Yes |
Pro subscribers also have access to an unlimited "relaxed" generation mode, where videos are queued at lower priority and processed during off-peak hours at no credit cost [15].
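The table's figures imply a flat rate of roughly 20 credits per priority video on both tiers (1,000/50 and 10,000/500). The helper below restates that arithmetic; the flat-rate assumption is ours, since actual credit costs vary with resolution and duration.

```python
# Assumption: ~20 credits per priority video, inferred from the table above
# (1,000 / 50 = 10,000 / 500 = 20). Real costs vary by resolution/duration.
def priority_videos(monthly_credits: int, credits_per_video: int = 20) -> int:
    return monthly_credits // credits_per_video

print(priority_videos(1_000))   # 50  -> Plus tier
print(priority_videos(10_000))  # 500 -> Pro tier
```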
Free-tier users lost access to Sora's generation features on January 10, 2026 [9].
For developers, OpenAI offers API access to Sora 2 with pricing based on model tier and output resolution [16]:
| Model | Resolution | Price per Second |
|---|---|---|
| Sora 2 | 720p | $0.10 |
| Sora 2 Pro | 720p | $0.30 |
| Sora 2 Pro | 1792×1024 | $0.50 |
A 10-second standard video at 720p costs roughly $1.00, while a 10-second Pro HD clip runs approximately $5.00. Developers need at minimum a $10 API credit top-up (Tier 2) to unlock Sora model access. Rate limits scale with tier: Plus subscribers get 5 requests per minute, Pro users get 50 requests per minute, and Enterprise accounts can negotiate 200 or more requests per minute with dedicated support [16].
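Since pricing is linear in duration, a clip's cost is simply the per-second rate times its length; the helper below encodes the published rates from the table above.

```python
# Published per-second API rates (USD), from the table above.
RATES = {
    ("sora-2", "720p"): 0.10,
    ("sora-2-pro", "720p"): 0.30,
    ("sora-2-pro", "1792x1024"): 0.50,
}

def video_cost(model: str, resolution: str, seconds: int) -> float:
    """Estimated API cost in USD for a single generated clip."""
    return RATES[(model, resolution)] * seconds

print(video_cost("sora-2", "720p", 10))           # 1.0
print(video_cost("sora-2-pro", "1792x1024", 10))  # 5.0
```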
The Sora API supports several endpoints, including reusable character references, video extensions, generations up to 20 seconds, 1080p output for the sora-2-pro model, and Batch API support. OpenAI also added a POST /v1/videos/edits endpoint for editing existing videos [22].
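As a rough sketch of what a raw HTTP generation call might look like: the documented `/v1/videos/edits` endpoint implies a `/v1/videos` base path and a job-then-poll pattern, but the exact field names below are assumptions, not confirmed documentation.

```python
import os
import time
import requests

API = "https://api.openai.com/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

# Assumed request shape: POST /v1/videos is inferred from the documented
# /v1/videos/edits endpoint; parameter names may differ in practice.
job = requests.post(f"{API}/videos", headers=HEADERS, json={
    "model": "sora-2",
    "prompt": "A paper boat drifting down a rain-soaked street at dusk",
    "seconds": 8,
}).json()

# Video generation is asynchronous, so poll the job until it completes.
while job.get("status") in ("queued", "in_progress"):
    time.sleep(10)
    job = requests.get(f"{API}/videos/{job['id']}", headers=HEADERS).json()

print(job.get("status"))  # "completed" on success
```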
OpenAI's safety approach for Sora builds on methods developed for DALL-E and ChatGPT [2].
Before the December 2024 launch, OpenAI engaged external red teamers in nine countries to probe the system for vulnerabilities and safety gaps. The company also worked with hundreds of visual artists, designers, and filmmakers from over 60 countries after the February 2024 announcement [2].
OpenAI built an internal search tool that uses technical attributes of generated videos to help verify whether a piece of content came from Sora. This assists in tracking misuse and responding to reports of harmful content [5].
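OpenAI has not described how this tool works. As one plausible illustration of the general technique, the sketch below indexes coarse per-frame fingerprints so a suspect clip can be matched against previously generated ones; it is a generic perceptual-matching idea, not OpenAI's actual system.

```python
import hashlib
import numpy as np

def frame_fingerprints(frames: list[np.ndarray]) -> set[str]:
    """Hash heavily downsampled grayscale frames (e.g., 8x8 uint8 arrays)
    so that near-identical renders collide. Generic illustration only;
    this is not OpenAI's actual provenance tool."""
    fps = set()
    for f in frames:
        bits = bytes(int(px > f.mean()) for px in f.flatten())  # threshold
        fps.add(hashlib.sha256(bits).hexdigest()[:16])
    return fps

def likely_match(query: set[str], indexed: set[str], threshold: float = 0.6) -> bool:
    """Flag a probable match when enough frame fingerprints overlap."""
    return len(query & indexed) / max(len(query), 1) >= threshold
```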
Despite these safeguards, security researchers have found weaknesses. Reality Defender, a company specializing in identifying deepfakes, reported that it was able to bypass Sora's anti-impersonation safeguards within 24 hours of the Sora 2 launch. A Washington Post journalist demonstrated that the face-sharing feature could be exploited: simply granting the app permission to share one's face with chosen contacts allowed those contacts to create videos of the person being arrested or engaging in fabricated scenarios, all without further approval [26].
In November 2024, shortly before the public launch, a group of artists who had been granted early access to Sora leaked access to the model on Hugging Face. They published a manifesto accusing OpenAI of "art washing," claiming that the company used them as "PR puppets" to lend artistic credibility to a product that they believed threatened their livelihoods, all without compensation [18].
Sora's training data has drawn legal scrutiny. A coalition of Japanese entertainment companies, including Studio Ghibli, Bandai Namco, and Square Enix, accused OpenAI of using copyrighted animation and design styles without permission. Japan's Content Overseas Distribution Association argued that OpenAI's "opt-out" system for rights holders improperly reverses the burden of consent, urging the company to stop using Japanese works until a legal framework is in place [17].
On the user-generated content side, the Sora app initially faced problems with users creating videos featuring copyrighted characters like SpongeBob and Pikachu. OpenAI shifted from an opt-out to an opt-in model for intellectual property and increased content restrictions, though this change contributed to declining user engagement [10].
The announcement of Sora prompted a strong response from parts of the entertainment industry. Filmmaker Tyler Perry announced he would pause a planned $800 million expansion of his Atlanta studio, citing concerns about the potential impact of AI video generation tools like Sora on traditional filmmaking [26]. Major talent agencies also took protective action: Creative Artists Agency and United Talent Agency opted their clients out of Sora 2. United Talent Agency described the app as "exploitation, not innovation," while Creative Artists Agency warned that it "exposes our clients and their intellectual property to significant risk" [26].
The Sora 2 launch in September 2025 immediately triggered deepfake concerns. Unauthorized AI-generated clips using actor Bryan Cranston's voice and likeness appeared on the platform. Under pressure from Cranston and the SAG-AFTRA actors' union, OpenAI updated its policy to require opt-in consent before any person's likeness can be used [17]. Families of Robin Williams, George Carlin, and Martin Luther King Jr. also complained to OpenAI about the misuse of their loved ones' likenesses on the platform [26].
Public Citizen, a US consumer advocacy group, called on OpenAI to suspend Sora 2 in November 2025, warning that its realistic video output could be weaponized for political deepfakes or non-consensual imagery [19]. A separate controversy emerged around a "dead celebrity loophole": since posthumous likeness rights vary widely across jurisdictions, families of deceased public figures have limited legal recourse. OpenAI blocked videos of Martin Luther King Jr. on the platform after users created what the company called "disrespectful depictions" [17].
Broader concerns about disinformation emerged rapidly after the Sora 2 launch. The Sora app saw AI-generated videos depicting ballot fraud, immigration arrests, protests, and fabricated crime scenes appear on its social feed within days of release [26]. UC Berkeley's School of Information warned that society is "unprepared for the next wave of increasingly realistic, personalized deepfakes" [26].
Sora operates in a rapidly expanding market for AI video generation. The major competitors as of early 2026 include:
| Model | Developer | Key Features |
|---|---|---|
| Veo 3 / 3.1 | Google (DeepMind) | Native 4K output, character consistency, vertical video, native audio; Ingredients to Video for object consistency, Frames to Video for transitions, Insert/Remove Object with automatic lighting; available through Gemini Advanced at $19.99/month |
| Movie Gen | Meta | 30B parameter model, 16-second videos at 1080p, personalized video from a single photo, synchronized audio up to 45 seconds; announced October 2024 |
| Runway Gen-4.5 | Runway | Ranked #1 on Artificial Analysis for visual quality, cinematic focus, realistic physics, widely used in professional post-production; Lionsgate partnership; no native audio as of early 2026; $95/month Unlimited tier |
| Pika 2.1 Turbo | Pika Labs | Fast generation (30-90 seconds, 3-6x faster than competitors), creative effects and style transfer tools, social-oriented; $28/month Pro tier |
| Kling 2.6 | Kuaishou | Ranked #1 on Artificial Analysis for image-to-video, simultaneous audio-visual generation, strong at complex movements and dynamic camera work (released December 2025) |
| Hailuo 02 / 2.3 | MiniMax | 1080p at 24-30 FPS, ranked second globally on Artificial Analysis benchmark as of early 2026, NCR architecture |
Google's Veo 3.1 is often cited as the current market leader for overall quality and accessibility, scoring higher than Sora 2 on prompt adherence and audio. Sora 2 holds advantages in human emotion rendering and physics simulation. Runway Gen-4.5 dominates in professional filmmaking circles despite lacking native audio generation. Kling and Hailuo have gained strong followings in Asia and are competitive on global benchmarks [20][21].
The name "Sora" comes from the Japanese word meaning "sky," which OpenAI chose to evoke the model's limitless creative potential [1].