Grok Imagine
Last reviewed
May 16, 2026
Sources
22 citations
Review status
Source-backed
Revision
v1 · 3,515 words
Grok Imagine is a generative media product from xAI, the company founded by Elon Musk. It produces still images, short videos with synchronized audio, and image edits from natural language prompts or reference pictures. The tool grew out of the image generation features added to Grok in late 2024 and reached a tagged 1.0 release on February 2, 2026, after a public rollout that began in August 2025. By that point xAI said the product was generating roughly 1.245 billion videos every 30 days, putting it among the most heavily used consumer video models on the market.
Grok Imagine combines a text-to-image stack derived from xAI's Aurora model with a separate video generation pipeline. The system runs inside the Grok mobile and web apps and, since January 28, 2026, is also exposed through the xAI Imagine API. The product is positioned against Sora 2 from OpenAI, Veo 3 from Google DeepMind, and Google's lightweight Nano Banana editor, with xAI emphasizing low latency, looser content rules, and per-second pricing well below comparable services.
The rollout has been overshadowed by safety controversies. A preset called "spicy mode," enabled by default in early versions, generated non-consensual nude videos of celebrities including Taylor Swift on the day Grok Imagine became available to subscribers. Reporters at outlets like The Verge, TechCrunch, and Common Dreams documented the issue within hours of launch, and regulators in the United Kingdom and France later opened inquiries into whether the tool produced illegal sexual content involving minors. Despite the criticism, xAI has continued to expand the feature set and the underlying API.
xAI was incorporated in March 2023 and shipped its first chatbot, Grok, on the X platform in November of that year. The earliest versions of Grok handled only text. Image generation arrived in August 2024 through a partnership with Black Forest Labs, which let users call the FLUX.1 model directly from inside Grok's chat interface. That arrangement was always treated as a placeholder. Musk had said publicly that xAI wanted to control its own image stack, and the company began training a proprietary image model called Aurora during the second half of 2024.
Aurora went live on December 9, 2024, replacing FLUX inside Grok on the web and on X. xAI described Aurora as an autoregressive mixture-of-experts network trained to predict the next token from interleaved text and image data, rather than a conventional latent diffusion model. The model focused on photorealism, accurate rendering of small text, faithful reproduction of logos, and recognizable celebrity likenesses. It was a deliberately permissive system. Aurora allowed images of public figures and copyrighted characters that competing tools refused to produce, and the only hard limit at launch was a refusal to generate full nudity. The model reached the xAI public API on March 21, 2025.
The shift from a still-image feature to a full creative suite happened in mid-2025, when xAI began testing video generation. Musk previewed the work on X in July, calling it "an AI Vine" in a nod to the defunct six-second video service. The product was rebranded "Grok Imagine" and shipped to subscribers on August 4, 2025. From that point the term refers to the combined image, video, and editing surface rather than the underlying Aurora model alone.
| Date | Release | Notes |
|---|---|---|
| December 9, 2024 | Aurora image model | xAI's first in-house text-to-image model. Replaced FLUX.1 inside Grok. |
| March 21, 2025 | Aurora on API | Aurora exposed to developers through the xAI REST API. |
| July 28, 2025 | Grok Imagine preview | Limited test of a six-second "AI Vine" style video tool described by Elon Musk on X. |
| August 4, 2025 | Grok Imagine public launch | Available to SuperGrok and Premium+ subscribers on iOS. Produced 15-second clips with native audio and included "spicy mode" preset. |
| October 5, 2025 | Grok Imagine v0.9 | Generation latency cut to under 15 seconds per clip. Added voice-first prompting and instant still image generation. |
| January 28, 2026 | Grok Imagine API | Public API release with two model IDs: grok-imagine-image-quality and grok-imagine-video. |
| February 2, 2026 | Grok Imagine 1.0 | Tagged 1.0 release. Clips of up to 10 seconds at 720p with substantially better audio. xAI claimed 1.245 billion videos generated in the prior 30 days. |
| March 25, 2026 | SuperGrok Lite tier | Introduced a $10 per month consumer tier with Grok Imagine access and one AI agent. |
The February 2026 "1.0" label was, to a degree, a marketing choice. The product had been live for six months and had gone through three major capability updates before xAI was willing to call it 1.0. The company has continued to ship smaller updates roughly every two weeks, and the documentation treats the API reference as the canonical source for current limits.
Grok Imagine is organized around four core actions: generate a still image, edit an existing image, generate a short video from text, and animate or edit a video starting from an image or earlier clip. All four are reachable from the same prompt box in the consumer app, and the API exposes them through two model endpoints.
| Capability | Inputs | Output | Notes (as of February 2026 1.0 release) |
|---|---|---|---|
| Text-to-image | Text prompt, aspect ratio, optional resolution | Up to 10 stills per request | Two quality tiers: a faster standard model and a slower "pro" model with finer detail. |
| Image editing | One to three reference images plus instruction | Edited still | Multi-image conditioning supports compositing, style transfer, and identity preservation. |
| Text-to-video | Text prompt, optional duration, aspect ratio | Up to 10 seconds at 720p with audio | Audio includes ambient sound and best-effort dialogue. |
| Image-to-video | One still plus optional motion description | Animated clip with audio | Useful for turning a generated frame into a moving shot. |
| Video editing | Existing clip plus instruction | Edited or extended clip | Supports trimming, restyling, and continuation beyond the original cut. |
| Voice prompting | Spoken instruction in the mobile app | Same as above | Added in the October 2025 v0.9 release. |
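Because the per-request limits in the table are easy to trip over, a client might check them before submitting anything. The following is a minimal, hypothetical validator: the dataclasses and field names (`n`, `reference_images`, `duration_s`, `source_image`) are illustrative assumptions rather than xAI's documented request schema, and the numeric caps are the 1.0 values from the table, which may since have changed.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Snapshot limits from the February 2026 1.0 release (assumed, may change).
MAX_IMAGES_PER_REQUEST = 10
MAX_REFERENCE_IMAGES = 3
MAX_VIDEO_SECONDS = 10


@dataclass
class ImageRequest:
    prompt: str
    n: int = 1  # number of stills to return (hypothetical field name)
    reference_images: List[str] = field(default_factory=list)  # URLs or data URIs


@dataclass
class VideoRequest:
    prompt: str
    duration_s: float
    source_image: Optional[str] = None  # set for image-to-video


def validate_image_request(req: ImageRequest) -> List[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    if not req.prompt.strip():
        problems.append("prompt is empty")
    if not 1 <= req.n <= MAX_IMAGES_PER_REQUEST:
        problems.append(f"n must be between 1 and {MAX_IMAGES_PER_REQUEST}")
    if len(req.reference_images) > MAX_REFERENCE_IMAGES:
        problems.append(f"at most {MAX_REFERENCE_IMAGES} reference images are allowed")
    return problems


def validate_video_request(req: VideoRequest) -> List[str]:
    """Image-to-video may omit the text prompt; text-to-video may not."""
    problems = []
    if not req.prompt.strip() and req.source_image is None:
        problems.append("need a prompt or a source image")
    if not 0 < req.duration_s <= MAX_VIDEO_SECONDS:
        problems.append(f"duration must be in (0, {MAX_VIDEO_SECONDS}] seconds")
    return problems
```

Returning a list of problems rather than raising on the first one lets a UI show every issue at once.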
Audio quality has been the most-tracked metric across versions. The August 2025 launch promised "native audio" but produced muddy ambient tracks and unconvincing lip sync. The 1.0 release added what xAI called "smoother motion" and clearer dialogue, though reviewers at Tom's Guide and Latent Space still found the lip sync trailing Veo 3 by a noticeable margin. Image quality, by contrast, has been competitive with leading models since the Aurora launch, particularly on photorealistic portraits and on small details like signage and product packaging.
Grok Imagine does not support 1080p or 4K video output. The resolution cap at 720p was the headline limitation that critics pointed to in early 2026 comparisons, and xAI has not published a public commitment to lifting it.
The still-image side of Grok Imagine runs on Aurora, which xAI describes as an autoregressive mixture-of-experts network. Rather than denoising a latent representation in the way most diffusion models do, Aurora predicts the next token in an interleaved sequence that contains both text and image patches. The architecture is closer in spirit to multimodal language models like the original GPT-4o image system than to Stable Diffusion or FLUX. xAI has not published the parameter count or the exact training data mix, though it has confirmed that the model was trained from scratch on "billions of examples from the internet."
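The difference between next-token prediction and latent denoising is easiest to see in a toy. The sketch below is purely conceptual and is not xAI's architecture: it flattens a caption and an image into one token sequence, then decodes image patch tokens one at a time, each conditioned on everything before it. The marker tokens, the `(kind, value)` token format, and the stub model are all hypothetical. In a real system each token would be sampled from the network's output distribution, and a separate image tokenizer would decode the finished patch sequence into pixels.

```python
from typing import Callable, List, Tuple

# A toy token is (kind, value): ("text", word), ("img", patch_id) or a control marker.
Token = Tuple[str, object]
BOI: Token = ("ctrl", "<image>")   # hypothetical begin-of-image marker
EOI: Token = ("ctrl", "</image>")  # hypothetical end-of-image marker


def interleave(words: List[str], patches: List[int]) -> List[Token]:
    """Flatten a caption and an image into one sequence read left to right."""
    return [("text", w) for w in words] + [BOI] + [("img", p) for p in patches] + [EOI]


def generate_patches(
    words: List[str],
    next_token: Callable[[List[Token]], Token],
    max_patches: int,
) -> List[int]:
    """Greedy autoregressive decoding: open an image after the prompt, then
    append one predicted token at a time, each conditioned on the whole prefix."""
    seq: List[Token] = [("text", w) for w in words] + [BOI]
    patches: List[int] = []
    while len(patches) < max_patches:
        tok = next_token(seq)
        if tok == EOI:  # the model decided the image is complete
            break
        seq.append(tok)
        patches.append(int(tok[1]))
    return patches


def stub_model(seq: List[Token]) -> Token:
    """Stand-in for the network: emits patch ids 0..3, then closes the image."""
    seen = sum(1 for kind, _ in seq if kind == "img")
    return EOI if seen >= 4 else ("img", seen)


print(generate_patches(["a", "red", "fox"], stub_model, max_patches=16))  # [0, 1, 2, 3]
```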
The video pipeline is a separate model that takes the still output from Aurora and extends it forward in time. xAI has not given the video model a distinct public name; inside the API it is referred to only as grok-imagine-video. According to a Latent Space post that drew on a developer briefing, the video model was trained on a cluster built around 110,000 NVIDIA GB200 GPUs, which would make it one of the largest single-purpose training jobs publicly disclosed for a video model. The figure comes from that briefing rather than from any xAI publication and has not been independently verified, so treat it as a company claim rather than a confirmed count.
Audio generation runs as a third stage. After a clip's visual frames are produced, the audio model conditions on the visual content and the original text prompt to generate matched ambient sound, music, and dialogue. The result is muxed with the video before delivery. xAI has not disclosed whether the audio model is a fine-tune of the speech systems that run inside the Grok voice assistant or a separate model entirely.
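The data dependencies between the three stages can be summarized in a short sketch. Every stage here is an injected placeholder function, since xAI has not published the models' interfaces; `render_clip`, the `Clip` type, and the 24 fps default are hypothetical. The point is only the ordering: the video stage needs the still, and the audio stage needs both the finished frames and the original prompt, so that the soundtrack can be rendered to the clip's actual length.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Clip:
    frames: List[str]  # placeholder frame handles
    audio: str         # placeholder audio track handle
    fps: int

    @property
    def duration_s(self) -> float:
        return len(self.frames) / self.fps


def render_clip(
    prompt: str,
    make_still: Callable[[str], str],                     # stage 1: still image
    extend_frames: Callable[[str, str, int], List[str]],  # stage 2: video from the still
    make_audio: Callable[[List[str], str, float], str],   # stage 3: audio from frames + prompt
    seconds: float,
    fps: int = 24,  # assumed frame rate, not a documented value
) -> Clip:
    """Run the stages in order, then 'mux' by packaging the frames with a
    soundtrack generated for the same duration."""
    still = make_still(prompt)
    frames = extend_frames(still, prompt, round(seconds * fps))
    duration = len(frames) / fps
    audio = make_audio(frames, prompt, duration)  # conditions on visuals and text
    return Clip(frames=frames, audio=audio, fps=fps)
```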
The Grok Imagine API opened on January 28, 2026 and is documented at docs.x.ai. It uses asynchronous polling: a client submits a request, receives a job identifier, and then polls for completion. Input images can be supplied as public URLs or as base64 data URIs. Client options include Python, through either the xAI SDK or the OpenAI-compatible client, JavaScript through the Vercel AI SDK, and plain REST calls with a tool such as curl.
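Two pieces of that workflow are generic enough to sketch without touching any vendor SDK: encoding a local image as a base64 data URI, and a submit-then-poll loop with exponential backoff. In the sketch, `get_status` stands in for whatever HTTP call fetches a job, and the `"status"` values `"done"` and `"failed"` are hypothetical, not documented xAI response fields. The 120-second default timeout is an assumption chosen to sit above the 15 to 90 second latencies reported for video jobs.

```python
import base64
import time
from typing import Callable, Dict


def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def poll_until_done(
    get_status: Callable[[str], Dict],
    job_id: str,
    timeout_s: float = 120.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll a job until it reports a terminal state, doubling the wait
    between checks up to max_delay_s and giving up after timeout_s."""
    waited = 0.0
    delay = initial_delay_s
    while True:
        job = get_status(job_id)
        if job.get("status") == "done":
            return job
        if job.get("status") == "failed":
            raise RuntimeError(f"job {job_id} failed: {job.get('error')}")
        if waited + delay > timeout_s:
            raise TimeoutError(f"job {job_id} not done after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay_s)
```

Backing off keeps a long video render from generating dozens of status requests, while the cap on the delay keeps short image jobs responsive. Injecting `sleep` makes the loop testable without real waiting.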
| Endpoint | Function | List price |
|---|---|---|
| grok-imagine-image-quality (standard) | Still image generation and edits | $0.02 per output image |
| grok-imagine-image-quality (pro) | Higher quality still image generation | $0.07 per output image |
| grok-imagine-video | Text-to-video and image-to-video with audio | $0.07 per second, equivalent to $4.20 per minute of finished video |
The $4.20 per minute headline is roughly one third of Google's Veo 3.1 Preview, which lists at $12 per minute with audio, and around one seventh of OpenAI Sora 2 Pro at $30 per minute. The $4.20 number is also unmistakably an Elon Musk price point; he has used 420 in product launches before, including the brief 2018 attempt to take Tesla private at $420 per share. Whether you find that funny or tiresome probably depends on how long you have been paying attention to the company.
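The ratios quoted above follow directly from the per-second list price. A small calculator makes the arithmetic explicit; the dictionary keys are informal labels rather than official model identifiers, and the figures are the list prices cited in this article, ignoring volume discounts and renders that get discarded.

```python
# Per-minute list prices with audio, as cited in this article (USD).
PRICE_PER_MINUTE = {
    "Grok Imagine": 0.07 * 60,  # $0.07 per second
    "Veo 3.1 Preview": 12.00,
    "Sora 2 Pro": 30.00,
}


def cost_usd(product: str, seconds: float) -> float:
    """List-price cost of `seconds` of finished video."""
    return PRICE_PER_MINUTE[product] / 60 * seconds


def price_ratio(product: str, baseline: str) -> float:
    """Per-minute price of `product` as a fraction of `baseline`."""
    return PRICE_PER_MINUTE[product] / PRICE_PER_MINUTE[baseline]


for rival in ("Veo 3.1 Preview", "Sora 2 Pro"):
    print(f"Grok Imagine vs {rival}: {price_ratio('Grok Imagine', rival):.2f}")
```

The ratios come out to 0.35 (roughly one third) and 0.14 (roughly one seventh), and ten seconds of output costs about $0.70 on Grok Imagine against $2.00 on Veo 3.1 Preview and $5.00 on Sora 2 Pro at list price.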
Consumer access to Grok Imagine is bundled into Grok's subscription tiers. The current lineup runs from a free tier with hard rate limits, through SuperGrok Lite at $10 per month, SuperGrok at $30 per month or $300 per year, up to SuperGrok Heavy at $300 per month. The free tier permits roughly 10 image generations every two hours. SuperGrok unlocks unlimited image generation and around 100 video renders per day. SuperGrok Heavy raises those limits substantially and is the only tier with full access to Grok 4 Heavy reasoning alongside Imagine. Premium+ subscribers on the X platform get a more limited Grok Imagine quota as a bundled feature.
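A quota like the free tier's roughly 10 image generations every two hours can be tracked on the client with a sliding-window counter, so an app can tell users how long to wait instead of sending requests that will be rejected. The sketch assumes the window is a true rolling two hours, which xAI has not documented, and the class name is hypothetical; the server's own accounting is what actually applies.

```python
import time
from collections import deque
from typing import Deque, Optional


class SlidingWindowQuota:
    """Allow at most `limit` events in any rolling window of `window_s` seconds."""

    def __init__(self, limit: int, window_s: float):
        self.limit = limit
        self.window_s = window_s
        self._stamps: Deque[float] = deque()  # times of events still inside the window

    def _evict(self, now: float) -> None:
        # Drop events that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()

    def try_acquire(self, now: Optional[float] = None) -> bool:
        """Record an event and return True if quota remains, else return False."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        if len(self._stamps) < self.limit:
            self._stamps.append(now)
            return True
        return False

    def seconds_until_free(self, now: Optional[float] = None) -> float:
        """How long until the oldest recorded event leaves the window."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        if len(self._stamps) < self.limit:
            return 0.0
        return self.window_s - (now - self._stamps[0])


free_tier = SlidingWindowQuota(limit=10, window_s=2 * 60 * 60)
```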
For enterprise customers, xAI sells the API on standard pay-as-you-go terms with no minimum commit. Volume discounts are negotiated directly. The company has not published an SLA for video generation latency, which currently runs between roughly 15 seconds and 90 seconds depending on clip length and queue depth.
Grok Imagine launched on August 4, 2025 with four content presets: fun, normal, custom, and "spicy." The spicy preset was meant to allow suggestive or sexually charged content within xAI's posted policy, which still formally banned "depicting likenesses of persons in a pornographic manner." The reality on launch day was very different. Jess Weatherbed of The Verge, working on a story about the new product, wrote that spicy mode produced topless videos of Taylor Swift the first time she tried it, without any nudity-related wording in the prompt. Common Dreams, the Penn State Digital Shred blog, and the AI Incident Database documented similar bypasses against other female celebrities including Scarlett Johansson.
Musk's response was muted. xAI updated the content filters to add stronger refusals for named celebrities, but reviewers continued to find ways around them for several weeks. Haley McNamara of the National Center on Sexual Exploitation said in a statement that "xAI appears to be doubling down on furthering sexual exploitation by enabling AI videos to create nudity." The Hollywood Reporter and Deadline ran pieces speculating about lawsuits, though as of May 2026 no celebrity has filed a public complaint against xAI over Imagine output specifically.
In January 2026, Euronews reported that researchers had used Grok Imagine to produce sexually explicit imagery of women and what appeared to be minors. The investigation prompted Ofcom, the United Kingdom's online safety regulator, and France's Arcom to open formal inquiries into whether Grok Imagine was producing illegal content under each country's child sexual abuse material rules. The story landed days before the 1.0 release and turned what would have been a victory lap into a more defensive launch, with xAI publishing an updated content policy alongside the version bump.
xAI has said it scans output for child sexual abuse material using industry-standard hashing and that it bans the relevant prompt patterns. Independent researchers at the Stanford Internet Observatory and the Center for Countering Digital Hate have disputed how aggressive those filters actually are in practice. The dispute is ongoing.
The deeper question across all of these stories is whether xAI's stated philosophy of "maximally truth-seeking, minimally censored" output is compatible with the kind of harm reduction other model providers attempt. Musk has been explicit, on X and in interviews, that he sees most existing AI content filters as politically biased. Grok Imagine sits at the sharp end of that view. The product is more permissive than Sora 2 or Veo 3 by design, and the rollout has been a long, public experiment in what happens when you ship that posture to a mass audience. The cost so far has been measured in regulatory inquiries and reputational damage rather than in product revenue, which has continued to grow.
The AI video generation market in early 2026 sits roughly in three tiers. Sora 2 and Veo 3 are the high-quality, expensive, more cautious options. Grok Imagine, ByteDance's Seedance, Alibaba's Wan 2.5, and a handful of others target the middle on price and creative latitude. Nano Banana sits in a separate category, focused on lightweight edits and stylized clips rather than full text-to-video.
| Product | Vendor | Max video length | Max resolution | Audio | List price (with audio) | Notable content policy |
|---|---|---|---|---|---|---|
| Grok Imagine 1.0 | xAI | 10 s API, 15 s app | 720p | Yes | $4.20 per minute | Permissive. Spicy mode for suggestive content. |
| Sora 2 Pro | OpenAI | 20 s | 1080p | Yes, including dialogue | $30 per minute | Strict. Refuses celebrity likenesses by default. |
| Veo 3.1 | Google DeepMind | 8 s | 1080p | Yes, including dialogue | $12 per minute | Strict, with watermarking via SynthID. |
| Nano Banana | Google | 6 s | 720p | Limited | Free tier dominant | Light editing focus, not a full T2V tool. |
Reviewers tend to give Sora 2 the edge on physical realism and Veo 3 the edge on cinematic camera moves and synchronized lip sync. Grok Imagine wins on price, on raw throughput, and on willingness to render almost any subject the user asks for. Tom's Guide ran a seven-prompt comparison in November 2025 and gave Sora 2 the overall win, though Grok Imagine produced two clips the reviewer preferred and did so at roughly a tenth of the cost. The Latent Space writeup in February 2026 went further and called Grok Imagine "the #1 video model on cost and latency," while acknowledging that on best-of-three quality it still trailed the leaders.
The comparison with Nano Banana is more lopsided in xAI's favor, because Nano Banana is not really a competing product. Google positioned Nano Banana as a fast, conversational image editor sitting inside Gemini, with short video generation as a secondary feature. Grok Imagine targets the full pipeline. Where the two overlap is in single-shot meme-style clips, and there Nano Banana's tighter integration with the Gemini conversational model gives it an edge in iteration speed for users who already live inside Google's stack.
Reception of Grok Imagine has been split along familiar lines. Developers and power users on X have been broadly positive, citing the price, the API ergonomics, and the absence of the kind of refusals that have frustrated heavy users of competing services. The Latent Space and Hacker News threads around the API launch were dominated by people running the same prompts on Grok Imagine, Sora 2, and Veo 3 and posting the results. Most of those side-by-side comparisons concluded that Grok Imagine was usable for a wider range of subjects than either competitor.
Mainstream press coverage has been harsher. The Verge, TechCrunch, Common Dreams, Deadline, and Euronews have all run multiple critical pieces, and the framing in those stories has settled into a template. The reviewer tries the product, finds an objectionable output within minutes, and writes the story around that finding. xAI has not run a sustained public response to the press cycle beyond Musk's own posts on X, which generally either ignore the criticism or frame it as legacy media bias.
The usage numbers suggest that neither the controversies nor the resolution cap have meaningfully slowed adoption. The 1.245 billion videos in 30 days figure from February 2026 is roughly an order of magnitude above what Runway, Pika, and Luma have publicly disclosed for the same period. Some of that volume is driven by the bundled access for Premium+ subscribers on X, and the per-user generation count has not been broken out, so the comparison should be read with caution. Even discounting for that, Grok Imagine is on track to be the highest volume consumer video generation product in the market, at least until OpenAI ships the consumer build of its Sora app to a broader audience.
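To put the headline figure in more familiar units, the 30-day total can be converted into average daily and per-second rates. This is a plain average over the period; real traffic is bursty, so the peak rate would be higher than the mean computed here.

```python
VIDEOS_PER_30_DAYS = 1.245e9  # xAI's February 2026 figure

per_day = VIDEOS_PER_30_DAYS / 30
per_second = per_day / 86_400  # seconds in a day

print(f"{per_day / 1e6:.1f} million videos per day")  # 41.5 million videos per day
print(f"{per_second:.0f} videos per second")          # 480 videos per second
```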
Whether that volume converts to durable revenue is the open question. xAI has not disclosed revenue from the Imagine product line separately, and the company's overall financials remain private. The most credible read at the moment is that Imagine is a strategic loss leader designed to anchor users to the Grok subscription bundle and to give the X platform a creative tool that none of its competitors can match for free or near-free output. The pricing on the API supports that read, and so does the speed at which xAI has shipped features like SuperGrok Lite that lower the entry barrier even further.