Grok Imagine

Grok Imagine is a generative media product from xAI, the company founded by Elon Musk. It produces still images, short videos with synchronized audio, and image edits from natural language prompts or reference pictures. The tool grew out of the image generation features added to Grok in late 2024 and reached a tagged 1.0 release on February 2, 2026, after a public rollout that began in August 2025. By that point xAI said the product was generating roughly 1.245 billion videos every 30 days, putting it among the most heavily used consumer video models on the market.

Grok Imagine combines a text-to-image stack derived from the Aurora model with a separate video generation pipeline. The system runs inside the Grok mobile and web apps and, since January 28, 2026, is also exposed through the xAI Imagine API. The product is positioned against Sora 2 from OpenAI, Veo 3 from Google DeepMind, and Google's lightweight Nano Banana editor, with xAI emphasizing low latency, looser content rules, and per-second pricing well below comparable services.

The rollout has been overshadowed by safety controversies. A preset called "spicy mode," enabled by default in early versions, generated non-consensual nude videos of celebrities including Taylor Swift on the day Grok Imagine became available to subscribers. Reporters at outlets like The Verge, TechCrunch, and Common Dreams documented the issue within hours of launch, and regulators in the United Kingdom and France later opened inquiries into whether the tool produced illegal sexual content involving minors. Despite the criticism, xAI has continued to expand the feature set and the underlying API.

Background

xAI was incorporated in March 2023 and shipped its first chatbot, Grok, on the X platform in November of that year. The earliest versions of Grok handled only text. Image generation arrived in August 2024 through a partnership with Black Forest Labs, which let users call the FLUX.1 model directly from inside Grok's chat interface. That arrangement was always treated as a placeholder. Musk had said publicly that xAI wanted to control its own image stack, and the company began training a proprietary image model called Aurora during the second half of 2024.

Aurora went live on December 9, 2024, replacing FLUX inside Grok on the web and on X. xAI described Aurora as an autoregressive mixture-of-experts network trained to predict the next token from interleaved text and image data, rather than a conventional latent diffusion model. The model focused on photorealism, accurate rendering of small text, faithful reproduction of logos, and recognizable celebrity likenesses. It was a deliberately permissive system. Aurora allowed images of public figures and copyrighted characters that competing tools refused to produce, and the only hard limit at launch was a refusal to generate full nudity. The model reached the xAI public API on March 21, 2025.

The shift from a still-image feature to a full creative suite happened in mid-2025, when xAI began testing video generation. Musk previewed the work on X in July, calling it "an AI Vine" in a nod to the defunct six-second video service. The product was rebranded "Grok Imagine" and shipped to subscribers on August 4, 2025. From that point the term refers to the combined image, video, and editing surface rather than the underlying Aurora model alone.

Version timeline

Date	Release	Notes
December 9, 2024	Aurora image model	xAI's first in-house text-to-image model. Replaced FLUX.1 inside Grok.
March 21, 2025	Aurora on API	Aurora exposed to developers through the xAI REST API.
July 28, 2025	Grok Imagine preview	Limited test of a six-second "AI Vine" style video tool described by Elon Musk on X.
August 4, 2025	Grok Imagine public launch	Available to SuperGrok and Premium+ subscribers on iOS. Produced 15-second clips with native audio and included "spicy mode" preset.
October 5, 2025	Grok Imagine v0.9	Generation latency cut to under 15 seconds per clip. Added voice-first prompting and instant still image generation.
January 28, 2026	Grok Imagine API	Public API release with two model IDs: grok-imagine-image-quality and grok-imagine-video.
February 2, 2026	Grok Imagine 1.0	Tagged 1.0 release. Increased clip length to 10 seconds at 720p with substantially better audio. xAI claimed 1.245 billion videos generated in the prior 30 days.
March 25, 2026	SuperGrok Lite tier	Introduced a $10 per month consumer tier with Grok Imagine access and one AI agent.

The February 2026 "1.0" label was partly a marketing choice. The product had been live for six months and had gone through three rounds of major capability changes before xAI was willing to drop the 0.x version prefix. The company has continued to ship smaller updates roughly every two weeks, and the docs page names the API as the canonical source for current limits.

Capabilities

Grok Imagine is organized around four core actions: generate a still image, edit an existing image, generate a short video from text, and animate or edit a video starting from an image or earlier clip. All four are reachable from the same prompt box in the consumer app, and the API exposes them through two model endpoints.

Capability	Inputs	Output	Notes (as of February 2026 1.0 release)
Text-to-image	Text prompt, aspect ratio, optional resolution	Up to 10 stills per request	Two quality tiers: a faster standard model and a slower "pro" model with finer detail.
Image editing	One to three reference images plus instruction	Edited still	Multi-image conditioning supports compositing, style transfer, and identity preservation.
Text-to-video	Text prompt, optional duration, aspect ratio	Up to 10 seconds at 720p with audio	Audio includes ambient sound and best-effort dialogue.
Image-to-video	One still plus optional motion description	Animated clip with audio	Useful for turning a generated frame into a moving shot.
Video editing	Existing clip plus instruction	Edited or extended clip	Supports trimming, restyling, and continuation beyond the original cut.
Voice prompting	Spoken instruction in the mobile app	Same as above	Added in the October 2025 v0.9 release.

Audio quality has been the most-tracked metric across versions. The August 2025 launch promised "native audio" but produced muddy ambient tracks and unconvincing lip sync. The 1.0 release added what xAI called "smoother motion" and clearer dialogue, though reviewers at Tom's Guide and Latent Space still found the lip sync trailing Veo 3 by a noticeable margin. Image quality, by contrast, has been competitive with leading models since the Aurora launch, particularly on photorealistic portraits and on small details like signage and product packaging.

Grok Imagine does not support 1080p or 4K video output. The resolution cap at 720p was the headline limitation that critics pointed to in early 2026 comparisons, and xAI has not published a public commitment to lifting it.

Aurora architecture and engine

The still-image side of Grok Imagine runs on Aurora, which xAI describes as an autoregressive mixture-of-experts network. Rather than denoising a latent representation in the way most diffusion models do, Aurora predicts the next token in an interleaved sequence that contains both text and image patches. The architecture is closer in spirit to multimodal language models like the original GPT-4o image system than to Stable Diffusion or FLUX. xAI has not published the parameter count or the exact training data mix, though it has confirmed that the model was trained from scratch on "billions of examples from the internet."

The video pipeline is a separate model that takes the still output from Aurora and extends it forward in time. xAI has not given the video model a distinct public name, referring to it inside the API only as grok-imagine-video. According to a Latent Space post that drew on a developer briefing, the video model was trained on a cluster built around 110,000 NVIDIA GB200 GPUs, which would make it one of the largest single-purpose training jobs publicly disclosed for a video model. That figure does not appear in xAI's own publications and has not been independently confirmed. Treat it as a reported claim rather than a verified count.

Audio generation runs as a third stage. After a clip's visual frames are produced, the audio model conditions on the visual content and the original text prompt to generate matched ambient sound, music, and dialogue. The result is muxed with the video before delivery. xAI has not disclosed whether the audio model is a fine-tune of the speech systems that run inside the Grok voice assistant or a separate model entirely.
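The staged flow described above (still, then video, then audio, then mux) can be pictured with a toy sketch. Everything here is hypothetical: the class, the function names, and the string placeholders are illustrative stand-ins, not xAI's code or API. The only thing the sketch is meant to show is the order of the stages and that the audio stage sees both the frames and the original prompt.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Clip:
    """Toy container; strings stand in for real image, video, and audio data."""
    prompt: str
    still: Optional[str] = None      # stand-in for the still image
    frames: Tuple[str, ...] = ()     # stand-in for the video frames
    audio: Optional[str] = None      # stand-in for the audio track


def still_stage(prompt: str) -> Clip:
    # Stage 1 (hypothetical name): the image model produces a still.
    return Clip(prompt=prompt, still=f"still({prompt})")


def video_stage(clip: Clip, n_frames: int) -> Clip:
    # Stage 2: the video model extends the still forward in time.
    frames = tuple(f"{clip.still}@t{t}" for t in range(n_frames))
    return replace(clip, frames=frames)


def audio_stage(clip: Clip) -> Clip:
    # Stage 3: audio is conditioned on the frames and the original prompt.
    track = f"audio(prompt={clip.prompt!r}, frames={len(clip.frames)})"
    return replace(clip, audio=track)


def mux(clip: Clip) -> dict:
    # Final step: combine video and audio before delivery.
    return {"video": clip.frames, "audio": clip.audio}


def run_pipeline(prompt: str, n_frames: int = 3) -> dict:
    return mux(audio_stage(video_stage(still_stage(prompt), n_frames)))


print(run_pipeline("a red kite over a beach"))
```

Keeping each stage a pure function over an immutable record makes the dependency order explicit: the audio stage cannot run until frames exist.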

API and pricing

The Grok Imagine API opened on January 28, 2026 and is documented at docs.x.ai. It uses asynchronous polling: a client submits a request, receives a job identifier, and then polls for completion. Input images can be supplied as public URLs or as base64 data URIs. The API can be called from Python through either the xAI client or the OpenAI compatibility shim, from JavaScript through the Vercel AI SDK, or over plain REST with a tool such as curl.
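The submit-then-poll pattern and the data URI input format can be sketched generically in Python. This is a minimal sketch, not the xAI client: `submit` and `fetch_status` are hypothetical callables that would wrap the real HTTP requests, the `"done"` and `"failed"` status strings are assumed rather than taken from the docs, and the 120-second default timeout is an arbitrary choice.

```python
import base64
import time
from typing import Callable


def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def wait_for_job(
    submit: Callable[[], str],
    fetch_status: Callable[[str], dict],
    interval_s: float = 2.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Submit a job, then poll until it finishes, fails, or times out.

    The status values checked here are assumptions for illustration.
    """
    job_id = submit()
    waited = 0.0
    while waited <= timeout_s:
        job = fetch_status(job_id)
        if job.get("status") == "done":
            return job
        if job.get("status") == "failed":
            raise RuntimeError(f"job {job_id} failed: {job.get('error')}")
        sleep(interval_s)
        waited += interval_s
    raise TimeoutError(f"job {job_id} not finished after {timeout_s} s")


# Demo against an in-memory fake instead of a network call.
_responses = iter([
    {"status": "pending"},
    {"status": "pending"},
    {"status": "done", "url": "https://example.invalid/clip.mp4"},
])
result = wait_for_job(
    submit=lambda: "job-123",
    fetch_status=lambda job_id: next(_responses),
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(result["url"])
```

Passing the HTTP calls and the sleep function in as arguments keeps the loop testable without network access. A production client would likely add backoff and retry on transient errors.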

Endpoint	Function	List price
grok-imagine-image-quality (standard)	Still image generation and edits	$0.02 per output image
grok-imagine-image-quality (pro)	Higher quality still image generation	$0.07 per output image
grok-imagine-video	Text-to-video and image-to-video with audio	$0.07 per second, equivalent to $4.20 per minute of finished video

The $4.20 per minute headline is roughly one third of Google's Veo 3.1 Preview, which lists at $12 per minute with audio, and around one seventh of OpenAI Sora 2 Pro at $30 per minute. The $4.20 number is also unmistakably an Elon Musk price point; he has leaned on the number 420 before, most famously in the brief 2018 attempt to take Tesla private at $420 per share. Whether you find that funny or tiresome probably depends on how long you have been paying attention to the company.
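The per-minute figure and the two ratios follow directly from the list prices quoted above. A short check in Python, using only those quoted numbers (the function and variable names are illustrative):

```python
from decimal import Decimal

# grok-imagine-video list price from the pricing table (USD per second).
PRICE_PER_SECOND = Decimal("0.07")

# Per-minute list prices with audio, as quoted in the text (USD).
PER_MINUTE = {
    "Grok Imagine": PRICE_PER_SECOND * 60,
    "Veo 3.1 Preview": Decimal("12"),
    "Sora 2 Pro": Decimal("30"),
}


def clip_cost(seconds: int, price_per_second: Decimal = PRICE_PER_SECOND) -> Decimal:
    """List-price cost of one clip of the given length."""
    return price_per_second * seconds


grok = PER_MINUTE["Grok Imagine"]
print(f"Grok Imagine per minute: ${grok}")
print(f"10-second clip: ${clip_cost(10)}")
for name in ("Veo 3.1 Preview", "Sora 2 Pro"):
    ratio = grok / PER_MINUTE[name]
    print(f"Grok Imagine vs {name}: {ratio:.2f} of the price")
```

The ratios come out to 0.35 and 0.14, which is where "roughly one third" and "around one seventh" come from. `Decimal` is used so the currency arithmetic stays exact.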

Consumer access to Grok Imagine is bundled into Grok's subscription tiers. The current lineup runs from a free tier with hard rate limits, through SuperGrok Lite at $10 per month, SuperGrok at $30 per month or $300 per year, up to SuperGrok Heavy at $300 per month. The free tier permits roughly 10 image generations every two hours. SuperGrok unlocks unlimited image generation and around 100 video renders per day. SuperGrok Heavy raises those limits substantially and is the only tier with full access to Grok 4 Heavy reasoning alongside Imagine. Premium+ subscribers on the X platform get a more limited Grok Imagine quota as a bundled feature.

For enterprise customers, xAI sells the API on standard pay-as-you-go terms with no minimum commit. Volume discounts are negotiated directly. The company has not published an SLA for video generation latency, which currently runs between roughly 15 seconds and 90 seconds depending on clip length and queue depth.

Controversies

Spicy mode and non-consensual deepfakes

Grok Imagine launched on August 4, 2025 with four content presets: fun, normal, custom, and "spicy." The spicy preset was meant to allow suggestive or sexually charged content within xAI's posted policy, which still formally banned "depicting likenesses of persons in a pornographic manner." The reality on launch day was very different. Jess Weatherbed of The Verge, working on a story about the new product, wrote that spicy mode produced topless videos of Taylor Swift the first time she tried it, without any nudity-related wording in the prompt. Common Dreams, the Penn State Digital Shred blog, and the AI Incident Database documented similar bypasses against other female celebrities including Scarlett Johansson.

Musk's response was muted. xAI updated the content filters to add stronger refusals for named celebrities, but reviewers continued to find ways around them for several weeks. Haley McNamara of the National Center on Sexual Exploitation said in a statement that "xAI appears to be doubling down on furthering sexual exploitation by enabling AI videos to create nudity." The Hollywood Reporter and Deadline ran pieces speculating about lawsuits, though as of May 2026 no celebrity has filed a public complaint against xAI over Imagine output specifically.

Regulatory action and minors

In January 2026, Euronews reported that researchers had used Grok Imagine to produce sexually explicit imagery of women and what appeared to be minors. The investigation prompted the United Kingdom's online safety regulator, Ofcom, and France's Arcom to open formal inquiries into whether Grok Imagine was producing illegal content under each country's child sexual abuse material rules. The story landed days before the 1.0 release and turned what would have been a victory lap into a more defensive launch, with xAI publishing an updated content policy alongside the version bump.

xAI has said it scans output for child sexual abuse material using industry-standard hashing and that it bans the relevant prompt patterns. Independent researchers at the Stanford Internet Observatory and the Center for Countering Digital Hate have disputed how aggressive those filters actually are in practice. The dispute is ongoing.

Moderation philosophy

The deeper question across all of these stories is whether xAI's stated philosophy of "maximally truth-seeking, minimally censored" output is compatible with the kind of harm reduction other model providers attempt. Musk has been explicit, on X and in interviews, that he sees most existing AI content filters as politically biased. Grok Imagine sits at the sharp end of that view. The product is more permissive than Sora 2 or Veo 3 by design, and the rollout has been a long, public experiment in what happens when you ship that posture to a mass audience. The cost so far has been measured in regulatory inquiries and reputational damage rather than in usage, which has continued to grow.

Comparison to competitors

The AI video generation market in early 2026 sits roughly in three tiers. Sora 2 and Veo 3 are the high-quality, expensive, more cautious options. Grok Imagine, ByteDance's Seedance, Alibaba's Wan 2.5, and a handful of others target the middle on price and creative latitude. Nano Banana is a separate category entirely, focused on lightweight edits and stylized clips rather than full text-to-video.

Product	Vendor	Max video length	Max resolution	Audio	List price (with audio)	Notable content policy
Grok Imagine 1.0	xAI	10 s API, 15 s app	720p	Yes	$4.20 per minute	Permissive. Spicy mode for suggestive content.
Sora 2 Pro	OpenAI	20 s	1080p	Yes, including dialogue	$30 per minute	Strict. Refuses celebrity likenesses by default.
Veo 3.1	Google DeepMind	8 s	1080p	Yes, including dialogue	$12 per minute	Strict, with watermarking via SynthID.
Nano Banana	Google	6 s	720p	Limited	Free tier dominant	Light editing focus, not a full T2V tool.

Reviewers tend to give Sora 2 the edge on physical realism and Veo 3 the edge on cinematic camera moves and lip sync. Grok Imagine wins on price, on raw throughput, and on willingness to render almost any subject the user asks for. Tom's Guide ran a seven-prompt comparison in November 2025 and gave Sora 2 the overall win, though Grok Imagine produced two clips the reviewer preferred and did so at roughly a tenth of the cost. The Latent Space writeup in January 2026 went further and called Grok Imagine "the #1 video model on cost and latency," while acknowledging that on best-of-three quality it still trailed the leaders.

The comparison with Nano Banana is more lopsided in xAI's favor, because Nano Banana is not really a competing product. Google positioned Nano Banana as a fast, conversational image editor sitting inside Gemini, with short video generation as a secondary feature. Grok Imagine targets the full pipeline. Where the two overlap is in single-shot meme-style clips, and there Nano Banana's tighter integration with the Gemini conversational model gives it an edge in iteration speed for users who already live inside Google's stack.

Reception

Reception of Grok Imagine has been split along familiar lines. Developers and power users on X have been broadly positive, citing the price, the API ergonomics, and the absence of the kind of refusals that have frustrated heavy users of competing services. The Latent Space and Hacker News threads around the API launch were dominated by people running the same prompts on Grok Imagine, Sora 2, and Veo 3 and posting the results. Most of those side-by-side comparisons concluded that Grok Imagine was usable for a wider range of subjects than either competitor.

Mainstream press coverage has been harsher. The Verge, TechCrunch, Common Dreams, Deadline, and Euronews have all run multiple critical pieces, and the framing in those stories has settled into a template. The reviewer tries the product, finds an objectionable output within minutes, and writes the story around that finding. xAI has not run a sustained public response to the press cycle beyond Musk's own posts on X, which generally either ignore the criticism or frame it as legacy media bias.

The usage numbers suggest that neither the controversies nor the resolution cap have meaningfully slowed adoption. The figure of 1.245 billion videos in 30 days from February 2026 is roughly an order of magnitude above what Runway, Pika, and Luma have publicly disclosed for the same period. Some of that volume is driven by the bundled access for Premium+ subscribers on X, and the per-user generation count has not been broken out, so the comparison should be read with caution. Even discounting for that, Grok Imagine is on track to be the highest-volume consumer video generation product on the market, at least until OpenAI ships the consumer build of its Sora app to a broader audience.
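For a sense of scale, the 30-day figure can be converted to average rates. This is plain arithmetic on xAI's claimed number and assumes generation was spread evenly over the period, which real traffic would not be.

```python
VIDEOS_PER_30_DAYS = 1_245_000_000  # xAI's claimed figure, February 2026
SECONDS_PER_DAY = 86_400

per_day = VIDEOS_PER_30_DAYS / 30
per_second = per_day / SECONDS_PER_DAY

print(f"{per_day:,.0f} videos per day on average")
print(f"{per_second:,.0f} videos per second on average")
```

That works out to about 41.5 million videos per day, or roughly 480 per second.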

Whether that volume converts to durable revenue is the open question. xAI has not disclosed revenue from the Imagine product line separately, and the company's overall financials remain private. The most credible read at the moment is that Imagine is a strategic loss leader designed to anchor users to the Grok subscription bundle and to give the X platform a creative tool that none of its competitors can match for free or near-free output. The pricing on the API supports that read, and so does the speed at which xAI has shipped features like SuperGrok Lite that lower the entry barrier even further.

References

xAI. "Grok Image Generation Release." December 9, 2024. https://x.ai/news/grok-image-generation-release
xAI. "Grok Imagine API." January 28, 2026. https://x.ai/news/grok-imagine-api
xAI Docs. "Imagine Overview." https://docs.x.ai/developers/model-capabilities/imagine
American Bazaar Online. "xAI takes 'biggest leap yet,' launches Grok Imagine 1.0." February 6, 2026. https://americanbazaaronline.com/2026/02/06/xai-takes-biggest-leap-yet-launches-grok-imagine-1-0-474633/
TechCrunch. "Grok Imagine, xAI's new AI image and video generator, lets you make NSFW content." August 4, 2025. https://techcrunch.com/2025/08/04/grok-imagine-xais-new-ai-image-and-video-generator-lets-you-make-nsfw-content/
The Verge. "Grok's 'spicy' video setting instantly made me Taylor Swift nude deepfakes." August 2025.
Common Dreams. "Safeguards? What Safeguards? Grok's New 'Spicy Mode' Makes Nude Taylor Swift Deepfakes." August 2025. https://www.commondreams.org/news/taylor-swift-nude-deepfakes
Musically. "Grok 'spicy mode' accused of making NSFW Taylor Swift deepfakes." August 6, 2025. https://musically.com/2025/08/06/grok-spicy-mode-accused-of-making-nsfw-taylor-swift-deepfakes/
AI Incident Database. "Incident 1165: Grok Imagine Reportedly Produces Non-Consensual Taylor Swift Deepfake Nudes Without Explicit Prompting." https://incidentdatabase.ai/cite/1165/
Euronews. "Grok under fire for generating sexually explicit deepfakes of women and minors." January 5, 2026. https://www.euronews.com/next/2026/01/05/grok-under-fire-for-generating-sexually-explicit-deepfakes-of-women-and-minors
Latent Space. "SpaceXai Grok Imagine API: the #1 Video Model, Best Pricing and Latency." January 2026. https://www.latent.space/p/ainews-spacexai-grok-imagine-api
Tom's Guide. "How does Grok Imagine compare to Sora 2? Here's what happened when I ran 7 tests." https://www.tomsguide.com/ai/ai-image-video/i-tested-sora-2-vs-grok-imagine-with-7-challenging-prompts-and-theres-a-clear-winner
The Decoder. "xAI's Aurora image model becomes official, built from scratch." December 2024. https://the-decoder.com/xais-aurora-image-model-becomes-official-built-from-scratch/
TechCrunch. "Elon Musk's X gains a new image generator, Aurora." December 7, 2024. https://techcrunch.com/2024/12/07/elon-musks-x-gains-a-new-image-generator-aurora/
EM360Tech. "What is xAI Aurora Generator? Inside Grok's New Image Generator." https://em360tech.com/tech-articles/what-xai-aurora-generator-inside-groks-new-image-generator
Wikipedia. "Grok (chatbot)." Sections on Aurora and Grok Imagine. https://en.wikipedia.org/wiki/Grok_(chatbot)
Fello AI. "Grok Pricing 2026: SuperGrok, X Premium+, Heavy & API Costs." https://felloai.com/grok-pricing/
mlq.ai. "Grok to Launch Text-to-Video Generation with Sound via Imagine Feature in October." 2025. https://mlq.ai/news/grok-to-launch-text-to-video-generation-with-sound-via-imagine-feature-in-october/
Times of AI. "Elon Musk Unveils Grok Imagine v0.9; Here's What's New." October 2025. https://www.timesofai.com/news/elon-musk-unveils-grok-imagine-v0-9/
WaveSpeed Blog. "Grok Imagine Video vs Sora 2, Veo 3.1, Seedance 1.5, WAN 2.5/2.6, and Vidu Q3: Complete Comparison." 2026. https://wavespeed.ai/blog/posts/grok-imagine-video-vs-sora-2-veo-3-seedance-wan-vidu-comparison-2026/
Deadline. "Elon Musk's Latest AI Frontier: 'Spicy' Deepfakes Of Stars Like Scarlett Johansson & Taylor Swift." August 2025. https://deadline.com/2025/08/elon-musk-ai-deepfakes-scarlett-johansson-taylor-swift-1236480553/
SQ Magazine. "xAI Launches Grok Imagine API to Rival Google and OpenAI in Video Generation." https://sqmagazine.co.uk/xai-grok-imagine-api-video-launch/
