CapCut VideoGPT
Last reviewed
May 11, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v4 · 2,231 words
| CapCut VideoGPT | |
|---|---|
| Information | |
| Name | CapCut VideoGPT |
| Platform | ChatGPT |
| Store | GPT Store |
| Model | GPT-4 |
| Category | Lifestyle / Productivity |
| Description | Ideas to videos or designs with vast templates. Text-to-video with auto voiceover and elements. |
| Developer | capcut.com (ByteDance) |
| OpenAI URL | https://chat.openai.com/g/g-Gpu8ZMR52-capcut-videogpt |
| Chats | 12,000+ |
| Actions | Yes |
| Web Browsing | Yes |
| Free | Yes (ChatGPT Plus required) |
| Available | Yes |
| Updated | 2024-01-24 |
CapCut VideoGPT is a Custom GPT for ChatGPT published in the GPT Store by the team behind CapCut, the video editing app owned by ByteDance. The GPT turns a written brief into a finished short video. The user types what they want, the model writes an English voiceover script, and the connected CapCut backend assembles footage, captions, music, and a synthesised narration into a downloadable clip. A second mode searches CapCut's template library so users can pick a pre-built design to remix on the CapCut web app.
CapCut VideoGPT was one of the more prominent third-party agents at the launch of the GPT Store on 10 January 2024, sitting alongside other early featured partners such as Canva, Khan Academy Code Tutor, AllTrails, and the Consensus academic search agent. By mid-2024 it had logged tens of thousands of conversations, which made it one of the busier video-focused GPTs in the store.
CapCut is the international name for ByteDance's video editor, originally released in China in 2019 as JianYing and rolled out worldwide in 2020 after a brief stint under the name ViaMaker. The app sits in the same corporate family as TikTok and shares a lot of its visual grammar: portrait-first, fast cuts, snappy captions, and a heavy reliance on templates. By 2022 the mobile app reportedly had more than 200 million active users, and ByteDance has continued to add automatic captioning, background removal, and generative features to the editor itself.
The link between CapCut and OpenAI predates the GPT Store. In July 2023 ByteDance shipped a CapCut plugin for ChatGPT under the older plugin system, which let Plus subscribers describe a topic and get a full short video back, complete with voiceover, soundtrack, and stock footage. When OpenAI deprecated the plugin marketplace in favour of GPTs in late 2023, the CapCut team rebuilt the experience as CapCut VideoGPT and kept most of the original behaviour. The system prompt and conversation starters carried over almost verbatim from the plugin era.
The GPT itself is listed under the developer name capcut.com on the GPT Store, with the privacy policy hosted on a ByteDance content delivery network (sf21-draftcdn-sg.ibytedtos.com). The unique identifier in the URL, g-Gpu8ZMR52, dates the build to the original GPT Store rollout in January 2024.
The GPT exposes two main flows, both controlled through natural language in the chat.
The first flow is text-to-video. A user describes a topic, mood, or audience, and the model writes a short English script suitable for a voiceover. By default it picks an aspect ratio (typically 16:9 for general clips, 9:16 for vertical, 1:1 for square posters) and calls the CapCut video generation API. CapCut's servers run the script through a text-to-speech engine, match it with stock footage and B-roll, layer music underneath, and render a preview. The model then returns a link styled as "View the AI video result from CapCut" and warns the user that the video may take roughly a minute to finish loading the first time the link is opened. The script and aspect ratio can be revised after the first render, and the user can ask for a longer or shorter version.
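The payload the model assembles before calling CapCut can be sketched roughly as follows. This is an illustration only: the names `VideoRequest` and `build_video_request`, the field names `script` and `aspect_ratio`, and the validation rules are assumptions made for the example, not CapCut's published API schema.

```python
from dataclasses import dataclass, asdict

# Aspect ratios mentioned in the article; the real API may accept others.
SUPPORTED_RATIOS = {"16:9", "9:16", "1:1"}


@dataclass(frozen=True)
class VideoRequest:
    """Hypothetical payload for the video generation Action."""
    script: str
    aspect_ratio: str


def build_video_request(script: str, aspect_ratio: str = "16:9") -> dict:
    """Validate the inputs and return a JSON-serialisable payload (assumed shape)."""
    cleaned = script.strip()
    if not cleaned:
        raise ValueError("script must not be empty")
    if aspect_ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio!r}")
    return asdict(VideoRequest(script=cleaned, aspect_ratio=aspect_ratio))
```

Calling `build_video_request("Three tips for better sleep.", "9:16")` would produce a small dictionary that an Action could post as JSON. Revising the script or aspect ratio after the first render amounts to building a new payload and calling the endpoint again.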
The second flow is template search. The user describes the kind of effect or look they are after, for example a New Year poster or a TikTok dance edit, and the GPT calls the CapCut template search API. The model is instructed to confirm the aspect ratio first, then display up to six matching templates in a markdown table that pairs each preview image with a clickable link back to the CapCut web app. Selecting a template opens the editor in the browser, where the user can swap in their own photos or clips.
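The table-rendering step can be sketched as below. The `Template` fields and the column headings are hypothetical, since the actual search response format is not documented here; only the six-result cap and the image-plus-link pairing come from the description above.

```python
from typing import NamedTuple


class Template(NamedTuple):
    """Hypothetical search result; the field names are assumptions."""
    title: str
    preview_url: str
    editor_url: str


MAX_TEMPLATES = 6  # the GPT is described as showing up to six results


def render_template_table(templates: list[Template]) -> str:
    """Return a markdown table pairing each preview image with an editor link."""
    rows = ["| Preview | Template |", "|---|---|"]
    for t in templates[:MAX_TEMPLATES]:
        rows.append(
            f"| ![{t.title}]({t.preview_url}) | [{t.title}]({t.editor_url}) |"
        )
    return "\n".join(rows)
```

The sketch does not escape pipe characters in titles, so a production version would need to sanitise them before building each row.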
A third, smaller behaviour, baked into the system prompt, is that the GPT is told to prompt the user once per chat with the email address capcut-web@bytedance.com for feedback. The instruction is explicit about doing this only on the first returned video or template, not on every result.
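The once-per-chat rule is enforced by the prompt rather than by code, but its logic can be sketched as a small piece of state. The `ChatSession` class and its method name are hypothetical and exist only to illustrate the behaviour.

```python
FEEDBACK_EMAIL = "capcut-web@bytedance.com"  # address quoted in the article


class ChatSession:
    """Hypothetical per-chat state; the real GPT tracks this implicitly in context."""

    def __init__(self) -> None:
        self._feedback_shown = False

    def format_result(self, result_link: str) -> str:
        """Append the feedback note to the first result only."""
        if self._feedback_shown:
            return result_link
        self._feedback_shown = True
        return f"{result_link}\n\nFeedback: {FEEDBACK_EMAIL}"
```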
CapCut VideoGPT has Actions, Web Browsing, and DALL-E enabled in its GPT configuration, but the heavy lifting happens through the two custom Actions:
| Action | What it does | Inputs | Output |
|---|---|---|---|
| Video Generation API | Sends an English script and aspect ratio to CapCut, which renders a video with voiceover, music, and stock visuals | Topic or script, aspect ratio | Hosted video URL |
| Template Search API | Queries CapCut's template catalogue for a given keyword and aspect ratio | Keywords, aspect ratio | Up to six template previews with links |
Authentication is set to none in the GPT manifest, which means the user does not need to sign in to CapCut to generate a first preview. Saving, exporting, or further editing the result on the CapCut web app does require a CapCut account, which can be linked to a TikTok, Google, Apple, or email login.
Web Browsing is enabled but rarely used in normal sessions, since most of the work happens server-side at CapCut. DALL-E generation is technically available, although the system prompt steers the model toward CapCut's own stock and template engines rather than asking DALL-E for images.
The published instructions describe a fairly opinionated workflow. A condensed version of the system prompt looks like this:
The instructions are written in slightly clipped English and reuse the model boilerplate that OpenAI ships with every new Custom GPT: "You are a 'GPT', a version of ChatGPT that has been customized for a specific use case." That preamble appears on most GPT Store agents and is not specific to CapCut.
The GPT ships with four conversation starters, designed to map onto the two main flows:
The starters cover the three most common aspect ratios CapCut handles: 16:9 for traditional YouTube and landscape clips, 1:1 for square social posts, and 9:16 for TikTok, Reels, and Shorts. The default behaviour if the user does not specify is to ask, with 16:9 typically offered as the fallback for narrative video and 9:16 for template work.
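The fallback logic described above can be sketched as a small lookup. The flow labels `"video"` and `"template"` and the function name are assumptions for illustration; in practice the GPT asks the user and only offers these values as suggestions.

```python
from typing import Optional

# Fallbacks described in the article; the flow labels are hypothetical.
FALLBACK_RATIOS = {"video": "16:9", "template": "9:16"}


def suggest_aspect_ratio(flow: str, requested: Optional[str] = None) -> str:
    """Return the user's choice if given, else the fallback to offer for the flow."""
    if flow not in FALLBACK_RATIOS:
        raise ValueError(f"unknown flow: {flow!r}")
    if requested:
        return requested
    return FALLBACK_RATIOS[flow]
```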
The GPT is aimed at users who want a finished short video without opening a separate editor. Common patterns include:
It is less suited to long-form storytelling. The text-to-video pipeline picks stock visuals based on keywords in the script, so it tends to feel generic for niche topics, and the synthesised voiceover is recognisably AI rather than a human read. Most users treat the first output as a draft, then move into the full CapCut editor to swap in their own footage, tighten the cuts, and replace the narration if needed.
When OpenAI launched the GPT Store on 10 January 2024, the company highlighted a handful of partner GPTs in its announcement. Most of those launch picks were productivity, education, or lifestyle agents: AllTrails for hiking routes, Consensus for paper search, Khan Academy's Code Tutor, Canva for visual design, and the CK-12 Flexi tutor for K-12 students. CapCut VideoGPT was one of the few video-specific tools featured in that initial group of partner agents, which gave it heavy traffic during the launch week.
Within the store taxonomy it usually sits in either the Lifestyle or Productivity rails, depending on how the editorial team is grouping things that week. The official metadata showed on the order of 12,000 chats early on, although third-party trackers later put the figure in the hundreds of thousands once the GPT had been live for several months. Either way, it has consistently appeared on the GPT Store's top videos list since launch and is often cited as one of the reference examples for what an Actions-backed Custom GPT can do when paired with a real product backend.
Several other video-focused agents launched in the same window and target overlapping audiences. The table below sketches the main differences.
| GPT | Developer | What makes it different |
|---|---|---|
| CapCut VideoGPT | ByteDance / CapCut | Tight integration with the CapCut web editor; strong template library |
| Video GPT by VEED | VEED.IO | Emphasis on social media avatars and on-platform editing |
| Video Maker by invideo AI | invideo AI | Longer narrative videos and stock footage focus |
| Canva | Canva | Posters, social posts, and presentations rather than video |
CapCut VideoGPT's biggest advantage over the others is the handoff: the rendered video opens directly in CapCut's web editor, which is one of the most widely used free video tools on the consumer side. Users who already have CapCut accounts can keep iterating on the same project without exporting and re-importing files.
The GPT manifest sets the Auth type to none, so the connection between ChatGPT and CapCut does not pass a user identity by default. The privacy policy linked in the manifest is hosted on ByteDance's content delivery network and covers the wider CapCut family of products. As with any GPT that calls a third-party API, the user's prompts and the generated outputs are processed by both OpenAI and CapCut, and the rendered video files sit on CapCut's servers under the standard CapCut content policies.
Users who do not want their inputs used to train OpenAI models can disable chat history in the ChatGPT account settings; the same prompts still travel through CapCut's API and are subject to ByteDance's retention rules.
The GPT inherits the limitations of the underlying CapCut auto editor. Stock footage selection is keyword-driven and can miss the mark for unusual topics, the voiceover is a default AI voice rather than something the user can pick before rendering, and the music is selected automatically from a pre-cleared library. Users who want fine-grained control over visuals, voice casting, or music licensing typically use the GPT only for the first pass, then move into CapCut's main editor.
The text-to-video pipeline is also English-first. The system prompt explicitly tells the model to write the script in English, even when the user converses in another language, because the voice generation step is tuned for English narration. Non-English users often draft the script in their own language elsewhere and then translate before sending it through the GPT.