# Descript

> Source: https://aiwiki.ai/wiki/descript
> Updated: 2026-06-25
> Categories: AI Tools & Products, Artificial Intelligence, Speech & Audio AI, Video Generation
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

**Descript** is an [artificial intelligence](/wiki/artificial_intelligence)-powered audio and video editing platform that lets users edit media by editing a text transcript: delete a word from the transcript and the matching audio and video disappear too. Founded in December 2017 by Andrew Mason, the former CEO of Groupon, Descript is headquartered in San Francisco, California [1][4]. The platform has raised roughly $100-104 million in total funding, with its $50 million Series C led by the [OpenAI](/wiki/openai) Startup Fund in November 2022 at a valuation of approximately $550 million [3][6]. Its best-known features include Overdub (AI [voice cloning](/wiki/voice_cloning), built on technology from the 2019 Lyrebird acquisition), Studio Sound (audio enhancement), Eye Contact (gaze correction), filler-word removal, and Underlord (an agentic AI co-editor) [1][3]. Descript reached an estimated $55 million in annual recurring revenue in late 2024, a figure Sacra puts at roughly 75% year-over-year growth [17]. Its customers include NPR, VICE, The Washington Post, The New York Times, Shopify, and HubSpot [4].

## What is Descript?

Descript is a tool for recording, transcribing, editing, collaborating on, and publishing podcasts and videos, all from a single document-style interface. Its defining idea is transcript-based editing: instead of dragging waveforms or video clips on a timeline, the user edits a written transcript of the recording, and the underlying audio and video stay in sync automatically. Andrew Mason has framed the company's target market in unusually direct terms: "We think of our main competition as non-editors, people who aren't making video because the tools are too complex" [3]. Descript pairs that text-first paradigm with a stack of [generative AI](/wiki/generative_ai) features for voice synthesis, noise removal, captioning, and multi-step automated editing.

## History

### Andrew Mason and the Path to Descript

Andrew Mason is a serial entrepreneur whose career has been defined by a pattern of identifying friction in everyday experiences and building technology to eliminate it. Mason graduated from Northwestern University and first gained prominence as the founder and CEO of **Groupon**, the daily deals platform that launched in 2008 and became one of the fastest-growing companies in internet history [4]. Groupon's IPO in November 2011 valued the company at $12.7 billion, making it one of the largest internet IPOs since Google [15]. Mason served as CEO until February 2013, when he was removed by the board amid declining stock prices and operational challenges. In a characteristically candid departure memo, Mason acknowledged being fired and wished the company well [15].

In 2014, after leaving Groupon, Mason co-founded **Detour**, an augmented reality startup that created location-based audio walking tours. Users would put on headphones and walk through cities while Detour played narrated, GPS-triggered audio experiences that blended storytelling with the physical environment. The app attracted critical acclaim for its innovative approach to urban exploration.

It was at Detour that the seed of Descript was planted. To produce Detour's audio tours, the team needed to record, edit, and produce large volumes of spoken-word audio content. Mason quickly discovered that the audio editing tools available at the time (primarily designed for music production) were overly complex and poorly suited for speech editing [4]. Manipulating audio waveforms to cut, rearrange, or fix spoken content was tedious and required technical expertise that most content creators lacked.

Mason's insight was simple but powerful: since the audio content was fundamentally speech (words spoken in sequence), it should be possible to edit the audio by editing a text transcript instead. As Contrary Research summarizes the core innovation, "deleting a word from the text transcript would remove that same word from the audio file" [4]. Rearrange paragraphs in the transcript, and the audio follows suit. This text-based approach would make audio editing as intuitive as editing a document in a word processor.

The Detour team spent roughly two and a half years developing this transcript-based audio editing technology as an internal tool. The technology relied on two advancing fields: automated [speech recognition](/wiki/speech_recognition) (to generate accurate transcripts) and text-audio alignment (to maintain synchronization between transcript edits and audio playback) [4].

In 2017, Bose acquired Detour as part of its augmented reality audio platform initiative. The Detour app was eventually discontinued, but the internal audio editing tool that Mason's team had built was too promising to abandon. Mason spun it off as a separate company, formally founding **Descript** in December 2017 [4].

### When was Descript founded and how was it funded early on (2017-2019)?

Descript drew venture backing from the start. Its first outside funding came from [Andreessen Horowitz](/wiki/andreessen_horowitz), which led an early round of roughly $5 million in December 2017.

In September 2019, Descript raised a **$15 million Series A** round led by Andreessen Horowitz and Redpoint Ventures [1]. Alongside this funding, Descript made a strategically significant acquisition: it purchased **Lyrebird**, a Montreal-based AI startup that had developed advanced voice cloning technology [1]. Lyrebird was founded by Alexandre de Brebisson, Jose Sotelo, and a team of PhD researchers from the University of Montreal, and its machine learning model could copy a person's voice from roughly one minute of sample audio, reproducing the speaker's intonation and cadence [1].

The Lyrebird acquisition gave Descript the foundation for what would become its **Overdub** feature, one of the platform's most distinctive capabilities. With Overdub, users could type new words into their transcript and have those words spoken in a synthetic clone of their own voice, seamlessly blending with the original recording [1]. This meant that if a podcaster misspoke or wanted to add a correction, they could simply type the correct words and Overdub would generate the audio, eliminating the need for re-recording.

### When did Descript add video editing and raise its Series B (2020-2021)?

In October 2020, Descript expanded from audio-only editing to include **video editing** [2]. The same transcript-based editing paradigm applied: users could edit video content by editing the text transcript. Deleting words from the transcript removed the corresponding video footage. The video launch included tools for screen recording, titles, transitions, image overlays, and multi-track editing, all controlled through the familiar text-editing interface [2].

This expansion was particularly timely given the COVID-19 pandemic's acceleration of remote work, online education, and video content creation. The demand for accessible video editing tools was growing rapidly as millions of new content creators entered the market.

In January 2021, Descript raised a **$30 million Series B** round led by Spark Capital's Nabeel Hyatt, with continued participation from Andreessen Horowitz and Redpoint Ventures [5]. The round valued the company at approximately $260 million [3]. The funding was used to expand the engineering team and accelerate product development.

### Why did OpenAI invest in Descript (Series C, 2022)?

In November 2022, Descript raised a **$50 million Series C** round led by the **[OpenAI](/wiki/openai) Startup Fund**, bringing its total raised at the time to roughly $100 million [3]. The round also included participation from Andreessen Horowitz, Redpoint Ventures, Spark Capital, and prominent individual investors such as Daniel Gross (former Y Combinator partner), Casey Neistat (YouTube creator), Tobi Lutke (Shopify founder), Shishir Mehrotra (Coda CEO), Lenny Rachitsky (product advisor and writer), Naval Ravikant (AngelList founder), and Rahul Vohra (Superhuman founder) [3].

The Series C valued Descript at approximately **$550 million**, more than double its roughly $260 million valuation from 22 months earlier [6]. The strategic significance of OpenAI's investment extended beyond capital: it positioned Descript to deeply integrate [OpenAI](/wiki/openai)'s [large language models](/wiki/llm) and generative AI capabilities into its editing platform.

Brad Lightcap, then OpenAI's Chief Operating Officer, explained the investment thesis: "It's clear from using Descript and talking to customers that Descript is breaking down barriers between idea and creation" [3]. Descript was only the second company to receive funding from the OpenAI Startup Fund, after the AI note-taking app Mem [3].

### Growth and the Underlord AI Co-Editor (2023-2026)

Descript progressively consolidated its AI capabilities into **Underlord**, an AI co-editor that can execute multi-step editing workflows. Instead of manually applying individual AI tools, users can give Underlord a natural language command such as "polish this podcast episode for publishing," and the system will run a sequence of editing operations: removing filler words, applying Studio Sound noise reduction, adding captions, cutting dead air, adjusting pacing, and suggesting social media clip candidates.

Descript launched Underlord as the centerpiece of its "Season 6" product release on April 1, 2025, describing it as "your all-powerful, incredibly versatile, book-smart editing assistant" and marking a shift from a tool with some AI features to a platform built around an agentic AI co-editor [11][18]. By the August 2025 update, Underlord could execute 15 to 20 editing steps in sequence and make judgment calls about pacing and content selection, a clear move toward [agentic AI](/wiki/ai_agents) workflows in creative software [11].

Descript also introduced a new pricing model in September 2025, transitioning from simple transcription-hour limits to a more granular system based on **Media Minutes** (for uploads and recordings) and **AI Credits** (for AI-powered features like Studio Sound, Eye Contact, Green Screen, and Overdub) [10].

In October 2025, Descript added a **model picker** to Underlord, starting with Anthropic's Claude Sonnet 4.5, so users could choose which underlying [large language model](/wiki/llm) powers the assistant [12][19]. Descript engineer Ajay Arasanipalai framed the change this way: "With the model picker, we're letting you choose a point in that tradeoff space for yourself" [19]. By a January 2026 update, Underlord offered several Claude tiers (Sonnet as the self-serve default, Opus for enterprise customers prioritizing quality, and Haiku for fast iterative sessions) alongside Google's Gemini 3.1 Pro, and Descript said the improvements cut the AI credits needed to fully edit a video before publishing by roughly 20% [12].

By late 2024, Descript had reached approximately **$55 million in annual recurring revenue**, up from an estimated $31 million roughly a year earlier, which works out to about 75% year-over-year growth, according to Sacra [17]. The company employed roughly 188 people as of early 2026 [4].

### Has Spotify acquired Descript?

Despite recurring speculation, there is no confirmed acquisition of Descript by Spotify as of mid-2026. Spotify has made numerous acquisitions in podcasting and audio more broadly, including Gimlet Media, Anchor, The Ringer, Megaphone, and WhoSampled (acquired in November 2025) [16]. However, Descript does not appear among Spotify's acquisitions, and no credible reports confirm such a deal. Descript remains a private, independent company [16].

## How does Descript work?

### Transcript-Based Editing

Descript's foundational innovation is the ability to edit audio and video by editing a text transcript. The workflow operates as follows:

1. Users import or record audio/video content.
2. Descript automatically generates a transcript using its speech recognition engine (supporting 22 languages).
3. Users edit the transcript as they would a text document: deleting, rearranging, cutting, or adding text.
4. All changes to the transcript are reflected in the underlying audio and video in real time.

This approach dramatically lowers the barrier to entry for media editing, making it accessible to people with no experience in traditional audio/video editing software. The text-first interface means that users who are comfortable with word processors can edit podcasts and videos.
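The workflow above can be sketched in a few lines of Python. This is a minimal illustration and not Descript's implementation: it assumes the transcript carries a start and end timestamp for every word, and the `Word` class and `kept_segments` function are hypothetical names introduced here. Deleting words from the transcript then reduces to computing which time ranges of the media should still be played.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One transcript word and its position in the media, in seconds (hypothetical model)."""
    text: str
    start: float
    end: float


def kept_segments(words, deleted_indices):
    """Return the (start, end) media ranges that survive after deleting words.

    Consecutive surviving words are merged into one range, so playback or
    export only has to stitch together a handful of segments.
    """
    segments = []
    for i, word in enumerate(words):
        if i in deleted_indices:
            continue
        if segments and (i - 1) not in deleted_indices:
            # The previous word was kept too: extend the current range.
            segments[-1] = (segments[-1][0], word.end)
        else:
            # First kept word, or first word after a deletion: open a new range.
            segments.append((word.start, word.end))
    return segments


words = [
    Word("So", 0.0, 0.3),
    Word("um", 0.3, 0.6),
    Word("welcome", 0.6, 1.1),
    Word("back", 1.1, 1.5),
]

# Deleting "um" (index 1) leaves two ranges to play back to back.
segments = kept_segments(words, {1})
print(segments)
```

Under the same assumption, rearranging paragraphs amounts to reordering the word list before computing the ranges; the media file itself is never modified, only the list of ranges drawn from it.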

### What is Overdub?

**Overdub** is Descript's AI voice cloning feature, built on technology acquired from Lyrebird in 2019 [1]. Overdub allows users to:

- Create a digital clone of their voice by providing a voice sample.
- Generate synthetic speech in their voice by typing text into the transcript.
- Seamlessly blend AI-generated speech with original recordings.
- Correct mistakes, add new content, or modify existing recordings without re-recording.

Overdub uses approximately 5 AI credits per minute of generated audio. To address ethical concerns around voice cloning and [deepfake](/wiki/ai_voice_agent) misuse, Descript states that "you can only use Overdub on your own voice," requiring the speaker to record a set of randomly generated sentences for verification before a voice model can be trained [1].

### Studio Sound

**Studio Sound** is an AI-powered audio enhancement feature that processes recordings to achieve professional studio-quality audio from any recording environment. It performs several functions:

| Function | Description |
|---|---|
| Background Noise Removal | Eliminates ambient noise (traffic, HVAC, keyboard clicks, etc.) |
| Speech Enhancement | Boosts vocal clarity and presence |
| Echo Cancellation | Removes room reverb and echo artifacts |
| Level Normalization | Balances audio levels across speakers and segments |

Studio Sound uses 10 AI credits per application and is particularly valuable for podcasters and video creators who record in non-studio environments.

### What is Eye Contact?

**Eye Contact** is an AI feature that adjusts the gaze direction of a speaker in video to simulate direct eye contact with the camera. This addresses a common problem in video production: when speakers read from scripts, teleprompters, or notes positioned away from the camera lens, their gaze appears off-center. Eye Contact uses [computer vision](/wiki/computer_vision) to detect the speaker's eyes and subtly adjust them so they appear to be looking directly at the viewer.

The feature uses 10 AI credits per application and is available on Creator plans and above.

### Filler Word Removal

Descript's filler word removal feature automatically detects and removes verbal fillers such as "um," "uh," "like," "you know," "sort of," and similar speech disfluencies. The system identifies these fillers in the transcript and allows users to remove them all at once with a single click, along with the corresponding audio. Users can also selectively remove individual filler words. This feature is available for free and does not require AI credits.
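As a rough sketch of how filler detection over a transcript could work, the Python below matches a fixed phrase list against transcript tokens. It is an assumption-laden illustration, not Descript's detector: `FILLERS`, `find_fillers`, and `remove_spans` are hypothetical names, and a plain phrase match cannot tell the filler "like" from the verb "like", which is one reason selective removal is useful.

```python
# Hypothetical filler list, taken from the examples named in the text.
FILLERS = {"um", "uh", "like", "you know", "sort of"}


def find_fillers(tokens, fillers=FILLERS):
    """Return (start, end_exclusive) token spans that match a filler phrase."""
    normalized = [t.lower().strip(".,!?") for t in tokens]
    max_len = max(len(f.split()) for f in fillers)
    spans = []
    i = 0
    while i < len(normalized):
        matched = 0
        # Try the longest phrase first so "you know" wins over shorter matches.
        for n in range(max_len, 0, -1):
            if i + n <= len(normalized) and " ".join(normalized[i:i + n]) in fillers:
                matched = n
                break
        if matched:
            spans.append((i, i + matched))
            i += matched
        else:
            i += 1
    return spans


def remove_spans(tokens, spans):
    """Drop every token covered by one of the spans."""
    drop = {i for start, end in spans for i in range(start, end)}
    return [t for i, t in enumerate(tokens) if i not in drop]


tokens = "So um you know I think it's uh sort of ready".split()
spans = find_fillers(tokens)
cleaned = remove_spans(tokens, spans)
print(spans)
print(" ".join(cleaned))
```

Passing all spans to `remove_spans` corresponds to the one-click removal described above, while passing a subset corresponds to removing individual fillers.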

### Green Screen

**Green Screen** is an AI-powered background removal tool that detects the speaker in a video frame and removes the background without requiring a physical green screen. Users can replace the removed background with custom images, videos, or solid colors, or leave it transparent for compositing. The feature uses 15 AI credits per application.

### What can Underlord do? (AI Co-Editor)

**Underlord** is Descript's AI assistant that functions as an intelligent co-editor. Rather than requiring users to manually apply individual AI features, Underlord can execute complex, multi-step editing workflows from natural language instructions [11]. Capabilities include:

| Capability | Description |
|---|---|
| Multi-Step Workflows | Executes 15-20 editing steps in sequence from a single instruction |
| Pacing Adjustments | Makes judgment calls about timing, pauses, and rhythm |
| Content Suggestions | Identifies moments suitable for social media clips |
| Polish Workflows | Applies noise removal, filler word removal, captions, and dead air cuts in one pass |
| Show Notes | Generates descriptions, titles, and summaries for podcast episodes |
| Chapter Markers | Automatically creates chapter markers from content structure |
| Model Selection | Lets users pick the underlying LLM (Claude Sonnet, Opus, Haiku, or Gemini) for the assistant |

### Additional Features

| Feature | Description |
|---|---|
| Screen Recording | High-resolution screen capture with customizable webcam placement |
| Storyboard | Scene-based video editing using the "/" key to split content |
| Remote Recording | Integration with Zoom, Google Meet, and other conferencing platforms |
| Multi-Track Editing | Simultaneous camera and screen recording with dynamic speaker labels |
| Collaboration | Real-time multi-user editing, brand templates, and cloud sync |
| Stock Media Library | Access to stock video, images, and music within the editor |
| Captions and Subtitles | AI-generated captions with customizable styling |
| Templates | Pre-built project templates for common content formats |
| Publishing | Direct publishing to YouTube, podcast platforms, and social media |

## What technology powers Descript?

Descript's technology stack spans several areas of [artificial intelligence](/wiki/artificial_intelligence):

### Speech Recognition and Alignment

The platform uses advanced [speech recognition](/wiki/speech_recognition) models to generate transcripts from audio and video content, supporting 22 languages. Critically, the system maintains precise alignment between the text transcript and the underlying media, enabling the real-time synchronization that makes transcript-based editing possible. Human-assisted transcription is also available at $2 per minute for situations requiring higher accuracy.

### Voice Synthesis

The Overdub feature relies on [text-to-speech](/wiki/text_to_speech) voice synthesis technology originally developed by Lyrebird and refined by Descript's AI team [1]. The system analyzes voice samples to create personalized voice models that capture the speaker's pitch, tone, cadence, and speaking style.

### Computer Vision

Features like Eye Contact and Green Screen utilize [computer vision](/wiki/computer_vision) models for face detection, gaze estimation, and foreground-background segmentation. These models run inference on uploaded video to produce pixel-level adjustments.

### Audio Processing

Studio Sound uses neural network-based audio processing for noise reduction, echo cancellation, and speech enhancement. The models are trained to distinguish between desired speech content and unwanted background noise across a wide variety of recording environments.

### Large Language Models

Descript integrates [large language models](/wiki/llm) for natural language understanding in Underlord, content summarization, show notes generation, and the AI Chat features within the platform. As of early 2026, Underlord can be powered by models from multiple providers, including Anthropic's [Claude](/wiki/claude) (Sonnet, Opus, and Haiku) and Google's Gemini, selectable through an in-app model picker [12][19].

## Pricing

As of 2026, Descript offers five pricing tiers based on a combination of Media Minutes and AI Credits:

| Plan | Monthly Cost | Annual Cost (per month) | Media Minutes | AI Credits | Key Features |
|---|---|---|---|---|---|
| Free | $0 | $0 | 60/month | 100 (one-time grant) | Basic editing, watermarked exports |
| Hobbyist | $16/month | $12/month | 600/month | 200/month | No watermark, higher resolution exports |
| Creator | $24/month | $15/month | 1,800/month | 800/month | Full AI features, Overdub, Eye Contact |
| Business | $40/month | $26/month | 6,000/month | 2,400/month | Team features, brand kits, priority support |
| Enterprise | Custom | Custom | Custom | Custom | SSO, dedicated support, custom onboarding, voice cloning agreements |

All plans include unlimited "basic" seats. Annual billing is discounted relative to monthly billing; at the prices listed above, the saving ranges from 25% on Hobbyist to 37.5% on Creator. AI credit costs for individual features include: Studio Sound (10 credits), Eye Contact (10 credits), Green Screen (15 credits), and Overdub (~5 credits per minute).
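To see how these credit costs add up against a plan's monthly allowance, the following Python sketch totals a month's usage and finds the smallest listed plan that covers it. The figures come from this article; the function names are hypothetical, rounding Overdub usage up to a whole credit is an assumption, and per-plan feature restrictions (such as Eye Contact requiring Creator or above) are ignored.

```python
import math

# Per-application credit costs quoted in this article.
CREDITS_PER_USE = {"studio_sound": 10, "eye_contact": 10, "green_screen": 15}
# Overdub is quoted as roughly 5 credits per generated minute (approximate).
OVERDUB_CREDITS_PER_MINUTE = 5

# Monthly AI credit allowances of the paid self-serve plans in the table above.
MONTHLY_CREDITS = {"hobbyist": 200, "creator": 800, "business": 2400}


def credits_needed(uses, overdub_minutes=0.0):
    """Total credits for a month: per-application features plus Overdub minutes."""
    total = sum(CREDITS_PER_USE[feature] * count for feature, count in uses.items())
    # Assumption: partial credits are rounded up.
    total += math.ceil(OVERDUB_CREDITS_PER_MINUTE * overdub_minutes)
    return total


def smallest_plan(credits):
    """Return the smallest listed plan whose allowance covers the credits, else None."""
    for plan, allowance in sorted(MONTHLY_CREDITS.items(), key=lambda item: item[1]):
        if credits <= allowance:
            return plan
    return None


# Example month: 8 Studio Sound passes, 8 Eye Contact passes,
# 4 Green Screen passes, and 12 minutes of Overdub.
usage = {"studio_sound": 8, "eye_contact": 8, "green_screen": 4}
total = credits_needed(usage, overdub_minutes=12)
print(total, smallest_plan(total))
```

In this example the total is 80 + 80 + 60 + 60 = 280 credits, which exceeds the Hobbyist allowance of 200 and fits within the Creator allowance of 800.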

## How much funding has Descript raised?

Descript has raised approximately $100-104 million across four funding rounds [3][4]:

| Round | Date | Amount | Lead Investor(s) | Other Participants / Notes |
|---|---|---|---|---|
| Seed/Early | December 2017 | ~$5 million | Andreessen Horowitz | Initial backing |
| Series A | September 2019 | $15 million | Andreessen Horowitz, Redpoint Ventures | Coincided with Lyrebird acquisition |
| Series B | January 2021 | $30 million | Spark Capital | Andreessen Horowitz, Redpoint Ventures |
| Series C | November 2022 | $50 million | [OpenAI](/wiki/openai) Startup Fund | Andreessen Horowitz, Redpoint, Spark Capital, Daniel Gross |

The Series C round valued Descript at approximately $550 million [6]. Notable individual investors across rounds include Casey Neistat, Tobi Lutke (Shopify), Shishir Mehrotra (Coda), Lenny Rachitsky, Naval Ravikant, and Rahul Vohra (Superhuman) [3]. In total, Descript has 29 investors: 8 institutional and 21 angel investors.

## How does Descript compare to competitors?

Descript operates at the intersection of several competitive markets: audio editing, video editing, transcription services, and AI-powered content creation tools.

### Direct Competitors (Transcript-Based Editing)

| Competitor | Founded | Total Funding | Key Differentiator |
|---|---|---|---|
| Riverside.fm | 2019 | $47 million | Studio-quality remote recording with text-based editing; strong podcast focus |
| Reduct | 2018 | $4 million | Transcript-based video review and collaboration for teams |
| Podcastle | 2020 | $8.8 million | AI audio creation platform for podcasters |

### Video Editing Competitors

| Competitor | Category | Key Differentiator |
|---|---|---|
| Adobe Premiere Pro | Professional NLE | Industry standard for professional video editing; deep Creative Cloud integration |
| CapCut | Consumer/Social | Free, mobile-first editor optimized for social media; owned by ByteDance |
| [Runway](/wiki/runway_ml) | AI-native video | Generative AI video creation and editing; text-to-video capabilities |
| InVideo | AI-assisted | Text-to-video conversion; large template library; raised $52 million |
| VEED.io | Web-based | Browser-based editor with subtitles, AI voiceovers; raised $35 million |

### Audio/Podcast Competitors

| Competitor | Category | Key Differentiator |
|---|---|---|
| Adobe Podcast | AI audio | Web-based AI audio editing powered by Adobe's speech-to-text technology |
| Audacity | Open source | Free, open-source audio editor; large community but no AI features |
| Hindenburg | Journalism | Professional audio editor designed specifically for journalists and podcasters |

### Competitive Positioning

Descript's primary competitive advantage is the depth of its text-first editing paradigm. While competitors like Riverside.fm and Reduct offer some transcript-based features, Descript combines transcript editing with its own AI features (Overdub, Studio Sound, Eye Contact, Underlord) in a single, more comprehensive platform.

The main competitive risks are the commoditization of AI features (as transcription, noise removal, and text-to-speech become standard in competing products) and competition from well-funded incumbents like Adobe that can integrate similar AI capabilities into their established editing suites.

## Notable Customers

Descript's customer base spans media organizations, technology companies, educational institutions, and individual content creators [4]:

| Sector | Notable Customers |
|---|---|
| Media and Journalism | NPR, VICE, The Washington Post, The New York Times |
| Technology | Shopify, HubSpot, Masterclass |
| Content Creation | Numerous YouTube channels, TikTok creators, and podcast networks |

## Company Information

| Detail | Value |
|---|---|
| Founded | December 2017 |
| Founder and CEO | Andrew Mason |
| Headquarters | San Francisco, California |
| Employees | ~188 (as of early 2026) |
| Total Funding | ~$100-104 million |
| Latest Valuation | ~$550 million (November 2022) |
| Estimated ARR | ~$55 million (late 2024) |
| Status | Private, independent |

## ELI5: Descript Explained Simply

Imagine you recorded yourself talking, and the computer wrote down everything you said, like the captions on a video. With Descript, if you cross out a word in that written version, the computer also erases that word from the recording, so it sounds like you never said it. You can move sentences around, delete the boring parts, and even type new words and have the app say them in a voice that sounds just like you. It turns editing audio and video into something as easy as fixing a typo in a school essay.

## See Also

- [Speech Recognition](/wiki/speech_recognition)
- [Text-to-Speech](/wiki/text_to_speech)
- [Voice Cloning](/wiki/voice_cloning)
- [AI Voice](/wiki/ai_voice_agent)
- [Computer Vision](/wiki/computer_vision)
- [Generative AI](/wiki/generative_ai)
- [Runway](/wiki/runway_ml)
- [OpenAI](/wiki/openai)
- [AI Agents](/wiki/ai_agents)
- [Video Generation](/wiki/veo)

## References

1. "Andrew Mason's Descript snags $15M, acquires Lyrebird to let users type text to create audio in their own voices." TechCrunch, September 18, 2019.
2. "Descript, Andrew Mason's platform to edit audio by editing text, now lets you edit video, too." TechCrunch, October 21, 2020.
3. "AI-powered media editing app Descript lands fresh cash from OpenAI." TechCrunch, November 15, 2022.
4. "Descript Business Breakdown & Founding Story." Contrary Research, May 2023.
5. "Descript raises $30M to build the next generation of video and audio editing tools." TechCrunch, January 12, 2021.
6. "OpenAI Leads Financing of Andrew Mason's Descript at $500 Million-Plus Valuation." The Information, 2022.
7. "Introducing Descript Podcast Studio & Overdub." Andrew Mason, Descript Blog, Medium.
8. "Underlord: Descript's AI Video Editor, with CEO Andrew Mason." The Cognitive Revolution Podcast.
9. "Descript with Andrew Mason." Software Engineering Daily, March 13, 2020.
10. "Descript's New Pricing (September 2025)." Trebble.fm.
11. "Descript Review 2025: Underlord AI Just Changed Video Editing." AI Tool Analysis.
12. "Descript Underlord Update: Faster AI Video Editing for Creators In 2026." Let's Compare AI.
13. "Descript AI Review 2025: Features, Pricing, and Real Results After 100+ Hours of Use." Fritz.ai.
14. "How Descript hit $28M revenue with a 141 person team in 2023." GetLatka.
15. "Andrew Mason." Wikipedia. Accessed 2026.
16. "List of 30 Acquisitions by Spotify (Feb 2026)." Tracxn.
17. "Descript revenue, funding & growth rate." Sacra. Accessed 2026.
18. "Descript Season 6: Unleash Our AI Assistant in 2025." Descript Blog, April 1, 2025.
19. "Underlord's Got a Model Picker (and Claude Sonnet 4.5)." Descript Blog, October 2, 2025.