Descript

Updated Jun 25, 2026

Descript is an artificial intelligence-powered audio and video editing platform that lets users edit media by editing a text transcript: delete a word from the transcript and the matching audio and video disappear too. Founded in December 2017 by Andrew Mason, the former CEO of Groupon, Descript is headquartered in San Francisco, California ^[1]^[4]. The platform has raised roughly $100-104 million in total funding, with its $50 million Series C led by the OpenAI Startup Fund in November 2022 at a valuation of approximately $550 million ^[3]^[6]. Its best-known features include Overdub (AI voice cloning, built on technology from the 2019 Lyrebird acquisition), Studio Sound (audio enhancement), Eye Contact (gaze correction), filler-word removal, and Underlord (an agentic AI co-editor) ^[1]^[3]. Descript reached an estimated $55 million in annual recurring revenue in late 2024, a figure Sacra puts at roughly 75% year-over-year growth ^[17]. Its customers include NPR, VICE, The Washington Post, The New York Times, Shopify, and HubSpot ^[4].

What is Descript?

Descript is a tool for recording, transcribing, editing, collaborating on, and publishing podcasts and videos, all from a single document-style interface. Its defining idea is transcript-based editing: instead of dragging waveforms or video clips on a timeline, the user edits a written transcript of the recording, and the underlying audio and video stay in sync automatically. Andrew Mason has framed the company's target market in unusually direct terms: "We think of our main competition as non-editors, people who aren't making video because the tools are too complex" ^[3]. Descript pairs that text-first paradigm with a stack of generative AI features for voice synthesis, noise removal, captioning, and multi-step automated editing.

History

Andrew Mason and the Path to Descript

Andrew Mason is a serial entrepreneur. He graduated from Northwestern University and first gained prominence as the founder and CEO of Groupon, the daily deals platform that launched in 2008 and became one of the fastest-growing companies in internet history ^[4]. Groupon's IPO in November 2011 valued the company at $12.7 billion, making it one of the largest internet IPOs since Google's ^[15]. Mason served as CEO until February 2013, when the board removed him amid a declining stock price and operational challenges. In a candid departure memo, Mason acknowledged being fired and wished the company well ^[15].

After leaving Groupon, Mason co-founded Detour in 2014. The augmented reality startup created location-based audio walking tours: users would put on headphones and walk through cities while Detour played narrated, GPS-triggered audio that blended storytelling with the physical environment. The app attracted critical acclaim for its approach to urban exploration.

It was at Detour that the seed of Descript was planted. To produce Detour's audio tours, the team needed to record, edit, and produce large volumes of spoken-word audio content. Mason quickly discovered that the audio editing tools available at the time (primarily designed for music production) were overly complex and poorly suited for speech editing ^[4]. Manipulating audio waveforms to cut, rearrange, or fix spoken content was tedious and required technical expertise that most content creators lacked.

Mason's insight was simple but powerful: since the audio content was fundamentally speech (words spoken in sequence), it should be possible to edit the audio by editing a text transcript instead. As Contrary Research summarizes the core innovation, "deleting a word from the text transcript would remove that same word from the audio file" ^[4]. Rearrange paragraphs in the transcript, and the audio follows suit. This text-based approach would make audio editing as intuitive as editing a document in a word processor.

The Detour team spent roughly two and a half years developing this transcript-based audio editing technology as an internal tool. The technology relied on two advancing fields: automated speech recognition (to generate accurate transcripts) and text-audio alignment (to maintain synchronization between transcript edits and audio playback) ^[4].

In 2017, Bose acquired Detour as part of its augmented reality audio platform initiative. The Detour app was eventually discontinued, but the internal audio editing tool that Mason's team had built was too promising to abandon. Mason spun it off as a separate company, formally founding Descript in December 2017 ^[4].

When was Descript founded and how was it funded early on (2017-2019)?

Descript attracted venture backing from the start. Its first outside funding came from Andreessen Horowitz, which led the company's earliest investment round, of about $5 million, in December 2017.

In September 2019, Descript raised a $15 million Series A round led by Andreessen Horowitz and Redpoint Ventures ^[1]. Alongside this funding, Descript made a strategically significant acquisition: it purchased Lyrebird, a Montreal-based AI startup that had developed advanced voice cloning technology ^[1]. Lyrebird was founded by Alexandre de Brebisson, Jose Sotelo, and a team of PhD researchers from the University of Montreal, and its machine learning model could copy a person's voice from roughly one minute of sample audio, reproducing the speaker's intonation and cadence ^[1].

The Lyrebird acquisition gave Descript the foundation for what would become its Overdub feature, one of the platform's most distinctive capabilities. With Overdub, users could type new words into their transcript and have those words spoken in a synthetic clone of their own voice, seamlessly blending with the original recording ^[1]. This meant that if a podcaster misspoke or wanted to add a correction, they could simply type the correct words and Overdub would generate the audio, eliminating the need for re-recording.

When did Descript add video editing and raise its Series B (2020-2021)?

In October 2020, Descript expanded from audio-only editing to include video editing ^[2]. The same transcript-based editing paradigm applied: users could edit video content by editing the text transcript. Deleting words from the transcript removed the corresponding video footage. The video launch included tools for screen recording, titles, transitions, image overlays, and multi-track editing, all controlled through the familiar text-editing interface ^[2].

This expansion was particularly timely given the COVID-19 pandemic's acceleration of remote work, online education, and video content creation. The demand for accessible video editing tools was growing rapidly as millions of new content creators entered the market.

In January 2021, Descript raised a $30 million Series B round led by Spark Capital's Nabeel Hyatt, with continued participation from Andreessen Horowitz and Redpoint Ventures ^[5]. The round valued the company at approximately $260 million ^[3]. The funding was used to expand the engineering team and accelerate product development.

Why did OpenAI invest in Descript (Series C, 2022)?

In November 2022, Descript raised a $50 million Series C round led by the OpenAI Startup Fund, bringing its total raised at the time to roughly $100 million ^[3]. The round also included participation from Andreessen Horowitz, Redpoint Ventures, Spark Capital, and prominent individual investors such as Daniel Gross (former Y Combinator partner), Casey Neistat (YouTube creator), Tobi Lutke (Shopify founder), Shishir Mehrotra (Coda CEO), Lenny Rachitsky (product advisor and writer), Naval Ravikant (AngelList founder), and Rahul Vohra (Superhuman founder) ^[3].

The Series C valued Descript at approximately $550 million, more than double the roughly $260 million valuation of its Series B about 22 months earlier ^[6]. The strategic significance of OpenAI's investment extended beyond capital: it positioned Descript to deeply integrate OpenAI's large language models and generative AI capabilities into its editing platform.

Brad Lightcap, then OpenAI's Chief Operating Officer, explained the investment thesis: "It's clear from using Descript and talking to customers that Descript is breaking down barriers between idea and creation" ^[3]. Descript was only the second company to receive funding from the OpenAI Startup Fund, after the AI note-taking app Mem ^[3].

Growth and the Underlord AI Co-Editor (2023-2026)

Descript progressively folded its AI capabilities into Underlord, the platform's AI co-editor: a unified assistant that can execute multi-step editing workflows. Instead of manually applying individual AI tools, users can give Underlord natural language commands such as "polish this podcast episode for publishing," and the system executes a sequence of editing operations: removing filler words, applying Studio Sound noise reduction, adding captions, cutting dead air, adjusting pacing, and suggesting social media clip candidates.

Descript launched Underlord as the centerpiece of its "Season 6" product release on April 1, 2025, describing it as "your all-powerful, incredibly versatile, book-smart editing assistant" and marking a shift from a tool with some AI features to a platform built around an agentic AI co-editor ^[11]^[18]. By the August 2025 update, Underlord could execute 15 to 20 editing steps in sequence and make judgment calls about pacing and content selection, a clear move toward agentic AI workflows in creative software ^[11].

Descript also introduced a new pricing model in September 2025, transitioning from simple transcription-hour limits to a more granular system based on Media Minutes (for uploads and recordings) and AI Credits (for AI-powered features like Studio Sound, Eye Contact, Green Screen, and Overdub) ^[10].

In October 2025, Descript added a model picker to Underlord, starting with Anthropic's Claude Sonnet 4.5, so users could choose which underlying large language model powers the assistant ^[12]^[19]. Descript engineer Ajay Arasanipalai framed the change this way: "With the model picker, we're letting you choose a point in that tradeoff space for yourself" ^[19]. By a January 2026 update, Underlord offered several Claude tiers (Sonnet as the self-serve default, Opus for enterprise customers prioritizing quality, and Haiku for fast iterative sessions) alongside Google's Gemini 3.1 Pro, and Descript said the improvements cut the AI credits needed to fully edit a video before publishing by roughly 20% ^[12].

By late 2024, Descript had reached approximately $55 million in annual recurring revenue, according to Sacra, up from an estimated $31 million a year earlier, or about 75% year-over-year growth ^[17]. The company employed roughly 188 people as of early 2026 ^[4].
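The growth figure can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the function names are invented here, and the inputs are the estimates quoted in this article (in millions of US dollars), not audited figures.

```python
def implied_prior_arr(current_arr, yoy_growth):
    """Back out the ARR one year earlier from the current ARR and a growth rate."""
    return current_arr / (1 + yoy_growth)


def yoy_growth(prior_arr, current_arr):
    """Year-over-year growth as a fraction (0.75 means 75%)."""
    return current_arr / prior_arr - 1


# $55M growing at 75% implies a base of roughly $31.4M one year earlier.
prior = implied_prior_arr(55, 0.75)
print(round(prior, 1))               # 31.4

# Going the other way, $31M -> $55M is about 77% growth,
# which is consistent with "about 75%" given rounded inputs.
print(round(yoy_growth(31, 55), 2))  # 0.77
```

In other words, the $31 million estimate only fits the 75% growth figure if it describes a point about a year before the $55 million estimate.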

Has Spotify acquired Descript?

Despite recurring speculation, there is no confirmed acquisition of Descript by Spotify as of mid-2026. Spotify has made numerous acquisitions, including the podcasting companies Gimlet Media, Anchor, The Ringer, and Megaphone, as well as the music database WhoSampled (acquired in November 2025) ^[16]. However, Descript does not appear among Spotify's acquisitions, and no credible reports confirm such a deal. Descript remains a private, independent company ^[16].

How does Descript work?

Transcript-Based Editing

Descript's foundational innovation is the ability to edit audio and video by editing a text transcript. The workflow operates as follows:

1. Users import or record audio/video content.
2. Descript automatically generates a transcript using its speech recognition engine (supporting 22 languages).
3. Users edit the transcript as they would a text document: deleting, rearranging, cutting, or adding text.
4. All changes to the transcript are reflected in the underlying audio and video in real time.

This approach dramatically lowers the barrier to entry for media editing, making it accessible to people with no experience in traditional audio/video editing software. The text-first interface means that users who are comfortable with word processors can edit podcasts and videos.
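The core mechanism can be illustrated with a small sketch. It assumes each transcript word carries start and end timestamps produced by alignment. The `Word` class, the `keep_segments` function, and the sample timings are hypothetical, chosen for illustration, and do not describe Descript's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds into the recording
    end: float


def keep_segments(words, deleted):
    """Return merged (start, end) spans of media to keep after deleting words by index."""
    segments = []
    for i, word in enumerate(words):
        if i in deleted:
            continue
        if segments and (i - 1) not in deleted:
            # The previous word was kept, so extend the current span.
            segments[-1] = (segments[-1][0], word.end)
        else:
            # First kept word, or the first one after a deletion: start a new span.
            segments.append((word.start, word.end))
    return segments


words = [
    Word("So", 0.0, 0.3),
    Word("um", 0.3, 0.7),
    Word("welcome", 0.7, 1.2),
    Word("back", 1.2, 1.6),
]

# Deleting the word at index 1 ("um") leaves two spans to play back to back.
print(keep_segments(words, {1}))  # [(0.0, 0.3), (0.7, 1.6)]
```

Under this model the edit is non-destructive: the original media is untouched, and the edited result is just a list of spans to play in order. Rearranging text would correspond to reordering those spans.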

What is Overdub?

Overdub is Descript's AI voice cloning feature, built on technology acquired from Lyrebird in 2019 ^[1]. Overdub allows users to:

Create a digital clone of their voice by providing a voice sample.
Generate synthetic speech in their voice by typing text into the transcript.
Seamlessly blend AI-generated speech with original recordings.
Correct mistakes, add new content, or modify existing recordings without re-recording.

Overdub uses approximately 5 AI credits per minute of generated audio. To address ethical concerns around voice cloning and deepfake misuse, Descript states that "you can only use Overdub on your own voice," requiring the speaker to record a set of randomly generated sentences for verification before a voice model can be trained ^[1].

Studio Sound

Studio Sound is an AI-powered audio enhancement feature that processes recordings made in ordinary rooms so they sound closer to studio quality. It performs several functions:

Function	Description
Background Noise Removal	Eliminates ambient noise (traffic, HVAC, keyboard clicks, etc.)
Speech Enhancement	Boosts vocal clarity and presence
Echo Cancellation	Removes room reverb and echo artifacts
Level Normalization	Balances audio levels across speakers and segments

Studio Sound uses 10 AI credits per application and is particularly valuable for podcasters and video creators who record in non-studio environments.
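To make the "Level Normalization" row concrete, here is a minimal classical sketch: plain RMS gain normalization with a hard peak limit. This is a stand-in technique chosen for illustration, not Descript's neural processing, and the function names and the `target_rms` value are assumptions.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of samples in the range [-1.0, 1.0]."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize_level(samples, target_rms=0.1, peak_limit=1.0):
    """Scale samples so their RMS matches target_rms, clamping any peaks."""
    current = rms(samples)
    if current == 0:
        return list(samples)  # pure silence: nothing to scale
    gain = target_rms / current
    return [max(-peak_limit, min(peak_limit, s * gain)) for s in samples]


quiet_speaker = [0.01, -0.02, 0.015, -0.005]
leveled = normalize_level(quiet_speaker)
print(round(rms(leveled), 3))  # 0.1
```

Applying the same target to each speaker's segments is the simplest way to balance levels across a conversation. Production tools typically use perceptual loudness measures rather than raw RMS.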

What is Eye Contact?

Eye Contact is an AI feature that adjusts the gaze direction of a speaker in video to simulate direct eye contact with the camera. This addresses a common problem in video production: when speakers read from scripts, teleprompters, or notes positioned away from the camera lens, their gaze appears off-center. Eye Contact uses computer vision to detect the speaker's eyes and subtly adjust them so they appear to be looking directly at the viewer.

The feature uses 10 AI credits per application and is available on Creator plans and above.

Filler Word Removal

Descript's filler word removal feature automatically detects and removes verbal fillers such as "um," "uh," "like," "you know," "sort of," and similar speech disfluencies. The system identifies these fillers in the transcript and allows users to remove them all at once with a single click, along with the corresponding audio. Users can also selectively remove individual filler words. This feature is available for free and does not require AI credits.
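A naive version of filler detection can be sketched as phrase matching over transcript tokens. The phrase list and function names below are illustrative assumptions. "Like" is left out because plain matching cannot tell the filler from the verb, and for the same reason "you know" will sometimes be flagged when it is meaningful, which is presumably why a real system needs more context than this.

```python
import string

# Multi-word phrases come first so they match before single words.
FILLER_PHRASES = [("you", "know"), ("sort", "of"), ("um",), ("uh",)]


def normalize(token):
    """Lowercase a token and strip surrounding punctuation."""
    return token.strip(string.punctuation).lower()


def find_fillers(tokens):
    """Return the indices of tokens that belong to a filler phrase."""
    norm = [normalize(t) for t in tokens]
    hits = []
    i = 0
    while i < len(norm):
        for phrase in FILLER_PHRASES:
            n = len(phrase)
            if tuple(norm[i:i + n]) == phrase:
                hits.extend(range(i, i + n))
                i += n
                break
        else:
            i += 1  # no phrase starts here
    return hits


tokens = "Um, so you know, we sort of shipped it, uh, today".split()
hits = find_fillers(tokens)
print(hits)  # [0, 2, 3, 5, 6, 9]

cleaned = " ".join(t for i, t in enumerate(tokens) if i not in hits)
print(cleaned)  # so we shipped it, today
```

The returned indices are what a "remove all" action would delete from the transcript, with the matching audio removed along with them.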

Green Screen

Green Screen is an AI-powered background removal tool that detects the speaker in a video frame and removes the background without requiring a physical green screen. Users can replace the removed background with custom images, videos, or solid colors, or leave it transparent for compositing. The feature uses 15 AI credits per application.

What can Underlord do? (AI Co-Editor)

Underlord is Descript's AI assistant that functions as an intelligent co-editor. Rather than requiring users to manually apply individual AI features, Underlord can execute complex, multi-step editing workflows from natural language instructions ^[11]. Capabilities include:

Capability	Description
Multi-Step Workflows	Executes 15-20 editing steps in sequence from a single instruction
Pacing Adjustments	Makes judgment calls about timing, pauses, and rhythm
Content Suggestions	Identifies moments suitable for social media clips
Polish Workflows	Applies noise removal, filler word removal, captions, and dead air cuts in one pass
Show Notes	Generates descriptions, titles, and summaries for podcast episodes
Chapter Markers	Automatically creates chapter markers from content structure
Model Selection	Lets users pick the underlying LLM (Claude Sonnet, Opus, Haiku, or Gemini) for the assistant
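The idea of a multi-step workflow driven by one instruction can be sketched as a plan of named steps applied in order to a project. Everything here is hypothetical: the step functions, the `PLANS` lookup (a real assistant would have a language model produce the plan), and the project dictionary are illustrative and are not Descript's API.

```python
def remove_fillers(project):
    fillers = {"um", "uh"}
    project["transcript"] = [w for w in project["transcript"] if w.lower() not in fillers]
    return project


def apply_studio_sound(project):
    project["effects"].append("studio_sound")
    return project


def add_captions(project):
    project["captions"] = " ".join(project["transcript"])
    return project


STEPS = {
    "remove_fillers": remove_fillers,
    "studio_sound": apply_studio_sound,
    "captions": add_captions,
}

# Hypothetical stand-in for the planning stage: keyword -> ordered step names.
PLANS = {"polish": ["remove_fillers", "studio_sound", "captions"]}


def run_workflow(instruction, project):
    """Pick a plan from the instruction, run each step in order, and log what ran."""
    plan = next((steps for key, steps in PLANS.items() if key in instruction.lower()), [])
    log = []
    for name in plan:
        project = STEPS[name](project)
        log.append(name)
    return project, log


project = {"transcript": ["Um", "welcome", "back"], "effects": [], "captions": ""}
result, log = run_workflow("Polish this podcast episode for publishing", project)
print(log)                 # ['remove_fillers', 'studio_sound', 'captions']
print(result["captions"])  # welcome back
```

Step order matters in this sketch: captions are generated after filler removal, so they reflect the cleaned transcript.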

Additional Features

Feature	Description
Screen Recording	High-resolution screen capture with customizable webcam placement
Storyboard	Scene-based video editing using the "/" key to split content
Remote Recording	Integration with Zoom, Google Meet, and other conferencing platforms
Multi-Track Editing	Simultaneous camera and screen recording with dynamic speaker labels
Collaboration	Real-time multi-user editing, brand templates, and cloud sync
Stock Media Library	Access to stock video, images, and music within the editor
Captions and Subtitles	AI-generated captions with customizable styling
Templates	Pre-built project templates for common content formats
Publishing	Direct publishing to YouTube, podcast platforms, and social media

What technology powers Descript?

Descript's technology stack spans several areas of artificial intelligence:

Speech Recognition and Alignment

The platform uses advanced speech recognition models to generate transcripts from audio and video content, supporting 22 languages. Critically, the system maintains precise alignment between the text transcript and the underlying media, enabling the real-time synchronization that makes transcript-based editing possible. Human-assisted transcription is also available at $2 per minute for situations requiring higher accuracy.

Voice Synthesis

The Overdub feature relies on text-to-speech voice synthesis technology originally developed by Lyrebird and refined by Descript's AI team ^[1]. The system analyzes voice samples to create personalized voice models that capture the speaker's pitch, tone, cadence, and speaking style.

Computer Vision

Features like Eye Contact and Green Screen utilize computer vision models for face detection, gaze estimation, and foreground-background segmentation. These models run inference on uploaded video to produce pixel-level adjustments.

Audio Processing

Studio Sound uses neural network-based audio processing for noise reduction, echo cancellation, and speech enhancement. The models are trained to distinguish between desired speech content and unwanted background noise across a wide variety of recording environments.

Large Language Models

Descript integrates large language models for natural language understanding in Underlord, content summarization, show notes generation, and the AI Chat features within the platform. As of early 2026, Underlord can be powered by models from multiple providers, including Anthropic's Claude (Sonnet, Opus, and Haiku) and Google's Gemini, selectable through an in-app model picker ^[12]^[19].

Pricing

As of 2026, Descript offers five pricing tiers based on a combination of Media Minutes and AI Credits:

Plan	Monthly Cost	Annual Cost (per month)	Media Minutes	AI Credits	Key Features
Free	$0	$0	60/month	100 (one-time grant)	Basic editing, watermarked exports
Hobbyist	$16/month	$12/month	600/month	200/month	No watermark, higher resolution exports
Creator	$24/month	$15/month	1,800/month	800/month	Full AI features, Overdub, Eye Contact
Business	$40/month	$26/month	6,000/month	2,400/month	Team features, brand kits, priority support
Enterprise	Custom	Custom	Custom	Custom	SSO, dedicated support, custom onboarding, voice cloning agreements

All plans include unlimited "basic" seats. Based on the prices in the table, annual billing saves between 25% (Hobbyist) and 37.5% (Creator) compared to monthly billing. AI credit costs for individual features include: Studio Sound (10 credits), Eye Contact (10 credits), Green Screen (15 credits), and Overdub (~5 credits per minute).
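The pricing table and credit costs lend themselves to a quick calculation. The sketch below uses only the figures quoted in this section. The names are invented for illustration, and the Overdub rate is the approximate value given above.

```python
# (monthly price, annual price per month) in US dollars, from the table above.
PLAN_PRICES = {
    "Hobbyist": (16, 12),
    "Creator": (24, 15),
    "Business": (40, 26),
}

CREDIT_COSTS = {"studio_sound": 10, "eye_contact": 10, "green_screen": 15}
OVERDUB_CREDITS_PER_MINUTE = 5  # approximate


def annual_savings_pct(monthly, annual_per_month):
    """Percentage saved by paying annually instead of monthly."""
    return round(100 * (monthly - annual_per_month) / monthly, 1)


def credits_needed(applications, overdub_minutes=0):
    """Total AI credits for a set of feature applications plus Overdub minutes."""
    total = sum(CREDIT_COSTS[feature] * count for feature, count in applications.items())
    return total + OVERDUB_CREDITS_PER_MINUTE * overdub_minutes


savings = {plan: annual_savings_pct(*prices) for plan, prices in PLAN_PRICES.items()}
print(savings)  # {'Hobbyist': 25.0, 'Creator': 37.5, 'Business': 35.0}

# Four Studio Sound passes, two Eye Contact passes, three minutes of Overdub.
print(credits_needed({"studio_sound": 4, "eye_contact": 2}, overdub_minutes=3))  # 75
```

At that rate of use, the Hobbyist plan's 200 monthly credits would cover two such projects with 50 credits left over.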

How much funding has Descript raised?

Descript has raised approximately $100-104 million across four funding rounds ^[3]^[4]:

Round	Date	Amount	Lead Investor	Key Participants
Seed/Early	December 2017	~$5 million	Andreessen Horowitz	Initial backing
Series A	September 2019	$15 million	Andreessen Horowitz, Redpoint Ventures	Coincided with Lyrebird acquisition
Series B	January 2021	$30 million	Spark Capital	Andreessen Horowitz, Redpoint Ventures
Series C	November 2022	$50 million	OpenAI Startup Fund	Andreessen Horowitz, Redpoint, Spark Capital, Daniel Gross

The Series C round valued Descript at approximately $550 million ^[6]. Notable individual investors across rounds include Casey Neistat, Tobi Lutke (Shopify), Shishir Mehrotra (Coda), Lenny Rachitsky, Naval Ravikant, and Rahul Vohra (Superhuman) ^[3]. In total, Descript has 29 investors: 8 institutional and 21 angel investors.

How does Descript compare to competitors?

Descript operates at the intersection of several competitive markets: audio editing, video editing, transcription services, and AI-powered content creation tools.

Direct Competitors (Transcript-Based Editing)

Competitor	Founded	Total Funding	Key Differentiator
Riverside.fm	2019	$47 million	Studio-quality remote recording with text-based editing; strong podcast focus
Reduct	2018	$4 million	Transcript-based video review and collaboration for teams
Podcastle	2020	$8.8 million	AI audio creation platform for podcasters

Video Editing Competitors

Competitor	Category	Key Differentiator
Adobe Premiere Pro	Professional NLE	Industry standard for professional video editing; deep Creative Cloud integration
CapCut	Consumer/Social	Free, mobile-first editor optimized for social media; owned by ByteDance
Runway	AI-native video	Generative AI video creation and editing; text-to-video capabilities
InVideo	AI-assisted	Text-to-video conversion; large template library; raised $52 million
VEED.io	Web-based	Browser-based editor with subtitles, AI voiceovers; raised $35 million

Audio/Podcast Competitors

Competitor	Category	Key Differentiator
Adobe Podcast	AI audio	Web-based AI audio editing powered by Adobe's speech-to-text technology
Audacity	Open source	Free, open-source audio editor; large community but no built-in AI features
Hindenburg	Journalism	Professional audio editor designed specifically for journalists and podcasters

Competitive Positioning

Descript's primary competitive advantage is the depth of its text-first editing paradigm. Competitors such as Riverside.fm and Reduct offer some transcript-based features, but Descript combines transcript editing with a broader set of AI features (Overdub, Studio Sound, Eye Contact, Underlord) in a single platform.

The main competitive risks are the commoditization of AI features (as transcription, noise removal, and text-to-speech become standard in competing products) and competition from well-funded incumbents like Adobe, which can integrate similar AI capabilities into their established editing suites.

Notable Customers

Descript's customer base spans media organizations, technology companies, educational institutions, and individual content creators ^[4]:

Sector	Notable Customers
Media and Journalism	NPR, VICE, The Washington Post, The New York Times
Technology	Shopify, HubSpot, MasterClass
Content Creation	Numerous YouTube channels, TikTok creators, and podcast networks

ELI5: Descript Explained Simply

Imagine you recorded yourself talking, and the computer wrote down everything you said, like the captions on a video. With Descript, if you cross out a word in that written version, the computer also erases that word from the recording, so it sounds like you never said it. You can move sentences around, delete the boring parts, and even type new words and have the app say them in a voice that sounds just like you. It turns editing audio and video into something as easy as fixing a typo in a school essay.

References

1. "Andrew Mason's Descript snags $15M, acquires Lyrebird to let users type text to create audio in their own voices." TechCrunch, September 18, 2019.
2. "Descript, Andrew Mason's platform to edit audio by editing text, now lets you edit video, too." TechCrunch, October 21, 2020.
3. "AI-powered media editing app Descript lands fresh cash from OpenAI." TechCrunch, November 15, 2022.
4. "Descript Business Breakdown & Founding Story." Contrary Research, May 2023.
5. "Descript raises $30M to build the next generation of video and audio editing tools." TechCrunch, January 12, 2021.
6. "OpenAI Leads Financing of Andrew Mason's Descript at $500 Million-Plus Valuation." The Information, 2022.
7. "Introducing Descript Podcast Studio & Overdub." Andrew Mason, Descript Blog, Medium.
8. "Underlord: Descript's AI Video Editor, with CEO Andrew Mason." The Cognitive Revolution Podcast.
9. "Descript with Andrew Mason." Software Engineering Daily, March 13, 2020.
10. "Descript's New Pricing (September 2025)." Trebble.fm.
11. "Descript Review 2025: Underlord AI Just Changed Video Editing." AI Tool Analysis.
12. "Descript Underlord Update: Faster AI Video Editing for Creators In 2026." Let's Compare AI.
13. "Descript AI Review 2025: Features, Pricing, and Real Results After 100+ Hours of Use." Fritz.ai.
14. "How Descript hit $28M revenue with a 141 person team in 2023." GetLatka.
15. "Andrew Mason." Wikipedia. Accessed 2026.
16. "List of 30 Acquisitions by Spotify (Feb 2026)." Tracxn.
17. "Descript revenue, funding & growth rate." Sacra. Accessed 2026.
18. "Descript Season 6: Unleash Our AI Assistant in 2025." Descript Blog, April 1, 2025.
19. "Underlord's Got a Model Picker (and Claude Sonnet 4.5)." Descript Blog, October 2, 2025.

Company Information

Detail	Value
Founded	December 2017
Founder and CEO	Andrew Mason
Headquarters	San Francisco, California
Employees	~188 (as of early 2026)
Total Funding	~$100-104 million
Latest Valuation	~$550 million (November 2022)
Estimated ARR	~$55 million (late 2024)
Status	Private, independent
