Descript is an artificial intelligence-powered audio and video editing platform that allows users to edit media by editing text. Founded in December 2017 by Andrew Mason, the former CEO of Groupon, Descript is headquartered in San Francisco, California. The platform's core innovation is transcript-based editing: when a user deletes, rearranges, or modifies words in the transcript, the corresponding audio and video are automatically adjusted to match. Descript has raised approximately $104 million in total funding, with its Series C round led by the OpenAI Startup Fund at a valuation of approximately $550 million. The company's notable features include Overdub (AI voice cloning), Studio Sound (audio enhancement), Eye Contact (gaze correction), filler word removal, and Underlord (AI co-editor). As of late 2024, Descript had reached approximately $55 million in annual recurring revenue. Descript's customers include NPR, The Washington Post, The New York Times, Shopify, and HubSpot.
Andrew Mason is a serial entrepreneur whose career has been defined by a pattern of identifying friction in everyday experiences and building technology to eliminate it. Mason graduated from Northwestern University and first gained prominence as the founder and CEO of Groupon, the daily deals platform that launched in 2008 and became one of the fastest-growing companies in internet history. Groupon's IPO in November 2011 valued the company at $12.7 billion, making it one of the largest internet IPOs since Google. Mason served as CEO until February 2013, when he was removed by the board amid declining stock prices and operational challenges. In a characteristically candid departure memo, Mason acknowledged being fired and wished the company well.
After leaving Groupon, Mason co-founded Detour in 2014. The augmented reality startup created location-based audio walking tours: users put on headphones and walked through cities while Detour played narrated, GPS-triggered audio that blended storytelling with the physical environment. The app attracted critical acclaim for its innovative approach to urban exploration.
It was at Detour that the seed of Descript was planted. To produce Detour's audio tours, the team needed to record, edit, and produce large volumes of spoken-word audio content. Mason quickly discovered that the audio editing tools available at the time (primarily designed for music production) were overly complex and poorly suited for speech editing. Manipulating audio waveforms to cut, rearrange, or fix spoken content was tedious and required technical expertise that most content creators lacked.
Mason's insight was simple but powerful: since the audio content was fundamentally speech (words spoken in sequence), it should be possible to edit the audio by editing a text transcript instead. Delete a sentence from the transcript, and the corresponding audio disappears. Rearrange paragraphs in the transcript, and the audio follows suit. This text-based approach would make audio editing as intuitive as editing a document in a word processor.
The Detour team spent roughly two and a half years developing this transcript-based audio editing technology as an internal tool. The technology relied on two advancing fields: automated speech recognition (to generate accurate transcripts) and text-audio alignment (to maintain synchronization between transcript edits and audio playback).
In 2017, Bose acquired Detour as part of its augmented reality audio platform initiative. The Detour app was eventually discontinued, but the internal audio editing tool that Mason's team had built was too promising to abandon. Mason spun it off as a separate company, formally founding Descript in December 2017.
Descript attracted venture capital early. Its initial funding, a round of roughly $5 million in December 2017, was led by Andreessen Horowitz.
In September 2019, Descript raised a $15 million Series A round led by Andreessen Horowitz and Redpoint Ventures. Alongside this funding, Descript made a strategically significant acquisition: it purchased Lyrebird, a Montreal-based AI startup that had developed advanced voice cloning technology. Lyrebird's technology could analyze a sample of a person's voice and generate synthetic speech that closely mimicked the original speaker's voice, intonation, and cadence.
The Lyrebird acquisition gave Descript the foundation for what would become its Overdub feature, one of the platform's most distinctive capabilities. With Overdub, users could type new words into their transcript and have those words spoken in a synthetic clone of their own voice, seamlessly blending with the original recording. This meant that if a podcaster misspoke or wanted to add a correction, they could simply type the correct words and Overdub would generate the audio, eliminating the need for re-recording.
In October 2020, Descript expanded from audio-only editing to include video editing. The same transcript-based editing paradigm applied: users could edit video content by editing the text transcript. Deleting words from the transcript removed the corresponding video footage. The video launch included tools for screen recording, titles, transitions, image overlays, and multi-track editing, all controlled through the familiar text-editing interface.
This expansion was particularly timely given the COVID-19 pandemic's acceleration of remote work, online education, and video content creation. The demand for accessible video editing tools was growing rapidly as millions of new content creators entered the market.
In January 2021, Descript raised a $30 million Series B round led by Spark Capital's Nabeel Hyatt, with continued participation from Andreessen Horowitz and Redpoint Ventures. The round valued the company at approximately $260 million. The funding was used to expand the engineering team and accelerate product development.
In November 2022, Descript raised a $50 million Series C round led by the OpenAI Startup Fund. The round also included participation from Andreessen Horowitz, Redpoint Ventures, Spark Capital, and prominent individual investors such as Daniel Gross (former Y Combinator partner), Casey Neistat (YouTube creator), Tobi Lutke (Shopify founder), Shishir Mehrotra (Coda CEO), Lenny Rachitsky (product advisor and writer), Naval Ravikant (AngelList founder), and Rahul Vohra (Superhuman founder).
The Series C valued Descript at approximately $550 million, more than double its valuation from just 22 months prior. The strategic significance of OpenAI's investment extended beyond capital: it positioned Descript to deeply integrate OpenAI's large language models and generative AI capabilities into its editing platform.
OpenAI's Sam Altman commented on the investment, noting that Descript's approach to media editing through text-based interfaces represented a natural application of generative AI technology.
Throughout 2023, Descript continued to expand its AI capabilities under the brand name Underlord, which serves as the platform's AI co-editor. Underlord integrates various AI features into a unified assistant that can execute multi-step editing workflows. Instead of manually applying individual AI tools, users can instruct Underlord with natural language commands like "polish this podcast episode for publishing," and the system will execute a sequence of editing operations: removing filler words, applying Studio Sound noise reduction, adding captions, cutting dead air, adjusting pacing, and suggesting social media clip candidates.
In August 2025, Descript launched a major update to Underlord that enabled it to execute 15 to 20 editing steps in sequence and make judgment calls about pacing and content selection. This represented a shift toward agentic AI workflows in creative software.
Descript also introduced a new pricing model in September 2025, transitioning from simple transcription-hour limits to a more granular system based on Media Minutes (for uploads and recordings) and AI Credits (for AI-powered features like Studio Sound, Eye Contact, Green Screen, and Overdub).
By late 2024, Descript had reached approximately $55 million in annual recurring revenue, representing 75% year-over-year growth. The company employed roughly 188 people as of early 2026.
Despite some speculation, there is no confirmed acquisition of Descript by Spotify as of March 2026. Spotify has made numerous acquisitions in the podcasting space, including Gimlet Media, Anchor, The Ringer, Megaphone, and WhoSampled (acquired in November 2025). However, Descript does not appear among Spotify's acquisitions, and no credible reports confirm such a deal. Descript remains a private, independent company.
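Descript's foundational innovation is the ability to edit audio and video by editing a text transcript. The platform transcribes uploaded or recorded media, aligns each word with its position in the audio or video, and then automatically applies any deletion, rearrangement, or modification of the transcript text to the corresponding media.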
This approach dramatically lowers the barrier to entry for media editing, making it accessible to people with no experience in traditional audio/video editing software. The text-first interface means that users who are comfortable with word processors can edit podcasts and videos.
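The idea can be illustrated with a minimal Python sketch. It assumes each transcript word carries start and end timestamps produced by an alignment step. The `Word` type and the `keep_segments` function are hypothetical names chosen for illustration and do not reflect Descript's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One transcript word with its (assumed) aligned timestamps in seconds."""
    text: str
    start: float
    end: float


def keep_segments(words, deleted):
    """Return merged (start, end) time ranges covering the words that remain.

    `deleted` is a set of word indices removed from the transcript.
    Adjacent surviving words are merged into a single segment so the
    media is cut only where text was actually deleted.
    """
    segments = []
    prev_index = None
    for i, word in enumerate(words):
        if i in deleted:
            continue
        if segments and prev_index == i - 1:
            # Contiguous with the previous kept word: extend that segment.
            segments[-1] = (segments[-1][0], word.end)
        else:
            # First kept word, or a deletion came before it: start a new segment.
            segments.append((word.start, word.end))
        prev_index = i
    return segments


words = [
    Word("Welcome", 0.0, 0.4),
    Word("to", 0.4, 0.5),
    Word("um", 0.6, 0.9),
    Word("the", 1.0, 1.1),
    Word("show", 1.1, 1.5),
]

# Deleting the word at index 2 ("um") leaves two media segments to play back.
segments = keep_segments(words, deleted={2})
print(segments)
```

A real editor would also need to handle rearranged text, crossfades at cut points, and video frames, none of which this sketch attempts.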
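Overdub is Descript's AI voice cloning feature, built on technology acquired from Lyrebird in 2019. Overdub allows users to type new words into a transcript and have them spoken in a synthetic clone of their own voice, so that corrections and additions can be made without re-recording.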
Overdub uses approximately 5 AI credits per minute of generated audio. The feature requires explicit user consent and voice verification to prevent misuse, addressing ethical concerns around voice cloning and deepfake technology.
Studio Sound is an AI-powered audio enhancement feature that processes recordings to achieve professional studio-quality audio from any recording environment. It performs several functions:
| Function | Description |
|---|---|
| Background Noise Removal | Eliminates ambient noise (traffic, HVAC, keyboard clicks, etc.) |
| Speech Enhancement | Boosts vocal clarity and presence |
| Echo Cancellation | Removes room reverb and echo artifacts |
| Level Normalization | Balances audio levels across speakers and segments |
Studio Sound uses 10 AI credits per application and is particularly valuable for podcasters and video creators who record in non-studio environments.
Eye Contact is an AI feature that adjusts the gaze direction of a speaker in video to simulate direct eye contact with the camera. This addresses a common problem in video production: when speakers read from scripts, teleprompters, or notes positioned away from the camera lens, their gaze appears off-center. Eye Contact uses computer vision to detect the speaker's eyes and subtly adjust them so they appear to be looking directly at the viewer.
The feature uses 10 AI credits per application and is available on Creator plans and above.
Descript's filler word removal feature automatically detects and removes verbal fillers such as "um," "uh," "like," "you know," "sort of," and similar speech disfluencies. The system identifies these fillers in the transcript and allows users to remove them all at once with a single click, along with the corresponding audio. Users can also selectively remove individual filler words. This feature is available for free and does not require AI credits.
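As a rough illustration of transcript-level filler detection, the following Python sketch scans a tokenized transcript for a fixed list of filler phrases. The `FILLERS` list, `find_fillers`, and `remove_spans` are hypothetical names for this example. Descript's actual detection method is not described here and is likely more context-aware. "Like" is left out of the list on purpose, because a fixed list cannot tell filler uses from ordinary ones.

```python
# Filler phrases as tuples of lowercase tokens (an assumed, simplified list).
FILLERS = {("um",), ("uh",), ("you", "know"), ("sort", "of")}


def find_fillers(tokens, fillers=FILLERS):
    """Return (start, end_exclusive) index spans of filler phrases in tokens."""
    normalized = [t.lower().strip(".,!?") for t in tokens]
    max_len = max(len(f) for f in fillers)
    spans = []
    i = 0
    while i < len(normalized):
        # Try the longest phrase that still fits before the end of the list.
        for n in range(min(max_len, len(normalized) - i), 0, -1):
            if tuple(normalized[i:i + n]) in fillers:
                spans.append((i, i + n))
                i += n
                break
        else:
            # No filler starts at this position.
            i += 1
    return spans


def remove_spans(tokens, spans):
    """Return tokens with every index covered by a span removed."""
    drop = {i for start, end in spans for i in range(start, end)}
    return [t for i, t in enumerate(tokens) if i not in drop]


tokens = "So um you know the launch was uh sort of rushed".split()
spans = find_fillers(tokens)
cleaned = " ".join(remove_spans(tokens, spans))
print(spans)
print(cleaned)
```

Returning index spans rather than a cleaned string keeps the two operations the text describes separate: all spans can be removed at once, or a user can pick individual spans to remove.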
Green Screen is an AI-powered background removal tool that detects the speaker in a video frame and removes the background without requiring a physical green screen. Users can replace the removed background with custom images, videos, or solid colors, or leave it transparent for compositing. The feature uses 15 AI credits per application.
Underlord is Descript's AI assistant that functions as an intelligent co-editor. Rather than requiring users to manually apply individual AI features, Underlord can execute complex, multi-step editing workflows from natural language instructions. Capabilities include:
| Capability | Description |
|---|---|
| Multi-Step Workflows | Executes 15-20 editing steps in sequence from a single instruction |
| Pacing Adjustments | Makes judgment calls about timing, pauses, and rhythm |
| Content Suggestions | Identifies moments suitable for social media clips |
| Polish Workflows | Applies noise removal, filler word removal, captions, and dead air cuts in one pass |
| Show Notes | Generates descriptions, titles, and summaries for podcast episodes |
| Chapter Markers | Automatically creates chapter markers from content structure |
Beyond its AI features, Descript includes the following editing and production tools:
| Feature | Description |
|---|---|
| Screen Recording | High-resolution screen capture with customizable webcam placement |
| Storyboard | Scene-based video editing using the "/" key to split content |
| Remote Recording | Integration with Zoom, Google Meet, and other conferencing platforms |
| Multi-Track Editing | Simultaneous camera and screen recording with dynamic speaker labels |
| Collaboration | Real-time multi-user editing, brand templates, and cloud sync |
| Stock Media Library | Access to stock video, images, and music within the editor |
| Captions and Subtitles | AI-generated captions with customizable styling |
| Templates | Pre-built project templates for common content formats |
| Publishing | Direct publishing to YouTube, podcast platforms, and social media |
Descript's technology stack spans several areas of artificial intelligence:
The platform uses advanced speech recognition models to generate transcripts from audio and video content, supporting 22 languages. Critically, the system maintains precise alignment between the text transcript and the underlying media, enabling the real-time synchronization that makes transcript-based editing possible. Human-assisted transcription is also available at $2 per minute for situations requiring higher accuracy.
The Overdub feature relies on text-to-speech voice synthesis technology originally developed by Lyrebird and refined by Descript's AI team. The system analyzes voice samples to create personalized voice models that capture the speaker's pitch, tone, cadence, and speaking style.
Features like Eye Contact and Green Screen utilize computer vision models for face detection, gaze estimation, and foreground-background segmentation. These models run inference on uploaded video to produce pixel-level adjustments.
Studio Sound uses neural network-based audio processing for noise reduction, echo cancellation, and speech enhancement. The models are trained to distinguish between desired speech content and unwanted background noise across a wide variety of recording environments.
Descript integrates large language models (including models from its investor OpenAI) for natural language understanding in Underlord, content summarization, show notes generation, and the AI Chat features within the platform.
As of 2026, Descript offers five pricing tiers based on a combination of Media Minutes and AI Credits:
| Plan | Monthly Cost | Annual Cost (per month) | Media Minutes | AI Credits | Key Features |
|---|---|---|---|---|---|
| Free | $0 | $0 | 60/month | 100 (one-time grant) | Basic editing, watermarked exports |
| Hobbyist | $16/month | $12/month | 600/month | 200/month | No watermark, higher resolution exports |
| Creator | $24/month | $15/month | 1,800/month | 800/month | Full AI features, Overdub, Eye Contact |
| Business | $40/month | $26/month | 6,000/month | 2,400/month | Team features, brand kits, priority support |
| Enterprise | Custom | Custom | Custom | Custom | SSO, dedicated support, custom onboarding, voice cloning agreements |
All plans include unlimited "basic" seats. Based on the prices listed above, annual billing saves 25% on Hobbyist, 37.5% on Creator, and 35% on Business compared to monthly billing. AI credit costs for individual features are: Studio Sound (10 credits per application), Eye Contact (10 credits per application), Green Screen (15 credits per application), and Overdub (~5 credits per minute).
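The arithmetic behind the pricing table can be checked with a short Python sketch. All figures are copied from the table and credit costs above. The `PLANS` and `CREDIT_COST` structures and the two helper functions are hypothetical names for illustration. The Free plan is omitted because its price is zero and its credits are a one-time grant, and Overdub is omitted because it is billed per minute rather than per application.

```python
# Prices (USD per month) and monthly AI credit grants, copied from the table above.
PLANS = {
    "Hobbyist": {"monthly": 16, "annual": 12, "credits": 200},
    "Creator": {"monthly": 24, "annual": 15, "credits": 800},
    "Business": {"monthly": 40, "annual": 26, "credits": 2400},
}

# AI credits consumed per application of each feature, as listed above.
CREDIT_COST = {"Studio Sound": 10, "Eye Contact": 10, "Green Screen": 15}


def annual_discount(plan):
    """Fractional saving of annual billing relative to monthly billing."""
    p = PLANS[plan]
    return (p["monthly"] - p["annual"]) / p["monthly"]


def max_applications(plan, feature):
    """Whole applications of one feature that a month's credit grant covers."""
    return PLANS[plan]["credits"] // CREDIT_COST[feature]


for name in PLANS:
    print(
        f"{name}: {annual_discount(name):.1%} off with annual billing, "
        f"up to {max_applications(name, 'Studio Sound')} Studio Sound applications per month"
    )
```

Because the listed credit costs are per application, each quotient is an upper bound that assumes the whole monthly grant is spent on a single feature.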
Descript has raised approximately $104 million in total funding. The four rounds listed below account for roughly $100 million of that amount:
| Round | Date | Amount | Lead Investor | Key Participants |
|---|---|---|---|---|
| Seed/Early | December 2017 | ~$5 million | Andreessen Horowitz | Initial backing |
| Series A | September 2019 | $15 million | Andreessen Horowitz, Redpoint Ventures | Coincided with Lyrebird acquisition |
| Series B | January 2021 | $30 million | Spark Capital | Andreessen Horowitz, Redpoint Ventures |
| Series C | November 2022 | $50 million | OpenAI Startup Fund | Andreessen Horowitz, Redpoint, Spark Capital, Daniel Gross |
The Series C round valued Descript at approximately $550 million. Notable individual investors across rounds include Casey Neistat, Tobi Lutke (Shopify), Shishir Mehrotra (Coda), Lenny Rachitsky, Naval Ravikant, and Rahul Vohra (Superhuman). In total, Descript has 29 investors: 8 institutional and 21 angel investors.
Descript operates at the intersection of several competitive markets: audio editing, video editing, transcription services, and AI-powered content creation tools.
| Competitor | Founded | Total Funding | Key Differentiator |
|---|---|---|---|
| Riverside.fm | 2019 | $47 million | Studio-quality remote recording with text-based editing; strong podcast focus |
| Reduct | 2018 | $4 million | Transcript-based video review and collaboration for teams |
| Podcastle | 2020 | $8.8 million | AI audio creation platform for podcasters |
In video editing, Descript competes with:
| Competitor | Category | Key Differentiator |
|---|---|---|
| Adobe Premiere Pro | Professional NLE | Industry standard for professional video editing; deep Creative Cloud integration |
| CapCut | Consumer/Social | Free, mobile-first editor optimized for social media; owned by ByteDance |
| Runway | AI-native video | Generative AI video creation and editing; text-to-video capabilities |
| InVideo | AI-assisted | Text-to-video conversion; large template library; raised $52 million |
| VEED.io | Web-based | Browser-based editor with subtitles, AI voiceovers; raised $35 million |
In audio editing, Descript competes with:
| Competitor | Category | Key Differentiator |
|---|---|---|
| Adobe Podcast | AI audio | Web-based AI audio editing powered by Adobe's speech-to-text technology |
| Audacity | Open source | Free, open-source audio editor; large community but no AI features |
| Hindenburg | Journalism | Professional audio editor designed specifically for journalists and podcasters |
Descript's primary competitive advantage is the depth of its text-first editing paradigm. While competitors such as Riverside.fm and Reduct offer some transcript-based features, Descript combines transcript editing with AI features (Overdub, Studio Sound, Eye Contact, Underlord) in a single, more comprehensive platform.
The main competitive risks are the commoditization of AI features (as transcription, noise removal, and text-to-speech become standard in competing products) and competition from well-funded incumbents like Adobe that can integrate similar AI capabilities into their established editing suites.
Descript's customer base spans media organizations, technology companies, educational institutions, and individual content creators:
| Sector | Notable Customers |
|---|---|
| Media and Journalism | NPR, VICE, The Washington Post, The New York Times |
| Technology | Shopify, HubSpot, Masterclass |
| Content Creation | Numerous YouTube channels, TikTok creators, and podcast networks |
Key facts about the company:
| Detail | Value |
|---|---|
| Founded | December 2017 |
| Founder and CEO | Andrew Mason |
| Headquarters | San Francisco, California |
| Employees | ~188 (as of early 2026) |
| Total Funding | ~$104 million |
| Latest Valuation | ~$550 million (November 2022) |
| Status | Private, independent |