Deepgram

AI Companies Natural Language Processing Speech & Audio AI Voice AI

21 min read

Updated Jun 22, 2026

Suggest edit History Talk

RawGraph

Last edited

Jun 22, 2026

Fact-checked

In review queue

Sources

15 citations

Revision

v3 · 4,274 words

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

Deepgram is an American voice artificial intelligence company, founded in 2015 and headquartered in San Francisco, that builds proprietary deep learning models for speech recognition, text-to-speech synthesis, and voice agent orchestration, sold through an API rather than a packaged app. It is the maker of the Nova family of speech-to-text models, the Aura family of text-to-speech models, and the Voice Agent API, and in January 2026 it became the first dedicated speech AI company to reach unicorn status, closing a $130 million Series C at a $1.3 billion post-money valuation.^[1]^[2] The company was co-founded by Scott Stephenson, Adam Sypniewski, and Noah Shutty, all of whom had backgrounds in experimental physics at the University of Michigan.^[7]

Deepgram's commercial speech-to-text product line is anchored by the Nova family (Nova-1, Nova-2, and Nova-3), while its text-to-speech offering is branded Aura, with Aura-2 reaching general availability on April 15, 2025.^[9] On June 16, 2025 Deepgram made its Voice Agent API generally available, a unified speech-to-speech interface that combines its own ASR and TTS engines with pluggable large language models.^[8] As of early 2026, Deepgram reports that more than 200,000 developers and over 1,300 organizations build on its platform, with the company having cumulatively processed more than 50,000 years of audio and over one trillion words of transcribed speech.^[1]^[5]

Deepgram occupies a position in the natural language processing and voice AI market that is distinct from both hyperscale cloud providers and pure transcription vendors. Unlike Google, Amazon, and Microsoft, which embed speech APIs into broader cloud platforms, Deepgram is structured as a speech-first model lab that trains end-to-end neural networks specifically for production deployments. Unlike consumer-facing transcription apps, the company has from its earliest days focused on the developer and enterprise channel, exposing its models through a REST and WebSocket API rather than a packaged user interface. Lead Series C investor Elizabeth de Saint-Aignan of AVP framed the company's role this way: "Much like Stripe delivered the API platform underpinning the payments economy, we believe Deepgram is poised to deliver the API platform underpinning the emerging trillion-dollar B2B Voice AI economy."^[1]

What does Deepgram do?

Deepgram is a privately held Delaware corporation operating principally from San Francisco. Scott Stephenson serves as chief executive officer, with co-founder Adam Sypniewski as chief technology officer.^[5] The company's product lineup comprises three primary categories: streaming and pre-recorded speech-to-text APIs (the Nova family), real-time text-to-speech synthesis (the Aura family), and a higher-level Voice Agent API that combines both into a turn-taking conversational interface.^[8] Deepgram also operates a custom model team that fine-tunes its base models for specific verticals, including healthcare, financial services, contact centers, and the quick-service restaurant industry.

Deepgram's commercial model is API-first and consumption-based. Customers are billed per minute of audio processed (for transcription) or per character generated (for synthesis), with discounted rates for committed volume. The company offers cloud-hosted, virtual private cloud, and on-premises deployment modes, with the latter two targeted at regulated industries such as healthcare and finance where audio cannot leave a customer-controlled boundary. Pricing for Nova-3 begins at $0.0077 per minute for pre-recorded audio and $0.0058 per minute for streaming, while Aura-2 starts at $0.030 per 1,000 characters. The Voice Agent API is offered at a flat rate of $4.50 per hour of conversation.^[8]

History

How did Deepgram start?

The story of Deepgram begins not in software but in an underground particle physics laboratory. Scott Stephenson and Noah Shutty were both graduate students in physics at the University of Michigan during the early 2010s, working on the China Dark Matter Experiment, an effort to detect weakly interacting massive particles (WIMPs) using cryogenic germanium detectors in the Jinping Underground Laboratory.^[6] Their experimental task involved sifting through enormous volumes of waveform data to identify rare, faint signatures of particle interactions buried in background noise. The technical problem of finding sparse, subtle patterns in long continuous waveforms turned out to be a close cousin of the problem of finding spoken words in audio recordings.^[6]

In parallel with their academic work, Stephenson and Shutty had been experimenting with wearable recording devices and had accumulated hundreds of hours of personal audio. They wanted to be able to search through this archive but found that the speech recognition tools available at the time, dominated by acoustic-model and language-model pipelines built on Gaussian mixture models and n-gram language models, were not accurate enough for unconstrained conversational audio. Stephenson realized that the deep learning waveform-analysis techniques he had been applying to dark matter signatures could be repurposed for speech.^[6] The bet was that an end-to-end neural network trained directly on raw audio would outperform the assembled pipelines that defined commercial ASR at the time.

The two founders, joined by fellow Michigan physicist Adam Sypniewski as a third co-founder, moved to the San Francisco Bay Area and incorporated the company in 2015. They applied to the Y Combinator accelerator and were accepted into the Winter 2016 batch, which provided initial seed funding and the network that would shape Deepgram's investor base over the following decade.^[7]

Early product and the move to deep learning

Deepgram's first commercial product was a general-purpose speech transcription API built on end-to-end deep neural networks. At the time of launch, the prevailing architectures in commercial ASR still relied on hybrid hidden Markov model and deep neural network systems, with separate pronunciation lexicons and language models. Deepgram took the more aggressive position that a single neural network trained on enough audio could learn the joint distribution of acoustics and language directly, eliminating the need for the older pipeline.^[7]

This approach was technologically defensible but commercially difficult to sell during the company's first several years. Enterprise buyers were accustomed to procurement processes that referenced word error rate benchmarks on clean speech datasets, where the gap between Deepgram and incumbent vendors was modest. The company's real advantage, robustness on noisy real-world audio in domains like call centers, voicemails, and multi-speaker meetings, did not show up clearly in those benchmarks. Stephenson has described the period from 2017 to 2021 as a slow process of finding customers who had measured ASR performance themselves and concluded that the standard benchmarks were inadequate proxies for their workloads.^[6]

Deepgram's early customer wins came from contact center operators, media monitoring firms, and a handful of government and research customers. NASA contracted with Deepgram to build a custom model for transcribing communications between Mission Control and the International Space Station, an unusually demanding audio domain because of bandwidth-limited radio links, technical jargon, and overlapping speakers. The NASA deployment became one of the company's most visible reference customers and helped establish credibility in adjacent regulated industries.^[5]

How much funding has Deepgram raised?

Deepgram has raised capital across seed, Series A, Series B, and Series C rounds, with cumulative funding through the Series C exceeding $215 million. Its funding history is summarized below.

Round	Date	Amount	Lead investor(s)	Notes
Seed	2016	Undisclosed	Y Combinator	W16 batch
Seed extension	2018	$1.8M	Compound (then Metamorphic Ventures)
Series A	March 2020	$12M	Wing VC	Tigris Partners, Y Combinator, SAP.iO also participated
Series B (first tranche)	Feb 2021	~$25M	Tiger Global	Initial close
Series B (extension)	Nov 2022	$47M	Madrona	Brought total Series B to $72M; new investors included Alkeon, BlackRock, In-Q-Tel, Citi Ventures, Nvidia, and SAP.iO
Series C	Jan 2026	$130M	AVP	$1.3B post-money valuation; Twilio, ServiceNow Ventures, SAP, Princeville Capital, Alumni Ventures, University of Michigan, and Columbia University also participated

The Series B in particular was notable for the breadth of strategic investors, with the participation of In-Q-Tel signaling adoption by U.S. intelligence community customers and Nvidia's involvement reflecting the increasingly tight relationship between GPU vendors and speech model labs.^[3]^[4] Citi Ventures and SAP.iO returned to participate in subsequent rounds, foreshadowing the financial services and enterprise software customer expansion that followed.^[15]

The $130 million Series C, announced on January 13, 2026 and led by AVP at a $1.3 billion post-money valuation, made Deepgram the first speech AI specialist to reach unicorn status during the current wave of voice AI investment.^[1]^[2] All major existing investors, including Alkeon, In-Q-Tel, Madrona, Tiger Global, Wing, Y Combinator, and BlackRock-managed funds, joined the round.^[1] Stephenson tied the raise to the company's bet on conversational scale: "As we rapidly approach a world where billions of simultaneous conversations are powered by Voice AI, enterprises and developers need real-time, reliable infrastructure capable of fully duplex, contextual conversations at scale."^[1]

The Series C announcement also coincided with Deepgram's acquisition of OfOne, a Y Combinator-backed startup focused on AI-driven drive-thru ordering for quick-service restaurants that Deepgram says had delivered more than 95% order containment for national QSR brands.^[1]^[2] OfOne's technology became the foundation of a vertical-specific product called Deepgram for Restaurants, joining the existing nova-3-medical model in Deepgram's portfolio of domain-tuned speech systems.^[13]

Technology and products

Deepgram's technology stack is built around three core capabilities: automatic speech recognition, neural text-to-speech, and a voice agent orchestration layer. The company runs its own training infrastructure, develops its own model architectures (which it has not published in full academic detail), and serves models from a unified inference runtime called Deepgram Enterprise Runtime that is optimized for latency-sensitive real-time deployment.

What are the Nova speech-to-text models?

The Nova model line represents the company's flagship product family. Nova models are trained end-to-end on a mixture of supervised audio-text pairs, weakly supervised audio with pseudo-labels, and large quantities of synthetic data generated to cover rare acoustic and linguistic conditions.^[11]

Model	Released	Notable characteristics
Nova (Nova-1)	2022	First Nova generation. Trained on over 100 domains and 47 billion tokens. 22% WER reduction over the prior generation; 23 to 78 times faster than competing services. Starting price $0.0043 per minute.
Nova-2	Nov 2023	18.4% relative WER improvement over Nova-1; 36.4% relative improvement over OpenAI Whisper Large. Improved punctuation by 22.6% and capitalization by 31.4%.
Nova-3	Feb 2025	First model to support live code-switching across 10 languages in a single stream. Median streaming WER of 6.84% and median batch WER of 5.26% across a 2,703-file, 81.69-hour benchmark spanning 9 domains. Introduced self-serve keyterm prompting and a dedicated nova-3-medical variant.

In its February 12, 2025 launch, Deepgram stated that Nova-3 "delivers industry-leading performance with a 54.3% reduction in word error rate (WER) for streaming and 47.4% for batch processing compared to competitors."^[11] On Deepgram's published benchmark of 2,703 audio files (81.69 hours across 9 domains), Nova-3 posted a median streaming WER of 6.84%, against 14.92% for the next-best competitor, and a median batch WER of 5.26%, against roughly 10% for the next-best competitor.^[11]

Each Nova generation has emphasized a different axis of improvement. Nova-1 prioritized cost and throughput, an explicit response to the price-sensitivity of contact center workloads. Nova-2 narrowed the accuracy gap to research-quality models like Whisper while preserving Deepgram's latency advantage. Nova-3 turned attention to the long tail of the audio distribution, building a representation-learning framework that helped the training pipeline detect and target under-represented acoustic conditions in the corpus. Nova-3 also introduced live code-switching for ten supported languages without requiring the caller to indicate which language is being spoken; Deepgram later expanded Nova-3's formally supported language set well beyond the launch ten.^[11]

The Nova line is offered in both streaming and pre-recorded modes. Streaming is delivered over a WebSocket interface and returns incremental transcripts with median end-of-utterance latency under 300 milliseconds. Pre-recorded mode operates as a standard REST API and is typically used for batch transcription of recorded audio files. Both modes share the same underlying model weights, with streaming variants tuned to balance partial-hypothesis stability against responsiveness.

What are the Aura text-to-speech models?

Deepgram entered the text-to-speech market in 2024 with the original Aura model, positioning it as a real-time TTS engine specifically engineered for use inside voice agent loops where latency and consistency matter more than the dramatic prosody of entertainment-oriented voices.^[10] The launch was explicit about this positioning, framing Aura as the missing complementary piece to Nova for builders who wanted to assemble a complete voice stack without crossing vendor boundaries.^[10]

Aura-2 followed on April 15, 2025 and significantly expanded the product. It offers more than 40 professional-grade English voice personas at launch, with Spanish voices added in June 2025, and is engineered for time-to-first-byte (TTFB) latency under 200 milliseconds in streaming mode; Deepgram later reported steady-state TTFB of around 90 milliseconds after re-engineering the runtime for parallelism.^[9] Aura-2 is built on the same Deepgram Enterprise Runtime that serves Nova, allowing the speech-in and speech-out paths to share infrastructure and latency budgets. Deepgram has presented public listening test results in which Aura-2 was preferred by users at approximately 60% rates over competing services from ElevenLabs, Cartesia, and OpenAI in enterprise scenarios such as appointment confirmation, customer support, and order-taking.^[9]

What is the Deepgram Voice Agent API?

The Voice Agent API, announced in preview during 2024 and made generally available on June 16, 2025, is Deepgram's highest-level product.^[12]^[8] It exposes a single WebSocket interface that handles bidirectional audio: the developer streams microphone audio in, and Aura-2 synthesized speech streams back out. Internally, the API combines speech-to-text via Nova-3, turn-taking and barge-in detection, function calling, large language model orchestration, and text-to-speech via Aura-2.^[8]

A distinguishing design choice is that the LLM stage is pluggable. Developers can let Deepgram orchestrate the conversation using a default model selection, or they can configure the API to call out to their own LLM endpoint, including hosted models from third parties. This allows enterprises that have already built around a specific model family to retain that choice while still benefiting from Deepgram's tightly integrated ASR and TTS pipeline. The API also supports mid-session control: prompts, voices, and even model selections can be changed during a single ongoing conversation without tearing down the connection.^[8]

On a benchmark Deepgram published alongside the GA launch, called the Voice Agent Quality Index (VAQI), which measures latency, interruption rate, and response coverage, Deepgram achieved the highest overall score among evaluated providers, outperforming OpenAI's competing real-time voice product by 6.4% and an ElevenLabs Conversational AI configuration by 29.3%.^[8] The product is priced at $4.50 per hour of conversation, which Deepgram positions as a single all-in figure for ASR, TTS, and orchestration.^[8]

Custom and domain-tuned models

In addition to its general-purpose Nova and Aura models, Deepgram operates a custom model practice that fine-tunes base models for specific verticals. The first publicly named vertical variant is nova-3-medical, a fine-tune of Nova-3 on clinical dictation and medical terminology that targets healthcare scribe and ambient clinical documentation use cases.^[11] Following the OfOne acquisition in January 2026, Deepgram for Restaurants packages domain-tuned ASR with operational integrations for drive-thru and quick-service restaurant deployments.^[2] Additional vertical fine-tunes are available under custom contract for financial services, insurance, and air traffic control applications.

What is Deepgram used for?

Deepgram's customer base spans large enterprises, mid-market application companies, and developer-driven startups. Named enterprise customers include Citi (financial services), Twilio (communications platform), Spotify (media), Optum (healthcare), Jack in the Box (quick-service restaurants), Aircall, OpenPhone, Vapi, and Groq.^[5] NASA uses Deepgram for transcribing Mission Control to International Space Station communications, an unusually challenging audio domain because of bandwidth-limited radio links, technical jargon, and overlapping speakers.^[5]

Use cases concentrate in several broad categories:

Contact center analytics and automation. Transcribing live agent calls for quality assurance, real-time agent assist, and post-call analytics. Customers in this category use Deepgram's pre-recorded API for archived calls and streaming API for real-time supervision.
Voice agents and conversational AI. Application builders use Deepgram's Voice Agent API or a combination of Nova-3 and Aura-2 to construct AI agents that handle inbound and outbound voice interactions. This is the company's fastest-growing segment as of 2026.
Ambient clinical documentation. Healthcare technology vendors use nova-3-medical to transcribe doctor-patient encounters for downstream summarization and electronic health record population.
Media and content intelligence. Podcast hosting platforms, video conferencing vendors, and broadcast monitoring firms use Deepgram to generate searchable transcripts and time-aligned captions.
Drive-thru and restaurant automation. Following the OfOne acquisition, Deepgram offers a packaged solution for restaurant chains that wish to deploy AI agents at the drive-thru ordering window. Jack in the Box has publicly committed to AI voice agent rollout in this category.^[2]
Financial services compliance. Banks and brokerages use Deepgram to transcribe trader calls and customer service interactions for compliance archiving and surveillance.

How does Deepgram compare to AssemblyAI, Whisper, and ElevenLabs?

The speech AI market in 2026 contains a small number of model labs that train and operate their own production speech systems, alongside the hyperscale cloud providers that offer commodity ASR APIs as part of broader platforms. Deepgram is most directly compared to AssemblyAI, Speechmatics, and the speech APIs offered by Google, Amazon, and Microsoft on the transcription side, and to ElevenLabs and OpenAI on the synthesis side.

Provider	STT product	TTS product	Voice agent product	Headquarters	Notes
Deepgram	Nova-3	Aura-2	Voice Agent API	San Francisco, USA	API-first, end-to-end neural; both STT and TTS in-house
AssemblyAI	Universal-3	None first-party	LeMUR + partner TTS	San Francisco, USA	STT specialist with audio intelligence overlay
Speechmatics	Ursa	None first-party	None	Cambridge, UK	Multilingual STT specialist, strong on accent coverage
Whisper (OpenAI)	Whisper Large v3	TTS-1 / GPT-4o realtime	Realtime API	San Francisco, USA	Open-weights Whisper plus closed Realtime API
ElevenLabs	Scribe	v3	Conversational AI	London / New York	TTS-led, expanded into STT and voice agents
Google Cloud	Speech-to-Text	Cloud TTS / Chirp HD	Dialogflow CX	Mountain View, USA	Cloud-platform bundled, breadth over specialization
Amazon Web Services	Transcribe	Polly	Lex	Seattle, USA	Cloud-platform bundled
Microsoft Azure	Speech to Text	Neural TTS	Azure AI Speech / Bot Service	Redmond, USA	Cloud-platform bundled, tight Office integration

Deepgram's distinctive position in this landscape rests on several attributes. It is one of only a few independent vendors that train and operate both first-party STT and first-party TTS at production scale, in contrast to AssemblyAI and Speechmatics (STT-only) and ElevenLabs (TTS-led with derivative STT). It pursues a price-per-minute model that targets the high-volume contact center and developer market, undercutting the per-minute rates of the hyperscale cloud providers while preserving margin through model efficiency. Its emphasis on streaming latency, with sub-300 millisecond end-of-utterance latencies on the STT side and sub-200 millisecond time-to-first-byte on the TTS side, is engineered specifically for the conversational agent use case rather than for offline batch transcription.^[9]

A second axis of comparison is depth of accent and language coverage. Speechmatics has historically led on accented English and on languages outside the top ten by speaker volume, drawing on its long heritage as a UK-based speech research organization. Deepgram's coverage has expanded substantially with Nova-3, which supports live code-switching across English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch in a single audio stream.^[11] For workloads dominated by call center English with occasional Spanish, Deepgram's code-switching capability is often the more usable feature; for workloads in less-resourced languages, Speechmatics retains an edge.

The comparison with Whisper is structurally different because Whisper is distributed as open weights. Many organizations self-host Whisper Large v3 for cost or data residency reasons. Deepgram's competitive argument against self-hosted Whisper is that, once one accounts for the cost of GPU infrastructure, inference engineering, model maintenance, and the latency penalty of running an offline-oriented model in a streaming setting, the all-in cost of a managed API is often lower. Nova-3 has been benchmarked at up to 36% lower word error rate than Whisper Large v3 on select datasets according to Deepgram's testing, although direct comparisons depend heavily on the evaluation corpus.^[11]

Technology approach and research culture

Deepgram has not published its model architectures in the level of detail that would be expected of an academic research lab. Its public communications emphasize empirical performance on the company's own production benchmarks rather than novel architectural contributions. This reflects a deliberate positioning: the company sells access to deployed models, not papers, and its competitive moat is built around proprietary training data, custom data augmentation pipelines, and inference engineering rather than around a single architectural innovation.

That said, several aspects of Deepgram's approach are publicly known. The company trains end-to-end on raw audio, in contrast to the older hybrid HMM-DNN pipelines that dominated commercial ASR through the mid-2010s. It uses large quantities of weakly labeled and synthetic data alongside high-quality labeled corpora, an approach that resembles the self-training and pseudo-labeling techniques described in academic ASR literature. For Nova-3, the company introduced a representation-learning framework that compresses audio into a latent embedding space and uses that representation to identify under-represented acoustic conditions, allowing the training pipeline to specifically target the long tail of audio difficulty.^[11]

Deepgram's inference engineering is similarly proprietary but has been described publicly in terms of throughput targets. The company claims that its models are 23 to 78 times faster than competing services on per-GPU throughput, and that the Voice Agent API achieves end-to-end speech-to-speech latencies low enough to support natural turn-taking in human-AI conversation.^[8] The Enterprise Runtime that serves the models is designed to run identically in Deepgram's cloud, in customer-managed virtual private clouds, and in on-premises deployments, which is a meaningful operational distinction in regulated industries.

Reception and limitations

Industry coverage has generally framed Deepgram as one of the strongest independent specialists in the speech AI segment. TechCrunch, Built In, and SiliconANGLE have characterized the company as a leading challenger to hyperscale cloud speech APIs, particularly in the contact center and voice agent verticals.^[2]^[14]^[12] Coverage of the Series C round in January 2026 highlighted three themes: the rapid growth of voice agent workloads, Deepgram's positioning as a fully vertical speech stack rather than a transcription-only vendor, and the strategic implications of the OfOne acquisition as a template for vertical packaging.^[2]^[13]

Some critique from the broader speech AI community has focused on the company's relatively limited public research output, which contrasts with the more open posture of organizations like OpenAI (which released Whisper as open weights) and Speechmatics (which has historically published research papers). A second area of critique concerns vendor-run benchmarks: published comparisons between Nova models and competing services use Deepgram-selected audio datasets, and the magnitude of accuracy gaps in those comparisons is generally larger than what independent evaluations show. As with most large-scale ASR systems, Nova models also exhibit accuracy variation across speaker accents, with the strongest performance on majority American and British English and progressively weaker performance on under-represented accents.

Future direction

Following the Series C and the OfOne acquisition, Deepgram has signaled three near-term strategic priorities. The first is continued investment in the Voice Agent API as the company's highest-margin and fastest-growing product, including expansion of the supported language set and deeper integration with major LLM providers. The second is vertical packaging on the model of Deepgram for Restaurants, with healthcare, financial services, and contact center bundles as likely candidates for similar treatment. The third is geographic expansion, with the Series C capital intended in part to fund hiring outside North America and to support European data residency requirements for regulated customers.^[1]

The company has also signaled continued model investment, with successor generations to Nova-3 and Aura-2 already in development. Public statements have emphasized improvements in non-English language coverage, lower latency for streaming TTS, and tighter integration of the speech and language stages of the voice agent loop, including exploration of more deeply integrated speech-to-speech models that would shorten the path from input audio to output audio inside the runtime.

References

Deepgram. (2026, January 13). "Deepgram Raises $130M Series C at $1.3B Valuation to Power the Voice AI Economy." https://deepgram.com/learn/press-release-deepgram-raises-series-c ↩
TechCrunch. (2026, January 13). "Deepgram raises $130M at $1.3B valuation and buys a YC AI startup." https://techcrunch.com/2026/01/13/deepgram-raises-130m-at-1-3b-valuation-and-buys-a-yc-ai-startup/ ↩
Deepgram. (2022, November 29). "Deepgram Completes $72M Series B Round to Define the Future of Speech Understanding." https://deepgram.com/learn/deepgram-72-million-series-b-defines-future-of-AI-speech-understanding ↩
Voicebot.ai. (2022, November 29). "Speech Recognition AI Startup Deepgram Closes $72M Funding Round." https://voicebot.ai/2022/11/29/speech-recognition-ai-startup-deepgram-closes-72m-funding-round/ ↩
Deepgram. "About Us | Voice AI | STT & TTS." https://deepgram.com/about ↩
Madrona Venture Group. "Deepgram Founder Shares Strategies for Scaling and Outmaneuvering Big Tech." https://www.madrona.com/founded-funded-deepgram-scott-stephenson/ ↩
Y Combinator. "Deepgram: Building foundational AI for speech transcription and understanding." https://www.ycombinator.com/companies/deepgram ↩
Deepgram. (2025, June 16). "Deepgram Launches Voice Agent API for Real-Time, Enterprise-Ready Conversational AI." https://deepgram.com/learn/deepgram-launches-voice-agent-api ↩
Deepgram. (2025, April 15). "Deepgram Unveils Aura-2: The World's Most Professional, Cost-Effective, and Enterprise-Grade Text-to-Speech Model." https://www.businesswire.com/news/home/20250415446781/en/ ↩
Deepgram. (2024). "Introducing Deepgram Aura: Lightning Fast Text-to-Speech for Voice AI Agents." https://deepgram.com/learn/aura-text-to-speech-tts-api-voice-ai-agents-launch ↩
Deepgram. (2025, February 12). "Introducing Nova-3: Setting a New Standard for AI-Driven Speech-to-Text." https://deepgram.com/learn/introducing-nova-3-speech-to-text-api ↩
SiliconANGLE. (2024, September 19). "Exclusive: Deepgram launches voice agent API that brings AI conversations to life." https://siliconangle.com/2024/09/19/exclusive-deepgram-launches-voice-agent-api-brings-ai-conversations-life/ ↩
Inc. (2026). "Deepgram Wasn't Looking for Capital. Then Came $130 Million." https://www.inc.com/chloe-aiello/ai-founder-deepgram-capital-investors-130-million-series-c-y-combinator/91287424 ↩
Built In San Francisco. (2026, January 15). "Deepgram Raises $130M Series C Round at $1.3B Valuation." https://www.builtinsf.com/articles/deepgram-raises-130m-series-c-1b-valuation-20260115 ↩
PR Newswire. (2022, November 29). "Deepgram Raises $47M to Define the Future of AI Speech Understanding." https://www.prnewswire.com/news-releases/deepgram-raises-47m-to-define-the-future-of-AI-speech-understanding-301688484.html ↩

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

2 revisions by 1 contributors · full history

Suggest edit

What links here

AI Voice Agent AssemblyAI Best AI Voice Generators (Text-to-Speech)Bland AI Cartesia Deepgram Nova-3 Gladia Lindy LiveKit Agents Pipecat Speech recognition Speechmatics Superwhisper Vapi Voice AI Whisper Y Combinator

What does Deepgram do?

History

How did Deepgram start?

Early product and the move to deep learning

How much funding has Deepgram raised?

Technology and products

What are the Nova speech-to-text models?

What are the Aura text-to-speech models?

What is the Deepgram Voice Agent API?

Custom and domain-tuned models

What is Deepgram used for?

How does Deepgram compare to AssemblyAI, Whisper, and ElevenLabs?

Technology approach and research culture

Reception and limitations

Future direction

See also

References

Improve this article

Related Articles

ElevenLabs

Sesame (AI company)

Hume AI

Cartesia

AssemblyAI

Inworld AI

What links here

Related Articles

ElevenLabs

Sesame (AI company)

Hume AI

Cartesia

AssemblyAI

Inworld AI

What links here