Vapi

AI Agents AI Companies Conversational AI Developer Tools Voice AI

27 min read

Updated Jun 23, 2026

Suggest edit History Talk

RawGraph

Last edited

Jun 23, 2026

Fact-checked

In review queue

Sources

14 citations

Revision

v6 · 5,315 words

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

Vapi is a voice AI orchestration platform that lets software developers build, deploy, and scale AI phone agents through a programmable API. It works as a middleware layer that wires together a speech-to-text (STT) transcriber, a large language model (LLM), and a text-to-speech (TTS) voice engine, then handles the real-time turn detection, interruption management, and telephony routing that would otherwise take months of engineering work.^[4] Founded in 2023 by Jordan Dearsley and Nikhil Gupta out of a pivot of their earlier Y Combinator-backed startup Superpowered, Vapi had processed more than 1 billion calls and signed up over 1 million developers by May 2026, when it raised a $50 million Series B led by Peak XV Partners at a reported valuation of roughly $500 million.^[2]^[13]^[14] In a closely watched competitive bake-off, Amazon's Ring selected Vapi over more than 40 rival voice-AI vendors and now routes 100% of its inbound support calls through the platform.^[14]

Vapi operates on a usage-based model, charging $0.05 per minute as an orchestration fee on top of provider costs passed through at cost.^[12] The company has been described as a "Twilio for AI agents" in investor materials, reflecting its position as programmable infrastructure that developers compose into applications rather than a finished end-user product. Vapi's closest analogy is middleware: it does not own the underlying AI models or voice synthesis technology, but provides the runtime that connects them, manages the real-time audio pipeline, and abstracts away the latency-sensitive engineering challenges that make voice AI hard to build reliably.

What is Vapi used for?

Vapi is used to build production conversational AI agents that talk to people over the phone or in the browser. Developers point the platform at the providers they want for each stage of the pipeline, write a prompt and a set of tools for the agent, and Vapi runs the live conversation: it captures speech, transcribes it with a speech recognition engine, feeds the text to an LLM, synthesizes the reply with a TTS voice, and manages the conversational timing so the exchange feels human. Common applications include inbound customer support, outbound sales and lead qualification, appointment booking and reminders, and after-hours call handling. By May 2026 the platform was processing on the order of 1 to 5 million calls per day, and developers had created more than 2.7 million distinct voice agents on it.^[13]^[14]

History

Founders

Jordan Dearsley grew up in Canada and studied at the University of Waterloo before working in product and engineering roles at various technology companies.^[14] His co-founder, Nikhil Gupta, a Waterloo classmate, focused on infrastructure engineering.^[14] The two founded Superpowered, an AI-powered meeting notetaker that captured notes from live audio without requiring a recording bot to join the call.^[2]

Dearsley has described his background as coming from product rather than pure infrastructure engineering, which shaped Vapi's positioning.^[8] He wanted a platform that removed the infrastructure complexity from voice AI development without requiring developers to become specialists in real-time audio systems. That philosophy is reflected in the company's tagline: "voice AI for developers."^[11]

Superpowered entered Y Combinator's Winter 2021 batch.^[2] The product gained traction and reached roughly 10,000 weekly active users and around $500,000 in annual revenue by mid-2023.^[2] Despite this early success, Dearsley and Gupta concluded that the meeting productivity market had too many well-resourced competitors and that differentiation would be hard to sustain.^[2]

The team went through several product experiments before landing on voice AI infrastructure. One experiment was a therapy chatbot called Harmon that used voice interaction. While building it, the founders ran directly into the core technical problem that would define their next company: assembling and running the real-time audio pipeline for a voice agent was extraordinarily difficult. Latency between components, noise filtering, turn-taking logic, and telephony integration each represented weeks of engineering work. Dearsley later described this experience as the direct motivation for Vapi. "Infrastructure itself is not something that people should spend time on," he said in a 2025 interview. "I don't think anyone that's not running real-time audio systems should be running real-time audio systems."^[8]

Founding of Vapi

In August and September 2023, Dearsley and Gupta launched Vapi as a voice API platform. The company formally announced the pivot from Superpowered in November 2023, at which point it had already shipped an early version of the product.^[2] The seed round of $2.1 million, raised alongside the pivot, included Kleiner Perkins and Abstract Ventures as investors.^[2]

At launch, Vapi's main value proposition was latency. At the time, most developers assembling voice agents from raw components were seeing round-trip latency of 1.5 to 2.5 seconds between when a user finished speaking and when the agent responded. Vapi's early architecture brought this closer to 1 second through streaming optimizations and a custom turn-detection model. Dearsley cited latency as the primary early differentiator: "that latency piece was really our differentiator at the time and the reason that people would use us rather than roll it themselves."^[8]

Vapi is organized as a YC W21 company in Y Combinator's records, reflecting the original Superpowered batch.^[11] The company is headquartered in San Francisco.^[11]

Funding

Vapi has raised approximately $72 million in total external funding across three disclosed rounds.^[14]

The seed round of $2.1 million closed in late 2023, shortly after the pivot from Superpowered. Investors included Kleiner Perkins, Abstract Ventures, and Y Combinator.^[2]

When did Vapi raise its Series A?

On December 12, 2024, Vapi announced a $20 million Series A led by Bessemer Venture Partners.^[1] Additional participants included Abstract Ventures (returning), AI Grant, Y Combinator, Saga Ventures, and investor Michael Ovitz.^[1] The round valued the company at approximately $130 million post-money.^[1]

In its announcement, Bessemer described Vapi as providing "a 10x improvement on the development experience for voice agents" and cited the company's speed of execution, including shipping feature requests from a Friday discussion to production by the following Saturday.^[3] The firm positioned its investment thesis around the observation that telephony remains the dominant communication channel for high-stakes, time-sensitive interactions in healthcare, insurance, legal services, and logistics, and that voice AI infrastructure was entering what Bessemer called a "Cambrian explosion."^[3]

Vapi stated it would use the Series A proceeds to expand its engineering team, scale infrastructure, and deepen enterprise sales.^[1]

How much has Vapi raised in its Series B?

On May 12, 2026, Vapi announced a $50 million Series B led by Peak XV Partners (the firm formerly known as Sequoia Capital India and Southeast Asia).^[13] New and returning participants included Microsoft's venture fund M12, Kleiner Perkins, and Bessemer Venture Partners.^[13]^[14] TechCrunch reported the round valued Vapi at roughly $500 million, up from the $130 million Series A valuation seventeen months earlier.^[14] At the time of the raise, Vapi said it had processed more than 1 billion calls, that more than 1 million developers had used its self-serve platform, and that over 2.7 million voice agents had been created on it.^[13] The company reported that enterprise annual recurring revenue had grown roughly 10x over the period, reaching what it characterized as a "healthy eight figures," and that it employed around 100 people.^[13]^[14]

Vapi co-founder and CEO Jordan Dearsley framed the company's goal around making agents feel human: "The real unlock is building agents for your customers that feel human. Vapi gives teams the platform to deploy voice agents that actually solve problems for customers, millions of them, every day."^[13] In a separate interview he described the core engineering challenge as "taking this indeterminate beast that is a model and taming it."^[14]

Independent estimates published in 2025 had pegged Vapi's annual recurring revenue at approximately $4.5 million in 2024 and $8 million in 2025, reflecting rapid growth from a near-zero baseline at its November 2023 public launch.^[12] By December 2024, over 100,000 developers had signed up for the platform.^[1] These early figures came from third-party revenue intelligence services rather than official company disclosures; the company's own May 2026 figures (1 billion calls, 1 million developers, 2.7 million agents) were disclosed as part of the Series B announcement.^[13]

Why did Amazon Ring choose Vapi?

The Series B coincided with the disclosure of Vapi's largest publicly named customer: Amazon's Ring home-security business. According to TechCrunch, Ring turned to Vapi in the fourth quarter of 2025 as it weighed whether to expand human call centers, lean more heavily on traditional interactive voice response (IVR) systems, or deploy AI agents to handle surging holiday-season call volume.^[14] Ring evaluated more than 40 voice-AI vendors in a competitive bake-off and selected Vapi, going from zero to production in roughly two weeks; the company now routes 100% of its inbound support calls through the platform.^[13]^[14]

Jason Mitura, Vice President of Software Development at Amazon Ring, said: "When Ring customers call in, they expect fast, high-quality support. After evaluating dozens of vendors, Vapi stood out."^[13] In TechCrunch's account, Mitura added that "a lot of AI tools promise great outcomes. Vapi has delivered on them."^[14]

Architecture

Vapi's architecture has three primary layers: the core pipeline of STT, LLM, and TTS; the orchestration models that run in parallel with that pipeline; and the telephony or WebRTC transport layer that handles the audio connection.^[4]

Building a voice agent without an orchestration layer requires solving several difficult engineering problems simultaneously. The speech-to-text model must stream audio in real time and produce low-latency transcriptions. The language model must receive incremental transcriptions, generate a response, and stream that response at a token level. The text-to-speech engine must begin synthesizing audio before the full response is available. Every component must communicate over a shared timing bus with round-trip budgets measured in tens of milliseconds. On top of that, the system must handle noisy audio, detect when the caller wants to interrupt, manage turn-taking conventions that differ from text conversation, and route calls over a telephone or WebRTC network. Vapi packages all of this as a managed service accessed through a REST API and a real-time events system.^[4]

How does the Vapi pipeline work?

Every Vapi call runs through three configurable model slots:^[4]

The transcriber receives the raw audio stream from the caller and converts it to text. Supported providers include Deepgram, AssemblyAI, Gladia, and Speechmatics, as well as OpenAI Whisper.
The model receives the transcript and generates a response. Supported providers include OpenAI (GPT-4o, GPT-4o mini), Anthropic (Claude models), Google (Gemini models), and Groq for inference acceleration. Developers can also point Vapi at their own custom LLM endpoint.
The voice synthesizes the model's text response into audio. Supported providers include ElevenLabs, Cartesia, Deepgram Aura, PlayHT, LMNT, and Azure Neural TTS.

All three slots are swappable per-call through the API. A developer can run Deepgram for transcription, Groq-hosted Llama for the language model, and Cartesia for TTS in one call, then switch any component on the next call. Vapi also supports bring-your-own API keys across all three layers, which means provider costs pass through directly and the developer maintains their own billing relationship with each provider.

Orchestration layer

On top of the core pipeline, Vapi runs a suite of proprietary real-time models that the company groups under the name "orchestration layer." These models are what justify the $0.05 per minute platform fee on top of raw provider costs.^[4] They include:

Turn detection (endpointing). Rather than using a silence timeout to determine when a caller has finished speaking, Vapi employs a custom fusion audio-text model that analyzes both the acoustic properties of the caller's voice and the semantic content of what was said.^[4] This allows the system to distinguish between a natural mid-sentence pause and a genuine end-of-turn, which reduces false triggers and improves perceived conversational fluency.

Interruption handling (barge-in). A separate custom model distinguishes genuine interruptions (the caller saying "stop" or cutting off the agent mid-sentence) from backchannel signals.^[4] Backchannel signals are short affirmations like "yeah," "uh-huh," or "got it" that human listeners produce to signal they are still engaged without intending to take the floor. When Vapi detects a backchannel signal, it passes that information to the LLM as context. When it detects a true interruption, it notes the point in the agent's speech where it was cut off and informs the LLM, so the model can resume coherently or adapt its response.

Audio filtering. Vapi runs two parallel audio filtering models. A noise filter removes ambient sounds including music and traffic while preserving speech content. A background voice filter isolates the primary speaker and suppresses other voices, which matters in environments like call centers, open offices, or households where multiple people may be speaking nearby.^[4]

Emotion detection. A proprietary model extracts emotional inflection from the caller's voice and passes that signal to the LLM as context. This allows the model to adapt its tone or escalation behavior based on whether the caller sounds calm, frustrated, or distressed. Bessemer cited this capability as a key technical differentiator in its investment announcement.^[3]

Filler injection. Because LLMs produce streamed text that begins with formal language, Vapi applies a custom model to inject natural filler sounds and conversational phrases in real-time. This avoids the uncanny valley problem of an agent that pauses completely in silence while processing and then begins speaking with formal language.

Backchanneling. A fusion audio-text model detects appropriate moments during the caller's speech to insert brief affirmations from the agent. The model selects contextually suitable responses based on the content being spoken.

All six of these models run in parallel at sub-50 millisecond latency budgets. The full voice-to-voice round trip, from when the caller stops speaking to when the agent's first audio byte plays, targets between 500 and 700 milliseconds over WebRTC.^[4] Over telephone networks, additional network latency from Twilio or Telnyx adds roughly 400 to 600 milliseconds on top of that.

Transport layer

Vapi supports two transport modes. The WebRTC mode connects browser and mobile apps directly to Vapi's servers using the WebRTC protocol, achieving the lowest possible latency for web applications. The telephony mode routes calls through phone infrastructure using either Twilio or Telnyx as the SIP carrier, or a developer's own SIP trunk.^[4]

Provider integrations

Vapi maintains first-party integrations across all three pipeline layers and the telephony layer. The table below lists the primary supported providers as of early 2025.

Layer	Provider	Notes
Speech-to-text	Deepgram Nova-2, Nova-3	Default option; low latency
Speech-to-text	AssemblyAI Universal-Streaming	Lowest cost option at ~$0.00025/min
Speech-to-text	Gladia	Multilingual specialization
Speech-to-text	OpenAI Whisper	Via OpenAI API
Speech-to-text	Speechmatics	High accuracy for accents
Language model	OpenAI GPT-4o, GPT-4o mini	Most common default
Language model	Anthropic Claude (Sonnet, Haiku)	Strong instruction following
Language model	Google Gemini 1.5 Flash, Pro	Long context capability
Language model	Groq (Llama, Mixtral)	Ultra-low inference latency
Language model	Custom endpoint	Developer's own server via HTTP
Text-to-speech	ElevenLabs	High expressiveness; higher cost
Text-to-speech	Cartesia Sonic	Low latency, natural voices
Text-to-speech	Deepgram Aura	Integrated with Deepgram STT
Text-to-speech	PlayHT	Voice cloning support
Text-to-speech	LMNT	Fast streaming synthesis
Text-to-speech	Azure Neural TTS	Enterprise compliance
Telephony	Twilio	Most widely used; higher cost
Telephony	Telnyx	Lower cost alternative
Telephony	Vonage	Available option
Telephony	Custom SIP trunk	Bring your own carrier

The modularity is intentional. Dearsley has described the design philosophy as: "That's why our approach has always been very modular," noting that the right combination of providers depends on balancing "three variables: performance, latency, and cost" and that the optimal stack changes as individual providers improve.^[8]

Telephony and phone numbers

Developers can provision phone numbers directly through the Vapi dashboard, which purchases numbers via Twilio on the backend. Numbers are currently limited to United States area codes through this native purchase flow; international numbers require importing from an external Twilio or Telnyx account.

Vapi supports both inbound and outbound calling. For inbound calls, a phone number is associated with an assistant configuration. When the number receives a call, Vapi routes it through the configured STT, LLM, and TTS pipeline. For outbound calls, developers trigger calls via the API, specifying the target number and the assistant configuration to use.

Vapi also offers a soft-phone capability through its Web SDK, which uses WebRTC to connect browser-based callers directly to an assistant without requiring a traditional phone number. This is commonly used for website chat widgets that offer voice as an option.

Custom SIP trunking (bring-your-own-carrier) is available on enterprise plans, allowing companies that already have existing telephony infrastructure to route calls into Vapi without changing carriers.

Pricing

Vapi's pricing model is layered. The platform charges a base orchestration fee of $0.05 per minute on top of whatever the underlying providers charge. Provider costs are passed through approximately at cost.^[12]

A typical call stack in 2025 breaks down roughly as follows:

Component	Provider	Approximate cost per minute
Platform fee	Vapi	$0.05
Speech-to-text	Deepgram Nova-2	$0.01
Language model	GPT-4o mini	$0.05-$0.10
Text-to-speech	Cartesia Sonic	$0.01-$0.02
Text-to-speech	ElevenLabs Flash v2.5	$0.03-$0.04
Telephony	Twilio outbound	$0.008-$0.014
Telephony	Telnyx outbound	$0.005-$0.008
Total (typical range)		$0.15-$0.30

The advertised $0.05 per minute figure covers only the orchestration layer. Actual all-in costs typically land between $0.15 and $0.33 per minute depending on which LLM and TTS providers are selected and whether calls are inbound or outbound.^[12] Premium voice providers like ElevenLabs and high-capability LLMs like GPT-4o (full version) push costs toward the upper end of that range.

Vapi offers a pay-as-you-go plan with no monthly commitment, limited to 10 concurrent calls. Enterprise plans include volume discounts, higher concurrency limits, dedicated support, and add-ons including HIPAA compliance (priced at approximately $1,000 per month) and a signed Data Processing Agreement (DPA).^[12]

New accounts receive approximately $10 in credits to test the platform before spending.

Squads (multi-agent calls)

Vapi launched Squads in November 2025 as a way to build voice AI systems that involve multiple specialized assistants within a single call.^[5] The problem Squads addresses is that as a voice agent's responsibilities grow, cramming all functionality into a single prompt and tool set makes the system increasingly fragile and unreliable.^[5] Squads allow developers to define a pool of specialized assistants, each with a focused prompt and tool set, and configure routing logic that hands off between them during the conversation.^[5]

From the caller's perspective, a Squad behaves as a single continuous call. One phone number, one conversation thread, one transcript. Behind the scenes, a billing assistant might handle the opening of a call before handing off to a scheduling assistant, which then transfers to a technical support specialist. Each handoff happens without dead air or noticeable interruption.^[5]

Developers control the context passed between assistants at each handoff point. The options include passing no prior context (useful for sensitive operations like payment collection), passing only the last N messages, or passing the full conversation history.^[5] Routing logic is expressed through standard LLM tool calls and explicit prompts rather than black-box decision trees, which makes it straightforward to debug and audit.^[5]

Vapi provides a canvas-based visual builder for designing Squad flows, where assistants appear as nodes and routing conditions appear as labeled edges between them.^[5]

Fleetworks, a company that builds AI agents for the transportation industry, is one of the most publicly cited users of Squads. The company runs assistants handling dispatch, scheduling, billing, and support as a coordinated Squad, processing more than 240,000 calls per day.^[7] The company has stated that breaking these workflows across specialized assistants rather than building one monolithic agent was what allowed them to scale reliably.^[7]

Latency performance

Latency is one of Vapi's primary competitive claims. The platform's own documentation targets a voice-to-voice round trip of 500 to 700 milliseconds over WebRTC under optimal conditions.^[4] Published benchmarks from third parties show how component selection affects this number in practice.

AssemblyAI published a configuration that achieved approximately 465 milliseconds end-to-end over WebRTC using:^[6]

AssemblyAI Universal-Streaming for STT (approximately 90ms)
Groq-hosted Llama 4 Maverick for LLM (approximately 200ms)
ElevenLabs Flash v2.5 for TTS (approximately 75ms)
Network overhead accounting for the remaining ~100ms

This configuration required deliberately disabling Vapi's default turn detection padding settings. The default "startSpeakingPlan" configuration adds a pause before the agent speaks to reduce false starts, which can contribute more than 1.5 seconds of latency on its own if not tuned.^[6] Disabling this feature requires prompt-level and configuration-level adjustments but is documented in Vapi's developer documentation.

Over telephone (PSTN) networks, the same configuration produces approximately 965 milliseconds due to the additional 400 to 600 milliseconds of latency introduced by Twilio's network.^[6] This is important context when comparing Vapi's stated latency targets with competitors: the 465 ms benchmark is a best-case WebRTC number achieved with a specifically tuned low-latency stack. Production telephony deployments with default configurations typically see round trips of 800 to 1,500 milliseconds.

For comparison, human phone conversations are generally considered natural at roughly 700 milliseconds of round-trip response time. Latency above 1,000 milliseconds creates noticeable pauses that many callers interpret as hesitation or connection problems.

Customers and use cases

Vapi's publicly disclosed customers span home security, healthcare, transportation, customer service, and fintech verticals. Its largest named customer is Amazon's Ring, which routes 100% of its inbound support calls through the platform after selecting Vapi over more than 40 competing vendors in late 2025.^[14]

FleetWorks automates communications between transportation brokers and truck drivers. The company uses Vapi Squads to run assistants covering dispatch coordination, scheduling, billing inquiries, and driver support within single calls. Fleetworks processes over 240,000 calls per day through the platform and has publicly stated the switch to Vapi's managed infrastructure saved more than 100 engineering hours per month that would otherwise go toward maintaining voice pipeline code.^[7]

Luma Health is a healthcare patient engagement platform that uses Vapi for automated outbound calls around appointment reminders, scheduling changes, and care gap outreach. Healthcare deployments require HIPAA compliance, which Vapi supports through its enterprise plan with a signed Business Associate Agreement.^[10]

Ellipsis Health uses voice AI for mental health screening, where the platform captures spoken responses and routes them through clinical analysis models. The use of Vapi's emotion detection models is relevant to this application.

Mindtickle is a sales readiness platform that uses Vapi-powered voice agents for sales training simulations, where sales representatives practice calls with AI counterparts.

Beyond named customers, Vapi is widely used for outbound sales calling, appointment booking, lead qualification, customer service escalation routing, and post-call survey collection. The platform supports more than 100 languages and accents, which makes it viable for international deployments.

How does Vapi compare to Retell, Bland, and LiveKit?

The voice AI infrastructure market includes several competing developer platforms. The main alternatives to Vapi are Retell AI, Bland AI, LiveKit Agents, and the open-source framework Pipecat.

Platform	Model	Pricing	Latency	Compliance	Target users
Vapi	Managed SaaS, modular stack	$0.05/min + providers (~$0.15-$0.33 all-in)	500-700ms (WebRTC); 800-1,500ms (telephony)	SOC 2 Type II; HIPAA at $1,000/mo add-on	Developers and engineering-led teams
Retell AI	Managed SaaS, all-inclusive	$0.07-$0.15/min (all-inclusive)	300-500ms	SOC 2 Type I/II; HIPAA with self-service BAA	Healthcare, insurance, regulated industries
Bland AI	Managed SaaS	$0.07-$0.14/min	~800ms average	Not specified	Outbound sales, SDR automation
LiveKit	Open-source + Cloud (WebRTC-first)	$0.01/min session + providers	Depends on stack	Self-managed; SOC 2 on Cloud	Voice + video agents, real-time apps
Pipecat	Open-source Python framework	Free (self-hosted); Pipecat Cloud $0.01-$0.03/min	Depends on infrastructure	Self-managed	Technical teams, edge deployment

Vapi vs Retell AI. Retell presents the most direct competition. Both platforms target developers building voice agents for business phone calls. Retell's all-inclusive pricing bundles STT, LLM, and TTS into a flat per-minute rate, which simplifies cost forecasting but removes the ability to substitute providers.^[12] Retell's advertised latency is lower at 300 to 500 milliseconds and the platform includes native warm call transfer and branded call display, features Vapi does not natively offer.^[12] Retell's compliance story is also simpler for regulated industries: its BAA is available through a self-service portal, while Vapi's HIPAA support requires an enterprise contract and an additional fee.^[12] Vapi's advantage is modularity. Teams that need a specific LLM provider, a specific voice, or a specific STT service that Retell does not offer can configure it through Vapi's bring-your-own model.

Vapi vs Bland AI. Bland targets outbound calling automation, with a visual interface and built-in features for call logging, confidence ratings, and call summaries that require manual configuration in Vapi. Bland's per-minute pricing is higher on the base plan but includes more built-in functionality. Vapi offers more flexibility for custom architectures and non-standard deployments. Bland's built-in memory allows agents to recall information from previous calls with the same contact, a feature that requires custom implementation in Vapi.

Vapi vs LiveKit. LiveKit is a WebRTC-first real-time platform with an open-source agents framework and a hosted LiveKit Cloud offering. Because it was built for real-time audio and video transport from the start, voice-and-video agents and other latency-sensitive real-time applications are often stronger on LiveKit, and its Cloud agent-session minutes are inexpensive (around $0.01 per minute), though developers still pay separately for STT, LLM, and TTS providers and assume more of the orchestration burden. Vapi, by contrast, ships the higher-level conversational orchestration (turn detection, barge-in, telephony routing) as a managed product, trading some control for faster time to production.

Vapi vs Pipecat. Pipecat is an open-source Python framework maintained by Daily.co. It provides similar STT/LLM/TTS pipeline orchestration but requires developers to manage their own infrastructure, handle scalability themselves, and implement real-time audio handling at a lower level. There is no orchestration fee, but engineering overhead is substantially higher. Pipecat is appropriate for teams with the infrastructure expertise to run their own voice systems or for edge deployment scenarios. Vapi occupies the managed infrastructure position and abstracts away that complexity at the cost of the platform fee and reduced control over the underlying stack.

Reception

Vapi has received positive reception from the developer community for reducing the time required to build a production voice agent from weeks to days. The platform's Squads feature received particular attention when Fleetworks published figures showing 240,000 daily calls running through a multi-agent Squad configuration.^[7] The Amazon Ring win in May 2026, in which Ring moved 100% of its inbound support calls onto Vapi after a 40-vendor evaluation, was widely cited in trade press as validation of the platform's enterprise readiness.^[14]

Bessemer Venture Partners' investment memo described Vapi's developer experience as ten times better than the alternative of building voice infrastructure in-house, citing the team's shipping velocity as unusual even by startup standards.^[3] The firm's memo specifically called out Vapi's proprietary real-time audio model that detects emotional inflections and the ability to deliver enterprise-grade scalability, reliability, and fault tolerance as differentiating technical properties.^[3]

Third-party reviews have noted that the platform's modular approach gives experienced developers a high degree of control that opinionated platforms like Retell do not provide.^[12] The 100+ language support and the ability to bring custom LLM endpoints are cited as advantages for international deployments and organizations with proprietary models.

The market context for Vapi's growth is worth noting. Bessemer's 2024 investment thesis described telephony as a 150-year-old technology still critical for conveying nuanced information, handling time-sensitive transactions, and reaching demographics that are not comfortable with web or mobile interfaces.^[3] Healthcare, legal services, home services, insurance, and logistics were specifically cited as sectors where phone calls remain the primary customer service channel.^[3] Vapi's growth reflects developer adoption of a tool positioned to serve this segment of the economy with AI automation.

Limitations

Vapi has received criticism across several recurring areas.

Pricing opacity. The advertised $0.05 per minute figure understates actual costs. All-in production costs typically run $0.15 to $0.33 per minute when STT, LLM, TTS, and telephony are included.^[12] Managing billing relationships with four to five separate vendors simultaneously adds accounting overhead that is absent from all-inclusive platforms. Multiple developer reviews have described frustration with cost forecasting.

Technical complexity. Vapi's flexibility creates a corresponding configuration burden. Non-technical users find the API-first interface difficult to approach. Even experienced developers report that achieving reliable behavior requires extensive prompt engineering and tuning of the orchestration model parameters. The default configurations are not optimized for every use case, and some latency improvements require disabling features that are on by default.

Platform stability. Developer community reports, including posts on the Vapi Discord and Trustpilot reviews, document instances where platform updates broke working assistant configurations. Support response times have been described as slow, with the primary support channel being Discord. Several reviews from 2024 described a period when documentation lagged behind feature changes.

HIPAA complexity. HIPAA compliance is available only on enterprise plans at an additional fee of approximately $1,000 per month. The compliance configuration requires routing call recordings to a developer-managed S3 bucket with encryption at rest, disabling Vapi's default data persistence (which removes call logs and transcripts from Vapi's own storage), and signing a Business Associate Agreement as part of the enterprise contract process.^[10] For teams in regulated industries, this setup is more cumbersome than platforms that include HIPAA compliance at lower tiers.

Phone number limitations. The native phone number purchase flow supports only US numbers. International numbers require an existing Twilio or Telnyx account with the number already provisioned, then imported into Vapi.

No on-premise hosting. Vapi runs exclusively in the cloud. Organizations that require on-premise deployment for data sovereignty or security reasons cannot use the platform.

Concurrent call limits on free tier. The pay-as-you-go plan caps concurrent calls at 10, which is sufficient for development but constrains production deployments that see traffic spikes. Higher concurrency limits require an enterprise agreement.

References

Dearsley, Jordan and Gupta, Nikhil. "Vapi Dials-in $20M in Series A Led by Bessemer to Bring AI Voice Agents to Enterprise." GlobeNewswire, December 12, 2024. https://www.globenewswire.com/news-release/2024/12/12/2996317/0/en/Vapi-Dials-in-20M-in-Series-A-Led-by-Bessemer-to-Bring-AI-Voice-Agents-to-Enterprise.html ↩
Shieber, Jonathan. "YC-backed productivity app Superpowered pivots to become a voice API platform for bots." TechCrunch, November 10, 2023. https://techcrunch.com/2023/11/10/yc-backed-productivity-app-superpowered-pivots-to-become-a-voice-api-platform-for-bots/ ↩
Bessemer Venture Partners. "Our investment in Vapi: the voice AI developer platform." BVP, December 2024. https://www.bvp.com/news/our-investment-in-vapi-the-voice-ai-developer-platform ↩
Vapi. "How Vapi Works." Vapi Documentation. https://docs.vapi.ai/how-vapi-works ↩
Vapi. "Introducing Squads: Teams of Assistants." Vapi Blog, November 13, 2025. https://vapi.ai/blog/introducing-squads ↩
AssemblyAI. "How to build the lowest latency voice agent in Vapi: Achieving ~465ms end-to-end Latency." AssemblyAI Blog. https://www.assemblyai.com/blog/how-to-build-lowest-latency-voice-agent-vapi ↩
Vapi. "Fleetworks saves 100+ engineering hours per month." Vapi Blog. https://vapi.ai/blog/fleetworks-saves-100-engineering-hours-per-month-2 ↩
Dearsley, Jordan. "Voice AI Platforms in 2025: A Conversation with Vapi Founder." Coval, 2025. https://www.coval.ai/blog/voice-ai-platforms-in-2025-a-conversation-with-vapi-founder-jordan-dearsley ↩
AI Minds Podcast. "AI Minds #056: Jordan Dearsley, Founder & CEO at Vapi." Deepgram. https://deepgram.com/podcast/ai-minds-056-jordan-dearsley-founder-ceo-at-vapi
Vapi. "HIPAA Compliance." Vapi Documentation. https://docs.vapi.ai/security-and-privacy/hipaa ↩
Y Combinator. "Vapi: Voice AI for developers." YC Company Directory. https://www.ycombinator.com/companies/vapi ↩
Retell AI. "Vapi AI Review 2026: Pricing, Features & Top Alternative." Retell AI Blog. https://www.retellai.com/blog/vapi-ai-review ↩
Vapi. "Vapi raises $50M Series B as it reaches 1 billion calls, powering the next generation of enterprise voice AI." Yahoo Finance (press release), May 12, 2026. https://finance.yahoo.com/sectors/technology/articles/vapi-raises-50m-series-b-130000518.html ↩
Wiggers, Kyle. "AI voice startup Vapi hits $500M valuation after winning Amazon Ring over 40 rivals." TechCrunch, May 12, 2026. https://techcrunch.com/2026/05/12/vapi-hits-500m-valuation-as-amazon-ring-chose-its-ai-platform-over-40-rivals/ ↩

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

5 revisions by 1 contributors · full history

Suggest edit

What links here

AI Voice Agent Bland AI Cartesia Companies Deepgram Nova-3 GPT-Realtime / OpenAI Realtime API LiveKit Agents OpenAI Realtime API Pipecat PolyAI Retell AI

What is Vapi used for?

History

Founders

Founding of Vapi

Funding

When did Vapi raise its Series A?

How much has Vapi raised in its Series B?

Why did Amazon Ring choose Vapi?

Architecture

How does the Vapi pipeline work?

Orchestration layer

Transport layer

Provider integrations

Telephony and phone numbers

Pricing

Squads (multi-agent calls)

Latency performance

Customers and use cases

How does Vapi compare to Retell, Bland, and LiveKit?

Reception

Limitations

See also

References

Improve this article

Related Articles

Retell AI

Bland AI

OpenAI Realtime API

LiveKit Agents

Pipecat

Siri

What links here

Related Articles

Retell AI

Bland AI

OpenAI Realtime API

LiveKit Agents

Pipecat

Siri

What links here