PolyAI is a British enterprise voice AI company founded in 2017 that builds autonomous phone agents for large-scale contact centers. The company deploys voice assistants capable of handling millions of customer calls annually across sectors including hospitality, financial services, retail, and utilities. PolyAI's agents are designed to resolve customer queries without human intervention; the company calls the share of calls resolved this way the containment rate and says its deployments regularly reach 80 percent or above. As of late 2025, the company serves more than 200 enterprise clients in over 25 countries, has raised more than $200 million in total funding, and carries a valuation of $750 million following its Series D round.
The company's roots lie in the Dialog Systems Group at the University of Cambridge's Machine Intelligence Laboratory, a research unit focused on statistical approaches to spoken dialogue. Nikola Mrkšić, a Serbian-born computer scientist who studied at Cambridge on a scholarship, joined a startup called VocalIQ while finishing his PhD. VocalIQ was developing self-learning conversational software and was backed by Amadeus Capital Partners. In October 2015, Apple acquired VocalIQ to strengthen Siri. Mrkšić and his future co-founders Tsung-Hsien Wen and Pei-Hao Su, both Taiwanese researchers who also completed their doctoral work at Cambridge, spent time at Apple after the acquisition. Wen worked there as a researcher before moving to Google, and Su went to Facebook before the three regrouped.
Mrkšić, Wen, and Su founded PolyAI in London in 2017, the same year they submitted their PhD theses. They went through the Entrepreneur First startup accelerator program, although they already knew one another from the Cambridge Dialog Systems Group. The founding thesis was straightforward: large companies were spending enormous amounts staffing contact centers, the voice channel still handled the majority of customer service interactions, and the dialogue systems research coming out of Cambridge was good enough to automate a meaningful portion of those calls. The name PolyAI reflects the company's original orientation toward polyglot, multi-domain conversational systems.
One of the first production tests was with Whitbread, the British hospitality group that owns Premier Inn hotels. PolyAI deployed a voice agent to handle restaurant and hotel reservation calls, which gave the team their first exposure to the messy realities of telephony: low-quality audio, heavy accents, interruptions mid-sentence, and callers who said the same thing twenty different ways.
PolyAI raised an initial seed round followed by a $12 million Series A in March 2019, led by Point72 Ventures. Sands Capital Ventures, Amadeus Capital Partners, and Passion Capital also participated. At that stage the company had roughly a dozen enterprise clients and was explicitly positioning its agents as supplementing human agents rather than replacing them outright.
After several years of building out the platform and growing its customer base, PolyAI raised a $40 million Series B in September 2022, led by Georgian. Twilio Ventures, Khosla Ventures, Point72 Ventures, and Amadeus Capital Partners joined the round, bringing total funding to $70 million. The post-money valuation was near $300 million. The company had 98 employees at the time across its US and UK offices. Mrkšić attributed the growing demand partly to labor market pressures following the pandemic, which pushed enterprises to accelerate voice automation plans they had been considering for years.
PolyAI raised $50 million in a Series C in May 2024. Hedosophia, Nvidia's corporate venture arm NVentures, and Zendesk Ventures came in as new investors, joined by follow-on commitments from Khosla Ventures, Georgian, Point72 Ventures, Sands Capital, and Passion Capital. Total funding crossed $120 million with the round. Nvidia's involvement reflected the company's use of GPU infrastructure for real-time inference and its collaboration with NVIDIA Riva for the foundational speech recognition work underlying the Owl ASR model. Mrkšić said at the time: "We aim to be the voice that handles over half of all automated customer service calls within the next five years."
In December 2025, PolyAI closed an $86 million Series D co-led by Georgian, Hedosophia, and Khosla Ventures. Additional participants included NVentures, the British Business Bank, Citi Ventures, Squarepoint Ventures, Sands Capital, Zendesk Ventures, and Point72 Ventures. The round valued the company at $750 million, a figure analysts noted was based on approximately a 25-times revenue multiple, conservative relative to some US competitors whose valuations reflected 100-times multiples. Total funding surpassed $200 million. By this point the company had grown to over 200 enterprise customers, more than 2,000 live deployments, and a revenue run rate of around $35 million for 2025. The company reported that its AI agents were collectively handling the equivalent of 151 years of customer calls per year and that multiple enterprise clients had deployments equivalent to 1,000 or more full-time employees.
PolyAI's platform is built around four functional layers it describes as the Thinker, the Listener, the Speaker, and the Connector. The Thinker handles language understanding and response generation. The Listener processes incoming audio in real time. The Speaker handles voice synthesis, combining human voice recordings with neural text-to-speech to produce an output that matches a brand's specified tone. The Connector manages integrations with telephony infrastructure, CRM systems, and contact center platforms.
The company positions itself as owning its full stack rather than wrapping third-party models, a distinction that matters for latency, behavior control, and data confidentiality in regulated industries. This is reflected in two proprietary models that sit at the core of the platform: Raven, a large language model trained specifically for customer service conversations, and Owl, an automatic speech recognition model trained for telephone audio.
Raven is PolyAI's in-house language model, optimized for the requirements of live voice calls rather than general text tasks. General-purpose models like GPT-4o perform reasonably on text benchmarks but were not designed around the constraints of phone conversations: response tokens need to arrive quickly enough to avoid awkward pauses, outputs need to respect business rules and policy guardrails consistently, and the model needs to detect when a query falls outside its scope rather than hallucinating an answer.
Raven v2 was trained on three times more data across four times more domains than its predecessor, with a quantized architecture that reduces time-to-first-token. The model uses a compact function-calling format that trims roughly 18 tokens per output, and it runs on co-located dedicated infrastructure to avoid the latency overhead of third-party API calls.
Raven 3.5, released in early 2026, extended the model with auto-reasoning capabilities that allow it to decide on a turn-by-turn basis whether a query warrants deeper deliberation or a quick response. The reasoning module activates on roughly half of conversational turns without increasing perceived latency above 300 milliseconds. Raven 3.5 also introduced improved multilingual quality across 23 languages, webchat support alongside voice, and out-of-domain detection using special output tokens. PolyAI's internal benchmarks show Raven 3.5 outperforming GPT-5 and Claude Sonnet 4.6 on four customer service evaluation sets built from real anonymized calls, measuring instruction-following, output quality in English and non-English data, and stylistic consistency.
The post-training method combines Group Relative Policy Optimization (GRPO) with Direct Preference Optimization (DPO). The rationale is that GRPO is effective for reinforcing strong behaviors but has a ceiling when a model consistently fails on certain examples; mixing in DPO examples provides direct demonstrations of correct behavior for those failure cases.
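The rationale above can be illustrated with the standard textbook forms of the two objectives. The sketch below is not PolyAI's training code, which the text does not describe; the function names and the `beta` default are assumptions for illustration. It shows one common reading of the GRPO "ceiling": when every sampled response to a prompt receives the same reward, the group-relative advantages are all zero and the prompt contributes no learning signal, whereas a DPO preference pair still yields a non-zero loss.

```python
import math
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: standardize each reward against its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair: -log(sigmoid(beta * margin))."""
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # Numerically stable evaluation of -log(sigmoid(x)).
    if x > 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# Mixed outcomes within a group produce a usable signal.
mixed = grpo_advantages([1.0, 0.0, 0.0, 1.0])

# If the model fails on every sample, all advantages collapse to zero.
all_fail = grpo_advantages([0.0, 0.0, 0.0, 0.0])

# A preference pair still produces a loss (and hence a gradient) in that case.
pair_loss = dpo_loss(-1.0, -5.0, -3.0, -3.0)
```

In this reading, the DPO pairs supply a demonstration of the preferred response for exactly those prompts where sampling alone never produces a rewarded answer.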
Owl is PolyAI's custom automatic speech recognition model for telephone audio. The company developed it after analyzing its own production deployments and finding that approximately 70 percent of voice agent errors originated from transcription mistakes rather than from downstream reasoning failures. Commercial ASR systems designed for clean microphone audio underperform on phone calls, which arrive with codec compression, background noise, hold music artifacts, and callers who are not speaking in broadcast-quality conditions.
Owl was built on a pretrained model from NVIDIA as a foundation, then fine-tuned on proprietary telephone audio spanning healthcare, financial services, retail, travel, hospitality, and utilities. PolyAI reports a word error rate of 0.122 on its evaluation set, outperforming four unnamed commercial ASR systems tested in the same conditions. The model was designed with geographic diversity in its training data to handle accents and dialects from across the company's client footprint in over 25 countries.
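For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words, so a figure of 0.122 corresponds to roughly 12 errors per 100 reference words. The following is a minimal sketch of that conventional definition; PolyAI's evaluation set and text normalization rules are not public in this text, and the lowercasing and whitespace tokenization here are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One deleted word ("a") and one substitution ("two" -> "you")
# against a seven-word reference.
example = word_error_rate(
    "book a table for two at eight",
    "book table for you at eight",
)
```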
Agent Studio, PolyAI's management platform, includes keyphrase boosting (which raises the prior probability for domain-specific terminology the ASR might otherwise mishear) and transcript correction tooling (which allows operators to flag and fix recurring misrecognitions so the system learns from them).
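One simple way to picture keyphrase boosting is as a rescoring step over the recognizer's candidate transcripts, where hypotheses containing a boosted phrase receive a score bonus. The sketch below is illustrative only: the function name, the n-best input format, and the `boost` value are assumptions, and the text does not say how Owl or Agent Studio apply the boost internally.

```python
def rescore_with_keyphrases(nbest, keyphrases, boost=2.0):
    """Pick the best transcript after adding a bonus per matched keyphrase.

    nbest: list of (text, log_score) pairs, higher score is better.
    keyphrases: domain terms whose presence should be favored.
    """
    def boosted_score(item):
        text, score = item
        lowered = text.lower()
        # Plain substring matching; a fuller version would respect word boundaries.
        hits = sum(1 for phrase in keyphrases if phrase.lower() in lowered)
        return score + boost * hits

    return max(nbest, key=boosted_score)[0]


candidates = [
    ("book a room at premier in", -3.0),
    ("book a room at premier inn", -3.5),
]

without_boost = rescore_with_keyphrases(candidates, keyphrases=[])
with_boost = rescore_with_keyphrases(candidates, keyphrases=["Premier Inn"])
```

Without the boost the acoustically likelier but wrong transcript wins; with it, the hypothesis containing the brand name is selected.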
Agent Studio is PolyAI's web-based control layer for building, configuring, and monitoring voice agents. Launched in April 2025, it provides a no-code and low-code interface for designing conversation flows, configuring integrations, reviewing call analytics, and managing agent behavior across a deployment. It is aimed at non-technical business users who want to modify agent behavior without waiting for engineering support.
In April 2026, PolyAI released the Agent Development Kit (ADK), a developer-facing SDK and command-line tool that allows technical teams to build and modify Agent Studio-compatible agents from a local IDE rather than through the browser interface. Every agent resource in the ADK is a versioned file, so teams can manage agent changes through standard version control workflows, including branching, diffing, reviewing, and merging. The ADK also connects to automated QA pipelines that allow agents to identify their own performance gaps and surface improvement candidates continuously. PolyAI describes this as shifting from static automation to systems that adapt without ongoing manual configuration.
PolyAI integrates with telephony via SIP trunking and PSTN, and connects to major contact center platforms including Genesys, NICE inContact, Avaya, and Salesforce Service Cloud. On the reservation and booking side, it has a direct integration with OpenTable, announced in September 2024, that allows restaurant operators to accept reservations by phone and query live availability. The platform supports seamless handoff to human agents with full conversation context transferred at the point of escalation, so the human agent does not need to ask the caller to repeat themselves.
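The context transferred at escalation can be thought of as a structured payload that travels with the call. The sketch below shows one plausible shape for such a payload; every class and field name here is hypothetical and chosen for illustration, since the text does not document PolyAI's actual handoff format.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Turn:
    speaker: str  # "caller" or "agent"
    text: str


@dataclass
class HandoffContext:
    call_id: str
    reason: str                                      # why the agent escalated
    collected: dict = field(default_factory=dict)    # details already gathered
    transcript: list = field(default_factory=list)   # list of Turn objects

    def to_json(self) -> str:
        # asdict converts nested dataclasses (the Turn objects) to plain dicts.
        return json.dumps(asdict(self), sort_keys=True)


context = HandoffContext(
    call_id="call-001",
    reason="out_of_scope",
    collected={"party_size": 4, "date": "2025-06-01"},
    transcript=[
        Turn("caller", "I'd like to book for four and ask about a private room."),
        Turn("agent", "Let me connect you with a colleague who can help."),
    ],
)
payload = json.loads(context.to_json())
```

A human agent's desktop could render `collected` and `transcript` on answer, which is what removes the need to ask the caller to repeat themselves.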
PolyAI operates on a managed service model for large enterprise deployments. The company's implementation team handles agent design, configuration, integration with backend systems, and ongoing optimization. This is different from developer-first platforms where the customer assembles the stack themselves. Typical enterprise deployment timelines are quoted at around six weeks, with the ADK aimed at compressing this for teams with in-house engineering capability.
The platform supports 45 languages and mid-conversation language switching, which matters for multilingual markets where a caller may begin in one language and switch to another mid-call.
Hospitality is one of PolyAI's strongest verticals. Phone reservation handling is a labor-intensive task in the sector, with branded hotels and restaurant chains receiving hundreds of thousands of inbound calls per year from customers who prefer to speak to someone rather than book online. PolyAI voice agents answer 100 percent of calls without hold queues, handle reservations, answer questions about amenities or policies, and transfer to a human when the query falls outside the agent's scope.
Marriott International, the world's largest hotel chain, is one of PolyAI's publicly named customers. Caesars Entertainment, the US casino and resort operator, has also deployed PolyAI agents in its contact operations. Pavan Kapur, former Chief Commercial Officer at Caesars, said at the time of the Series C: "I was impressed with PolyAI's conversational abilities and the impact it will have on customer experience."
Whitbread, owner of Premier Inn, was among PolyAI's earliest production customers and served as a test bed for the company's early telephony work.
Fogo de Chão, the Brazilian steakhouse chain, deployed a PolyAI agent named Selma across all 88 of its US locations. Selma answers 100 percent of inbound calls, up from an estimated 75 percent call answer rate before the deployment. The agent achieves a 95 percent guest satisfaction score and an 88 percent booking completion rate. Over its first 12 months Selma is projected to handle more than 250,000 reservations. Fogo de Chão reported that around 40 percent of callers who discussed the chain's loyalty program during a call with Selma opted into it.
The Melting Pot, the American fondue restaurant chain, deployed a PolyAI agent named Judy to handle reservation calls. The agent automates 68 percent of all reservation-related calls and answers more than half of all calls to the chain's locations. In the first six months of full rollout, Judy generated $250,000 in revenue from after-hours bookings that would previously have gone unanswered. The figure was later revised to $300,000 in PolyAI's updated case study.
Greene King, the British pub and hotel group, is also among PolyAI's UK customers, using the platform for inbound call handling across its estate.
Financial services deployments tend to focus on account inquiries, payment processing, fraud reporting routing, and identity verification. UniCredit, the Italian banking group, is among PolyAI's named financial services customers. In the UK, NS&I (National Savings and Investments), the government-backed savings bank, has published an algorithmic transparency record documenting its use of PolyAI for customer call handling, one of the more unusual public disclosures of an AI deployment in financial services.
The compliance requirements in financial services are among the most demanding PolyAI navigates. The platform supports PCI-DSS compliant payment handling and integrates with identity verification workflows. The UK Financial Conduct Authority's consumer duty rules require firms to be able to demonstrate that automated systems are not disadvantaging customers, which affects how PolyAI structures escalation logic and records interactions.
Foot Locker and PG&E (the California electric and gas utility) are among publicly named customers in retail and utilities. PG&E's use case involves handling high call volumes during outages and billing inquiries, contexts where call spikes can overwhelm human agent capacity rapidly. Foot Locker's deployment covers customer service calls across its branded retail footprint.
FedEx, the global courier and logistics company, is one of PolyAI's most frequently cited enterprise customers. Logistics call centers receive large volumes of calls from senders and recipients tracking shipments, arranging redeliveries, and resolving customs inquiries. These calls have relatively predictable structures and high volume, making them well-suited to voice automation.
The enterprise voice AI space has several notable players with different positioning:
| Feature | PolyAI | Vapi | Retell AI | Sierra AI |
|---|---|---|---|---|
| Target customer | Large enterprises (Fortune 500) | Developers / SMB to enterprise | SMB to mid-market enterprise | Large enterprises |
| Deployment model | Managed service with ADK | Developer self-serve | Self-serve with enterprise support | Managed service |
| Pricing approach | Custom contracts (~$150k/year+ enterprise) | $0.05/min platform fee plus stack | From $0.07/min | Custom contracts |
| Proprietary models | Yes (Raven LLM, Owl ASR) | Bring-your-own model | Bring-your-own model | Yes (multi-model architecture) |
| Voice focus | Voice-first with webchat | Voice and text | Voice-first | Voice and text |
| Latency | Sub-300ms (Raven 3.5) | Configurable by provider choice | ~600ms typical | Not publicly specified |
| Language support | 45 languages | Depends on chosen TTS/STT | Multilingual | English-primary |
| Compliance certifications | SOC 2, HIPAA, PCI-DSS | Requires per-provider BAAs | SOC 2 Type II, HIPAA BAA | SOC 2 |
| Typical use case | Contact centers with 50+ agents | Custom voice apps, prototyping | Mid-size contact centers | Brand-aligned CX for DTC brands |
| Founded | 2017 | 2023 | 2023 | 2023 |
Vapi is a developer-first platform that lets builders choose their own speech recognition, language model, and text-to-speech providers and assemble them into a pipeline. It is highly flexible and relatively inexpensive at the component level, but total stack costs (provider fees across STT, LLM, and TTS) and the complexity of managing compliance chains with multiple providers position it differently from PolyAI's integrated offering. Vapi's target user is an engineering team that wants maximum control over model selection and workflow design.
Retell AI occupies a middle position. It offers a drag-and-drop builder for simpler flows alongside an API for more complex needs, with response latencies averaging around 600ms that the company describes as among the fastest available for the approach. Retell is typically cheaper than PolyAI at scale and offers self-service onboarding that PolyAI does not, making it more accessible for mid-market organizations that do not want a long implementation process.
Sierra AI was founded in 2023 by Bret Taylor (former Salesforce co-CEO and chair of OpenAI's board) and Clay Bavor (Google veteran). Sierra focuses on brand governance, allowing enterprises to configure very precise behavioral guardrails so that an AI agent's tone, phrasing, and policy adherence match a company's identity closely. Voice capabilities launched in late 2024. Sierra is often compared with PolyAI for consumer-facing enterprises in hospitality and luxury retail where brand differentiation matters heavily, but PolyAI has a significantly longer production track record.
PolyAI has appeared on the Forbes AI 50 list, which tracks privately held AI companies judged to have the most promising commercial prospects. Mrkšić was named to the Forbes 30 Under 30 Europe class of 2021 in the technology category. The company has been listed in Gartner's Cool Vendors and Market Guide for Conversational AI Solutions. It won a Gold Stevie Award for Contact Center Solutions in 2024 and the Best Customer Service Solution category at the 2026 Stevie Awards.
A Forrester Total Economic Impact study commissioned by PolyAI found that customers achieved an average return on investment of 391 percent over three years, with an average of $10.3 million in cost savings per deployment. The payback period was under six months in the modeled scenario. The study covers avoided agent labor costs, reduced handle times, and revenue captured from calls that would previously have gone unanswered outside business hours.
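To make the headline figures easier to interpret, the sketch below applies the usual return-on-investment and payback definitions. It does not reproduce the study's model, which works with risk-adjusted present values over three years; the function names and all input numbers are hypothetical and chosen only to show the arithmetic.

```python
import math


def roi_percent(total_benefits: float, total_costs: float) -> float:
    """ROI as net benefits divided by costs, expressed as a percentage."""
    return (total_benefits - total_costs) / total_costs * 100


def payback_month(upfront_cost: float, monthly_net_benefit: float):
    """First month in which cumulative net benefit covers the upfront cost."""
    if monthly_net_benefit <= 0:
        return None  # the investment never pays back
    return math.ceil(upfront_cost / monthly_net_benefit)


# A 391 percent ROI means benefits of about 4.91 units per unit of cost.
implied_roi = roi_percent(total_benefits=4.91, total_costs=1.0)

# Hypothetical inputs: a 500,000 upfront cost recovered at 100,000 per month.
months = payback_month(upfront_cost=500_000, monthly_net_benefit=100_000)
```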
Industry analysts have generally rated PolyAI positively for voice quality and containment rates, while noting that the managed service model can feel restrictive for customers who want to modify their agents quickly without going through PolyAI's implementation team. Gartner Peer Insights reviews for the platform reflect broadly positive scores on voice quality and integration depth, with more mixed feedback on self-service configuration tooling, though the ADK released in 2026 was designed to address this directly.
PolyAI's pricing and deployment model limits access for smaller organizations. Enterprise contracts are typically structured around annual commitments starting around $150,000, with additional per-minute usage fees. This entry point is appropriate for organizations with high call volumes but prohibitive for smaller businesses, a point the company acknowledges by positioning PolyAI primarily for organizations with 50 or more agents.
Until the ADK launched in April 2026, customers who wanted to modify their voice agents needed to route requests through PolyAI's implementation team, which could slow iteration cycles. Enterprise clients accustomed to making rapid changes to self-hosted software sometimes found this process frustrating, particularly when testing new conversation flows or adjusting routing logic. The ADK is PolyAI's response to this, but it still requires developer capability that some enterprise IT teams may not have readily available.
Speech recognition accuracy, while strong by industry benchmarks, is not perfect. Owl ASR performs well on accents common in PolyAI's training data but may underperform on very unusual dialects or in extremely noisy environments. Reviewers have noted occasional latency spikes that disrupt conversational flow, though sub-300ms average performance suggests these are edge cases rather than systemic.
The platform is voice-first. While Raven 3.5 added webchat support, customers that need consistent AI-driven experiences across voice, email, and asynchronous chat channels may need to supplement PolyAI with additional tools for non-voice channels.
Call analytics in Agent Studio have received mixed reviews. Aggregate metrics like call volume, containment rate, and average handle time are well-represented. More granular analysis of caller sentiment patterns, topic clustering across call cohorts, or predictive quality scoring is less developed compared to standalone analytics products.
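The aggregate metrics named above are straightforward to compute from per-call records. The following is a minimal sketch under assumed field names (`duration_seconds`, `escalated_to_human`); it is not Agent Studio's schema or API. It treats any call that was not escalated as contained, which is a simplification, since abandoned calls would also be counted as contained.

```python
from dataclasses import dataclass


@dataclass
class CallRecord:
    duration_seconds: float
    escalated_to_human: bool


def summarize(calls):
    """Return call volume, containment rate, and average handle time."""
    total = len(calls)
    if total == 0:
        return {"call_volume": 0, "containment_rate": None,
                "average_handle_time": None}
    contained = sum(1 for call in calls if not call.escalated_to_human)
    return {
        "call_volume": total,
        "containment_rate": contained / total,
        "average_handle_time": sum(c.duration_seconds for c in calls) / total,
    }


sample_calls = [
    CallRecord(120, False),
    CallRecord(300, True),
    CallRecord(90, False),
    CallRecord(90, False),
    CallRecord(200, False),
]
summary = summarize(sample_calls)
```

The more granular analyses mentioned above, such as sentiment patterns or topic clustering, would need per-turn transcripts rather than these per-call fields.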