Perplexity has two distinct meanings in the field of artificial intelligence. In natural language processing and information theory, perplexity (often abbreviated PPL) is a fundamental evaluation metric that measures how well a probability model predicts a sample of text. Formally, it quantifies the average uncertainty a language model faces when predicting the next token in a sequence. A model with lower perplexity assigns higher probabilities to the actual observed words, indicating that it has captured the statistical patterns of the language more effectively. Since its introduction in 1977, perplexity has served as the standard intrinsic evaluation metric for language models, from early n-gram systems through modern large language models. In the commercial technology space, Perplexity AI is an American artificial intelligence company that operates an AI-powered search engine, often described as an "answer engine," that uses retrieval-augmented generation to synthesize responses from web sources with inline citations.
Imagine you are playing a guessing game where you have to predict the next word in a sentence. If you are really good at the game, you only have to think about a few possible words before guessing correctly. If you are bad at the game, you might have hundreds of words spinning around in your head and you are confused about which one comes next.
Perplexity is a number that tells us how confused a computer is when it plays this guessing game. A small number (like 10) means the computer only has to think about roughly 10 words before it guesses right. A big number (like 500) means it is much more confused. So when scientists build language models, they want the perplexity to be as low as possible, because that means the computer is good at guessing what word comes next.
The concept of perplexity has its roots in Claude Shannon's 1948 paper "A Mathematical Theory of Communication," which introduced entropy as a measure of information content and uncertainty in communication systems. Shannon estimated that English text contains approximately 1 to 1.5 bits of information per letter, far less than the theoretical maximum of about 4.7 bits if all letter sequences were equally likely. This gap reflects the deep statistical structure of natural language.
Perplexity itself was formally introduced in 1977 by Frederick Jelinek, Robert Mercer, Lalit Bahl, and James Baker at IBM's Thomas J. Watson Research Center. Their paper, "Perplexity: A Measure of the Difficulty of Speech Recognition Tasks," presented at the 94th meeting of the Acoustical Society of America, argued that vocabulary size and static branching factors were inadequate measures of speech recognition complexity. They proposed perplexity (whose logarithm is the familiar entropy) as a more appropriate measure of equivalent choice.
Throughout the 1980s and 1990s, perplexity served as the primary benchmark for n-gram language modeling. Researchers would train n-gram models (bigram, trigram, and higher-order) on text corpora and compare their perplexity scores on held-out test sets, using the metric to compare smoothing techniques (such as Kneser-Ney), backoff strategies, and model orders. When neural network-based language models emerged in the 2000s, starting with Bengio et al.'s feedforward neural language model (2003), perplexity remained the standard yardstick, allowing direct comparison between statistical and neural approaches. The Transformer architecture (2017) and subsequent large language models continued the tradition, reporting perplexity on established benchmarks such as Penn Treebank, WikiText-2, and WikiText-103, though the field has increasingly adopted task-specific evaluation alongside perplexity.
Perplexity is mathematically defined as the exponentiated average negative log-likelihood of a sequence. Given a test sequence W = w_1, w_2, ..., w_N, the perplexity of a language model is:
PPL(W) = exp( -(1/N) * sum from i=1 to N of ln P(w_i | w_1, ..., w_{i-1}) )
where N is the number of tokens in the test sequence and P(w_i | w_1, ..., w_{i-1}) is the probability the model assigns to token w_i given all preceding tokens.
Using base-2 logarithms, this can be written equivalently as:
PPL(W) = 2^( -(1/N) * sum from i=1 to N of log_2 P(w_i | w_1, ..., w_{i-1}) )
This is also equivalent to PPL(W) = 2^H(W), where H(W) is the cross-entropy of the model on the test sequence, measured in bits. In this form, H(W) is the average number of bits needed to encode each word using an optimal code based on the model's predicted distribution, and perplexity is 2 raised to that number of bits.
An equivalent product form, often used with n-gram models, is:
PPL(W) = ( product from i=1 to N of 1 / P(w_i | context) )^(1/N)
This is the inverse geometric mean of the token probabilities, and it is mathematically equivalent to the exponential form above. The intuition is clear: if a model assigns high probability to each observed token, the product of inverse probabilities is small, yielding a low perplexity. For a bigram model specifically, the context reduces to the previous word, giving PPL(W) = (product from i=1 to N of 1/P(w_i | w_{i-1}))^(1/N).
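In code, the product form is usually evaluated in log space to avoid numerical underflow on long sequences. The following minimal Python sketch uses made-up token probabilities purely for illustration; it also shows the branching-factor intuition discussed below.

```python
import math

def perplexity(token_probs):
    """Perplexity of a sequence given the model's probability for each observed token.

    Computed in log space as exp(-mean(log p)), which is numerically safer than
    multiplying many small probabilities directly.
    """
    n = len(token_probs)
    avg_neg_log_likelihood = -sum(math.log(p) for p in token_probs) / n
    return math.exp(avg_neg_log_likelihood)

# Toy example: a model that assigns probability 0.25 to every observed token
# behaves like a uniform choice among 4 options, so its perplexity is 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # 4.0

# Higher probabilities on the observed tokens give lower perplexity.
print(perplexity([0.9, 0.8, 0.95, 0.7]))     # ~1.20
```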
Perplexity is directly linked to cross-entropy. Cross-entropy measures the average number of bits required to identify an event from a set of possibilities, given a coding scheme optimized for a predicted probability distribution rather than the true distribution. The cross-entropy H(p, q) between the true distribution p and the model distribution q is:
H(p, q) = -(1/N) * sum from i=1 to N of log_2 P(w_i | context)
Perplexity is simply the exponentiation of the cross-entropy:
PPL = 2^H(p, q)
Or equivalently, using natural logarithms:
PPL = exp(H_nat(p, q))
This relationship is significant for neural network training. When a neural language model is trained by minimizing the cross-entropy loss (the standard practice), it is implicitly minimizing perplexity on the training data. The average negative log-likelihood computed during training is precisely the cross-entropy, so perplexity is simply the exponential of the training loss. A model that achieves lower cross-entropy loss during training will, by definition, also achieve lower perplexity on the same data.
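As a concrete illustration, the sketch below uses PyTorch's standard cross-entropy loss on random logits that stand in for a real model's outputs; the vocabulary size and sequence length are arbitrary. Exponentiating the mean loss (which is in nats) yields the perplexity.

```python
import torch
import torch.nn.functional as F

# Hypothetical setup: per-position logits from a language model over a small vocabulary.
vocab_size, seq_len = 100, 8
logits = torch.randn(seq_len, vocab_size)           # model outputs for each position
targets = torch.randint(0, vocab_size, (seq_len,))  # the observed next tokens

# Standard training loss: mean negative log-likelihood (cross-entropy) in nats.
loss = F.cross_entropy(logits, targets)

# Perplexity is simply the exponential of that loss.
ppl = torch.exp(loss)
print(f"cross-entropy: {loss.item():.3f} nats, perplexity: {ppl.item():.1f}")
```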
Perplexity is also related to entropy, which represents the theoretical lower bound on cross-entropy for a given language. While entropy H(p) measures the inherent uncertainty in the true distribution, cross-entropy H(p, q) is always greater than or equal to entropy. The difference between cross-entropy and entropy is the Kullback-Leibler divergence, which measures how much the model distribution q differs from the true distribution p. A perfect model (where q = p) would achieve cross-entropy equal to the true entropy, and its perplexity would represent the inherent unpredictability of the language itself.
Perplexity can be interpreted as the effective number of equally likely choices the model considers at each prediction step, or as the weighted branching factor of a language model. A model with a perplexity of 50 is, on average, as uncertain as if it had to choose uniformly among 50 equally likely tokens at each step. A perplexity of 1 would mean the model always predicts the correct next word with absolute certainty (a theoretical ideal that is unreachable in practice for natural language). This "weighted branching factor" interpretation gives the metric a concrete, intuitive meaning.
Key properties of perplexity include:
| Property | Description |
|---|---|
| Lower bound | Perplexity is always at least 1.0. A perplexity of 1 means the model predicts every token with 100% confidence and is always correct. |
| Upper bound | A uniform distribution over all tokens yields perplexity equal to the vocabulary size V; a model that systematically assigns low probability to the observed tokens can score even higher. |
| Lower is better | A lower perplexity indicates the model is less "surprised" by the test data and has learned better language patterns. |
| Same test set required | Perplexity scores are only meaningful when comparing models evaluated on the same test dataset. |
| Tokenizer dependence | Scores are not comparable across models with different tokenizers, because different segmentation changes both the number of predictions and the difficulty of each prediction. |
| Dataset dependence | Scores reflect the difficulty of the test data as much as the quality of the model; news, medical, and conversational text yield very different perplexities for the same model. |
Because different models use different tokenization schemes (word-level, subword, character-level, byte-level), raw perplexity scores are often not directly comparable. Several normalized metrics address this issue.
Bits per character (BPC) measures the average number of bits needed to encode one character of text under the model's predicted distribution. It is computed as the total cross-entropy loss divided by the number of characters. Good character-level language models typically achieve 1.0 to 1.5 BPC on English text. In 2019, Transformer-XL achieved a then state-of-the-art 0.99 BPC on the enwik8 benchmark.
Bits per byte (BPB) is similar but normalizes by raw bytes rather than characters. This metric is particularly useful for byte-level models and allows comparison across different character encodings.
Bits per word (BPW) normalizes by word count and is the direct logarithmic transform of word-level perplexity: BPW = log_2(PPL). For example, a trigram model achieving a perplexity of 247 on the Brown Corpus corresponds to approximately 7.95 bits per word, or about 1.75 bits per letter.
Because these metrics normalize by a fixed unit (character, byte, or word) rather than by the model's internal tokens, they enable fairer comparisons across models with different vocabularies and tokenization strategies. The total information content of a test set is the same regardless of how it is segmented, so cross-entropy can always be converted between different granularity levels.
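Because the total number of bits is invariant to how the text is segmented, converting between token-level perplexity and character-level metrics is a short calculation. The sketch below uses illustrative counts, not measurements from any particular model.

```python
import math

def bits_per_character(token_level_ppl, num_tokens, num_characters):
    """Convert token-level perplexity to bits per character.

    Total information (in bits) is the same however the text is segmented,
    so we compute total bits at the token level and re-normalize by characters.
    """
    total_bits = num_tokens * math.log2(token_level_ppl)
    return total_bits / num_characters

# Illustrative numbers: a subword model with perplexity 20 over 1,000 tokens
# of text that is 4,200 characters long.
print(bits_per_character(20.0, 1_000, 4_200))  # ~1.03 bits per character
```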
The choice of tokenization granularity significantly affects reported perplexity values, and direct comparison requires careful normalization.
Word-level perplexity counts each whitespace-delimited word as one prediction step. Early n-gram models and some recurrent models (such as AWD-LSTM) reported word-level perplexity on benchmarks like Penn Treebank and WikiText-103.
Subword-level perplexity counts each subword token (produced by BPE, WordPiece, or SentencePiece) as one prediction step. Models like GPT-2 (using BPE with ~50,000 tokens), BERT (using WordPiece with ~30,000 tokens), and LLaMA use subword tokenization. Because subword models split rare words into multiple tokens, they make more predictions per word, and each individual prediction tends to be easier. This results in lower per-token perplexity even if the model captures equivalent information.
Research has shown that tokenization differences can affect traditional perplexity measurements by up to 21.6%. To normalize across tokenization schemes, researchers can convert to bits per character or bits per byte, or report perplexity normalized by a tokenizer-independent unit such as the number of characters in the text rather than the number of model tokens.
Vocabulary size directly affects perplexity. A model with a larger vocabulary faces a harder prediction task at each step because it must distribute probability mass across more possible tokens. All else being equal, a model with a vocabulary of 50,000 tokens will tend to report higher perplexity than an otherwise identical model with a vocabulary of 10,000 tokens.
This is why perplexity comparisons should always be made between models using the same vocabulary and tokenizer. When comparing across different vocabulary sizes, bits per character or bits per byte provides a more equitable comparison, since these metrics are independent of how the text is segmented into tokens.
The following tables show perplexity scores for notable language models on standard benchmarks. These scores illustrate the progress in language modeling over the years. Lower scores indicate better performance.
| Model | Penn Treebank test perplexity | Year |
|---|---|---|
| Kneser-Ney 5-gram | 141.2 | 1995 |
| AWD-LSTM + dynamic eval | 51.1 | 2017 |
| Mogrifier LSTM + dynamic eval | 44.8 | 2019 |
| Mogrifier RLSTM + dynamic eval | 42.9 | 2022 |
| GPT-2 (1.5B parameters) | 35.76 | 2019 |
| GPT-3 (175B parameters) | 20.50 | 2020 |
| Model | WikiText-103 test perplexity | Year |
|---|---|---|
| LSTM (Grave et al., 2016) | 48.7 | 2016 |
| Transformer with tied adaptive embeddings | 20.5 | 2018 |
| Transformer-XL Large | 18.3 | 2018 |
| Compressive Transformer | 17.1 | 2019 |
| Transformer-XL + RMS dynamic eval | 16.4 | 2019 |
| Routing Transformer | 15.8 | 2020 |
| Model | Dataset | BPC | Year |
|---|---|---|---|
| PPM compression | enwik8 | 1.46 | 1996 |
| Transformer-XL | enwik8 | 0.99 | 2019 |
| Model | Dataset | Perplexity | Year |
|---|---|---|---|
| GPT-2 (1.5B) | WikiText-2 | 19.93 | 2019 |
| GPT-2 (1.5B) | LAMBADA | 8.63 | 2019 |
| GPT-3 (175B) | LAMBADA (zero-shot) | 3.00 | 2020 |
| GPT-3 (175B) | LAMBADA (few-shot) | 1.92 | 2020 |
These results demonstrate a clear trend: larger models, better architectures, and more training data consistently reduce perplexity. The transition from n-gram models to LSTMs to Transformers brought dramatic improvements, with perplexity on WikiText-103 dropping from roughly 48.7 to below 16 over just a few years. However, perplexity scores should not be compared across different datasets, since the difficulty of prediction varies with the domain, vocabulary, and text style.
A critical distinction in language model evaluation is between training perplexity and test set perplexity.
Training perplexity measures how well the model fits its training data. It decreases during training as the model learns to assign higher probability to the observed sequences. However, very low training perplexity does not necessarily indicate a good model; it may simply reflect memorization.
Test set perplexity (or held-out perplexity) measures how well the model generalizes to unseen data. This is the metric that matters for evaluation. A model that has memorized its training data will show a large gap between training and test perplexity, a classic sign of overfitting.
Monitoring both metrics during training is standard practice. If training perplexity continues to decrease while test perplexity plateaus or increases, the model is overfitting. Techniques such as dropout, weight decay, and early stopping are commonly used to address this gap.
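A minimal sketch of this monitoring loop is shown below; the loss history, patience setting, and stopping rule are all illustrative choices rather than a prescribed recipe.

```python
import math

def should_stop(val_losses, patience=3):
    """Simple early-stopping rule on validation perplexity.

    `val_losses` are mean cross-entropies in nats per epoch, so perplexity is
    exp(loss). Stops when validation perplexity has not improved for `patience`
    consecutive epochs.
    """
    val_ppls = [math.exp(loss) for loss in val_losses]
    best_epoch = min(range(len(val_ppls)), key=val_ppls.__getitem__)
    return len(val_ppls) - 1 - best_epoch >= patience

# Illustrative run: validation loss bottoms out, then rises as the model overfits.
history = [5.1, 4.6, 4.4, 4.35, 4.4, 4.5, 4.62]
print(should_stop(history))  # True: no improvement for 3 epochs
```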
Perplexity is used in a wide range of applications beyond direct language model comparison.
Language model comparison and selection. Perplexity provides a quick, task-agnostic way to compare language models. Before running expensive downstream evaluations, researchers often use perplexity as a first filter to identify the most promising model variants.
Speech recognition. Perplexity was originally developed to estimate how difficult a recognition task will be for the language model component of an automatic speech recognition (ASR) system. A language model with lower perplexity used in a speech recognizer's decoder typically leads to lower word error rates, because the model better constrains the search space of possible transcriptions.
Machine translation. Language model perplexity is used to evaluate the fluency component of machine translation systems. Although BLEU scores are preferred for end-to-end translation evaluation, the target-side language model's perplexity helps assess output fluency.
Topic modeling. In probabilistic topic models like Latent Dirichlet Allocation (LDA), held-out perplexity is used to determine the optimal number of topics. Lower perplexity on held-out documents suggests the model has found a better decomposition of the corpus.
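For example, scikit-learn's LDA implementation exposes a perplexity method that can be used to compare topic counts on held-out documents. The corpus below is a toy stand-in, and the candidate topic counts are arbitrary.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Toy corpus; in practice you would use a real document collection.
docs = [
    "the cat sat on the mat", "dogs and cats are pets",
    "stocks fell as markets closed", "the market rallied on earnings",
    "the dog chased the cat", "investors sold shares today",
] * 20

X = CountVectorizer().fit_transform(docs)
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

# Pick the number of topics with the lowest held-out perplexity.
for n_topics in (2, 5, 10):
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0).fit(X_train)
    print(n_topics, round(lda.perplexity(X_test), 1))
```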
Text quality and anomaly detection. High perplexity on a given text sample may indicate that the text is out of domain, poorly written, or otherwise anomalous relative to the training distribution; documents that score unusually high relative to a trained model may contain errors or unusual content. This property has applications in data cleaning and quality filtering for training corpora.
AI-generated text detection. Perplexity plays a central role in AI content detection systems such as GPTZero, ZeroGPT, and the OpenAI AI Text Classifier: text produced by a language model tends to have low perplexity when evaluated by a similar model, because it follows highly predictable statistical patterns, whereas human-written text typically exhibits higher perplexity due to greater variability in word choice and sentence structure. The strengths and weaknesses of this approach are discussed in the AI content detection section below.
Despite its widespread use, perplexity has several important limitations that researchers and practitioners should be aware of.
| Limitation | Explanation |
|---|---|
| No measure of coherence | A model can achieve low perplexity by predicting common word sequences well, while still producing incoherent or nonsensical text over longer spans. |
| No measure of factuality | Perplexity evaluates statistical fit, not whether the model's outputs are factually correct. A model might assign high probability to plausible-sounding but false statements. |
| No measure of usefulness | Low perplexity does not imply that a model is useful for downstream tasks like question answering, summarization, or instruction following. |
| Weak correlation with some downstream tasks | Tasks requiring semantic understanding or reasoning (question answering, sentiment analysis) show weak correlation with perplexity, because next-token prediction does not directly capture deeper comprehension. |
| Domain sensitivity | A model trained on news text may show excellent perplexity on news test sets but poor perplexity on medical or legal text. Cross-domain perplexity comparisons can be misleading. |
| Tokenizer dependence | Models with different tokenizers produce incomparable perplexity scores without normalization. |
| Sensitivity to small test sets | Small differences in perplexity (2 to 3 points) on small test sets may not be statistically significant and can reflect random variation rather than genuine model differences. |
| Repetitive text can score well | A model that produces repetitive, low-entropy text may achieve low perplexity despite generating poor-quality output. |
| Long-context averaging | Standard perplexity averages loss across all tokens equally, which can obscure a model's performance on tokens that genuinely require long-range context. Metrics like LongPPL have been proposed to address this by isolating tokens whose prediction leverages distant context. |
For these reasons, modern evaluation of large language models often supplements perplexity with task-specific benchmarks such as MMLU, HellaSwag, TruthfulQA, and HumanEval.
The relationship between perplexity and downstream task performance is nuanced and has been the subject of considerable research.
For tasks that directly involve predicting or generating text (speech recognition, machine translation, text generation), perplexity tends to correlate well with task performance. This makes sense, because these tasks depend on the same core capability that perplexity measures: accurate next-token prediction.
For tasks requiring higher-level understanding (question answering, natural language inference, commonsense reasoning), the correlation is weaker. A model can be excellent at predicting common word sequences yet struggle with reasoning tasks. Research by the XLNet authors noted that improved language model perplexity does not always lead to improvement on downstream tasks. Conversely, the RoBERTa work found that better perplexity on the masked language modeling objective did lead to better end-task accuracy on sentiment analysis and inference benchmarks.
This mixed evidence motivated the development of comprehensive benchmark suites such as GLUE, SuperGLUE, BIG-Bench, MMLU, and HumanEval, which evaluate language models on a diverse set of downstream tasks. Modern large language model evaluation typically supplements perplexity with these task-specific benchmarks, human evaluations, and safety assessments to provide a more complete picture of model capabilities.
See also: AI content detectors, burstiness
Perplexity plays a central role in AI content detection systems such as GPTZero, ZeroGPT, and the OpenAI AI Text Classifier. These tools use perplexity as a signal to distinguish human-written text from AI-generated text.
The core idea is that text produced by a language model tends to have low perplexity when evaluated by a similar model, because AI-generated text follows highly predictable patterns. Human-written text, by contrast, tends to have higher perplexity due to greater variability in word choice, sentence structure, and stylistic decisions. When an AI detector feeds a document through a language model and the resulting perplexity is unusually low, the system flags the text as likely AI-generated.
GPTZero, one of the first widely used AI detectors, pioneered the approach of combining perplexity with burstiness (a measure of how much the writing style varies throughout a document). AI-generated text typically exhibits both low perplexity and low burstiness, because language models produce text at a consistent level of predictability. Human writing, on the other hand, tends to show bursts of complexity followed by simpler passages, resulting in higher burstiness.
The following table summarizes how perplexity and burstiness interact in AI content detection:
| Metric | Human-Written Text | AI-Generated Text |
|---|---|---|
| Perplexity | Generally higher (more surprising, varied word choices) | Generally lower (predictable, follows statistical patterns) |
| Burstiness | Higher (variable sentence length and complexity) | Lower (uniform writing style throughout) |
| Combined Signal | High perplexity + high burstiness = likely human | Low perplexity + low burstiness = likely AI |
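The sketch below illustrates the general idea only. It assumes per-token log probabilities already produced by some external scoring model, approximates burstiness as the standard deviation of per-sentence perplexities, and uses arbitrary thresholds; none of this reflects the actual formulas used by GPTZero or any other real detector.

```python
import math
import statistics

def sentence_perplexity(sentence_log_probs):
    """Perplexity of one sentence from per-token log probabilities (natural log)."""
    return math.exp(-statistics.mean(sentence_log_probs))

def detect(document_sentences, ppl_threshold=40.0, burstiness_threshold=10.0):
    """Toy detector combining document-level perplexity with burstiness.

    `document_sentences` is a list of per-sentence lists of token log probabilities,
    as scored by some reference language model (not included here). The thresholds
    are illustrative placeholders.
    """
    sentence_ppls = [sentence_perplexity(s) for s in document_sentences]
    doc_ppl = statistics.mean(sentence_ppls)
    burstiness = statistics.stdev(sentence_ppls) if len(sentence_ppls) > 1 else 0.0
    likely_ai = doc_ppl < ppl_threshold and burstiness < burstiness_threshold
    return doc_ppl, burstiness, likely_ai
```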
However, perplexity-based detection has significant weaknesses:

- Paraphrasing and light editing can raise the perplexity of AI-generated text above detection thresholds.
- Domain-specific human writing, such as legal or technical prose, may have naturally low perplexity and be falsely flagged.
- Newer models fine-tuned for output variability can produce text whose perplexity resembles human writing, evading detection.
As a result, modern AI detection tools like GPTZero have evolved beyond relying solely on perplexity and burstiness, incorporating multilayered systems with seven or more components including neural classifiers trained on large datasets of human and AI-generated text.
Computing perplexity for modern large language models involves some practical considerations.
Fixed-length context. Most Transformer-based models have a fixed context window (for example, 1,024 tokens for GPT-2 and 2,048 for GPT-3). When evaluating on text longer than the context window, researchers use a sliding window approach, scoring overlapping segments and averaging the results; the Hugging Face Transformers library documents a standard implementation of this approach (a sketch appears at the end of this section).
Softmax probabilities. Perplexity is computed from the softmax output probabilities of the model. For each position in the test sequence, the model's predicted probability distribution over the vocabulary is compared to the actual next token, and the log probability of the correct token is accumulated.
Computational cost. Evaluating perplexity requires a full forward pass through the model for the entire test set, which can be expensive for very large models. Unlike many downstream benchmarks, perplexity evaluation does not require any fine-tuning or task-specific adaptation, making it relatively straightforward to compute.
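The sketch below loosely follows the sliding-window evaluation described in the Hugging Face documentation. The model choice ("gpt2"), the evaluation text, and the window and stride sizes are arbitrary examples, and the per-window bookkeeping is a simplification rather than a reference implementation.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

text = "Some long evaluation text ... " * 100  # placeholder evaluation text
encodings = tokenizer(text, return_tensors="pt")
total_len = encodings.input_ids.size(1)

max_length, stride = 1024, 512
nlls, n_scored, prev_end = [], 0, 0
for start in range(0, total_len, stride):
    end = min(start + max_length, total_len)
    target_len = end - prev_end                 # only score tokens not seen before
    input_ids = encodings.input_ids[:, start:end]
    labels = input_ids.clone()
    labels[:, :-target_len] = -100              # mask the overlapping context tokens
    with torch.no_grad():
        # Mean NLL over the unmasked labels (the first token of each window is
        # never predicted, so this is a close approximation, as in the HF docs).
        loss = model(input_ids, labels=labels).loss
    nlls.append(loss * target_len)
    n_scored += target_len
    prev_end = end
    if end == total_len:
        break

ppl = torch.exp(torch.stack(nlls).sum() / n_scored)
print(f"perplexity: {ppl.item():.2f}")
```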
Perplexity AI, Inc. is an American artificial intelligence company headquartered in San Francisco, California, that develops and operates an AI-powered search engine. Often referred to as an "answer engine," Perplexity processes natural language queries and returns synthesized, cited responses rather than a traditional list of blue links. The company was founded in August 2022 and launched its consumer search product on December 7, 2022.
As of early 2026, Perplexity has over 45 million monthly active users, processes an estimated 35 to 45 million queries per day, and attracts roughly 170 million global website visitors per month. The company is valued at approximately $21 billion and has around 1,494 employees.
Perplexity AI was co-founded by four engineers with backgrounds in back-end systems, artificial intelligence, and machine learning:
| Founder | Role | Background |
|---|---|---|
| Aravind Srinivas | CEO | Ph.D. in Computer Science from UC Berkeley. Former AI researcher at OpenAI, DeepMind, and Google Brain. B.S. and M.S. in Electrical Engineering from IIT Madras. |
| Denis Yarats | CTO | Ph.D. in Computer Science from NYU, focusing on reinforcement learning and NLP. Former AI Research Scientist at Meta AI (FAIR). Previously worked on Bing at Microsoft and as a Staff ML Engineer at Quora. |
| Johnny Ho | Chief Strategy Officer | Former engineer at Quora. Former quantitative trader at Tower Research Capital. Competitive programming world champion. |
| Andy Konwinski | President | Ph.D. in Computer Science. Co-founder of Databricks. Expert in distributed systems and big data processing through work on Apache Spark. |
Aravind Srinivas has been the public face of the company, articulating the vision of replacing traditional search engines with a system that directly answers questions with cited sources, rather than requiring users to sift through links.
Perplexity's search engine is built on retrieval-augmented generation (RAG), a technique that combines information retrieval with generative AI to produce grounded, cited responses. The system follows a multi-stage pipeline:
Query Intent Parsing. When a user submits a query, a language model analyzes the intent behind the question, going beyond simple keyword matching to achieve semantic understanding of context, nuance, and the user's underlying goal.
Web Retrieval. The system searches the web and its internal index to retrieve relevant documents and web pages. Perplexity uses Vespa.ai as its core search infrastructure, which integrates vector search for semantic understanding, lexical search for precision, structured filtering, and machine-learned ranking into a single engine.
Chunk-Level Extraction. Rather than passing entire documents to the language model, the system extracts the most relevant text spans (chunks) from retrieved documents. This chunk-level retrieval improves factual accuracy, reduces context length, and minimizes compute cost.
Answer Generation. The curated context is passed to a generative large language model, which synthesizes a natural-language response based strictly on the retrieved information. A core architectural principle is that the model should not say anything that was not retrieved from sources.
Citation Attachment. Inline citations are attached to the generated text, linking specific claims back to their source documents. This allows users to verify every piece of information and explore original sources.
Perplexity continuously ingests data and updates both text and vector indexes in real time without interrupting queries. Its distributed architecture balances data and computation across nodes, co-locating content, indexes, and ranking logic to eliminate bottlenecks.
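The stages above map onto a generic retrieval-augmented generation loop. The following sketch is purely schematic: the retrieval, ranking, and generation functions are hypothetical placeholders passed in by the caller, and nothing here reflects Perplexity's proprietary implementation.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source_url: str

def answer(query: str, search_web, rank_chunks, generate_with_llm, top_k: int = 5):
    """Schematic RAG answer pipeline; all callables are hypothetical placeholders."""
    # 1-2. Parse intent and retrieve candidate documents from a web index.
    documents = search_web(query)

    # 3. Chunk-level extraction: keep only the most relevant spans.
    chunks: list[Chunk] = rank_chunks(query, documents)[:top_k]

    # 4. Generation grounded strictly in the retrieved context.
    context = "\n\n".join(f"[{i + 1}] {c.text}" for i, c in enumerate(chunks))
    draft = generate_with_llm(
        "Answer the question using ONLY the sources below, citing them as [n].\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )

    # 5. Citation attachment: map [n] markers back to source URLs.
    citations = {i + 1: c.source_url for i, c in enumerate(chunks)}
    return draft, citations
```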
Perplexity offers a range of products and features that have expanded significantly since its 2022 launch.
The core product is the AI search engine, available at perplexity.ai and through mobile apps on iOS and Android. Users type a natural language question, and the system returns a synthesized answer with inline citations. Follow-up questions are supported in a conversational thread, allowing users to refine their queries. The free tier provides access to basic search with a limited number of Pro Search queries per day.
Perplexity Pro is the paid subscription tier, priced at $20 per month (or $200 per year). Pro subscribers receive:

- A substantially larger daily allowance of Pro Search queries than the free tier
- The ability to choose among more advanced underlying models
- File upload and document analysis
- Free shipping on Buy with Pro orders
- A $5 monthly credit toward the Sonar API
Announced in July 2025, Perplexity Max is a $200-per-month subscription plan aimed at power users. It includes unlimited access to Labs (a spreadsheet and report generation tool), early access to the Comet browser, and access to the most advanced model configurations.
Perplexity Enterprise Pro, priced at $40 per user per month, targets business teams with features including centralized billing, admin controls, data privacy protections, and team collaboration tools through Spaces.
Comet is Perplexity's AI-native web browser, initially launched in 2025 for desktop (Mac and Windows) and Android. In March 2026, Comet expanded to iOS, bringing its AI assistant to iPhone users. The browser integrates Perplexity's search and AI capabilities directly into the browsing experience, with features including:

- A built-in assistant that can summarize the current page and answer questions about on-screen content
- Perplexity's answer engine as the default search experience
- Agentic capabilities that carry out multi-step browsing tasks on the user's behalf
Max subscribers can select the model powering their browser agent, choosing from options such as Claude Opus 4.6 and Claude Sonnet 4.5.
Perplexity also launched Comet Enterprise in March 2026, a secure version for teams with granular admin controls, MDM deployment support, audit logs, and integration with CrowdStrike Falcon for threat detection.
Launched in November 2024 for Pro users in the United States, Buy with Pro is an AI-powered shopping feature that lets users research products, compare options, and check out directly within Perplexity. Pro users receive free shipping on Buy with Pro orders. In May 2025, Perplexity partnered with PayPal to add PayPal and Venmo as checkout options within the app.
Spaces is an organizational feature that allows users to create dedicated workspaces for different topics or projects, grouping related searches and conversations together. Perplexity Pages allows users to generate shareable, article-style content from their research queries, transforming AI search results into formatted, publishable pages.
Perplexity develops its own family of AI models under the Sonar brand, which power both the consumer search product and the developer API.
The Sonar models were introduced in February 2025, built on top of Meta's Llama 3.3 70B and further trained to enhance answer factuality and readability.
| Model | Context Window | Speed | Key Strength |
|---|---|---|---|
| Sonar | 127K tokens | ~1,200 tokens/sec | Fast, real-time answers for general queries |
| Sonar Pro | 200K tokens | ~144 tokens/sec | Higher accuracy; leads SimpleQA factuality benchmark (F-score 0.858) |
| Sonar Reasoning Pro | 200K tokens | Varies | Deep analytical and reasoning tasks |
| Sonar Deep Research | 200K tokens | Varies | Comprehensive multi-source research |
Sonar achieves its high throughput of approximately 1,200 tokens per second by running on Cerebras inference infrastructure. Sonar Pro, while slower, delivers meaningfully higher factual accuracy, scoring 0.858 on the SimpleQA benchmark compared to Sonar's 0.773.
The Sonar API allows developers to integrate Perplexity's web-grounded search and answer capabilities into their own applications. The API supports real-time web search, conversational answers with citations, and structured retrieval.
| Model | Input Price (per 1M tokens) | Output Price (per 1M tokens) | Notes |
|---|---|---|---|
| Sonar | $1.00 | $1.00 | Citation tokens no longer billed (as of 2026) |
| Sonar Pro | $3.00 | $15.00 | Citation tokens no longer billed (as of 2026) |
| Sonar Deep Research | $2.00 | $8.00 | Citation tokens: $2.00/1M; Search: $5.00/1K queries |
Additional request fees apply to Sonar, Sonar Pro, and Sonar Reasoning Pro based on search context size. Pro subscribers receive a $5 monthly API credit. The Search API, for applications needing raw web results without synthesized answers, is priced at $5 per 1,000 requests.
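A minimal sketch of calling the Sonar API is shown below. It assumes the OpenAI-style chat-completions endpoint and response fields that Perplexity documents, and that an API key is stored in the PERPLEXITY_API_KEY environment variable; consult the current API reference for exact parameters, models, and pricing before relying on this.

```python
import os
import requests

response = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['PERPLEXITY_API_KEY']}"},
    json={
        "model": "sonar",
        "messages": [
            {"role": "user", "content": "What is perplexity in language modeling?"}
        ],
    },
    timeout=60,
)
data = response.json()
print(data["choices"][0]["message"]["content"])  # synthesized answer
print(data.get("citations", []))                 # list of source URLs, if returned
```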
Perplexity has raised over $1.2 billion in total funding across multiple rounds, with its valuation growing rapidly from 2023 through 2026.
| Date | Round | Amount | Valuation | Notable Investors |
|---|---|---|---|---|
| 2023 | Series A | $26M | ~$150M | NEA, Databricks |
| January 2024 | Series B | $73.6M | $520M | IVP, NEA, Jeff Bezos, NVIDIA |
| April 2024 | Series B-1 | $63M | $1.04B | Daniel Gross, Bessemer Venture Partners |
| June 2024 | Series C | Undisclosed | $3B | SoftBank Vision Fund 2 |
| December 2024 | Series D | $500M | $9B | Accel, IVP, Jeff Bezos, NVIDIA |
| May 2025 | Series D extension | $500M | $14B | Institutional investors |
| July 2025 | Top-up | $100M | ~$16B | Various investors |
| September 2025 | Series D-2 | $200M | $20B | Institutional investors |
| December 2025 | Undisclosed | Undisclosed | ~$21B | Cristiano Ronaldo (among others) |
Notable individual investors include Amazon founder Jeff Bezos, Shopify CEO Tobias Lutke, Figma CEO Dylan Field, and professional footballer Cristiano Ronaldo.
Perplexity generates revenue primarily through subscriptions. The company reported an estimated annual recurring revenue (ARR) of $148 million in 2025, with projections reaching $656 million for 2026.
The company's revenue sources include:

- Consumer subscriptions (Perplexity Pro at $20 per month and Perplexity Max at $200 per month)
- Enterprise subscriptions (Enterprise Pro at $40 per user per month and Comet Enterprise)
- Usage-based fees for the Sonar and Search APIs
In early 2024, Perplexity experimented with AI-integrated advertising, but in February 2026, the company transitioned to a subscription-first model by discontinuing ads. In January 2026, Perplexity also signed a three-year, $750 million commitment with Microsoft Azure to secure GPU capacity for its Deep Research and Model Council features.
Perplexity positions itself as a direct alternative to traditional search engines, particularly Google Search. While Perplexity does not yet rival Google's overall market share (Google held approximately 79% of desktop searches as of early 2025, down from 87.6% in 2023), it has carved out a growing niche in the AI-powered search segment.
AI-powered search tools collectively captured an estimated 12 to 15% of global search market share by the end of 2025, up from roughly 5 to 6% at the start of that year. Within the AI chatbot and search market specifically, Perplexity holds approximately 6 to 8% market share as of early 2026, behind ChatGPT and Microsoft Copilot. The company has set a target of 15 to 20% AI chatbot market share within 18 months.
Perplexity differentiates itself from Google and other competitors through several strategic choices:

- Answer-first presentation: synthesized, cited responses rather than a ranked list of links
- A subscription-first business model rather than advertising (ads were discontinued in February 2026)
- Model flexibility: queries can be served by its own Sonar models or by third-party frontier models selected by subscribers
- Vertical integration into the browsing experience through the Comet browser
Other competitors in the AI search space include Google's Gemini-powered AI Overviews (integrated into Google Search), Microsoft Copilot (built on OpenAI models and integrated into Bing), and You.com. Each takes a different approach to combining traditional web indexing with generative AI capabilities.
Perplexity has faced significant criticism and legal action from publishers who allege that the company's web crawling and content summarization practices constitute copyright infringement.
| Date | Plaintiff | Key Allegations |
|---|---|---|
| 2024 | News Corp (Wall Street Journal, New York Post, Barron's) | Unauthorized scraping and reproduction of copyrighted articles |
| 2024 | Forbes | Content was reproduced with minimal attribution |
| August 2025 | Nikkei, Asahi Shimbun | Japanese publishers filed copyright claims |
| August 2025 | Encyclopaedia Britannica, Merriam-Webster | Reference content was scraped and summarized without permission |
| December 2025 | Chicago Tribune | Copyright infringement claims |
| December 2025 | The New York Times | Large-scale copying and distribution of NYT content to power commercial AI products |
The New York Times lawsuit, filed on December 5, 2025, is among the most prominent. The NYT alleged that Perplexity scrapes articles from nytimes.com and obtains them from third-party databases to build a private index feeding its RAG system. The complaint cited research claiming that Perplexity used undeclared user agents, disguised its crawlers to appear as ordinary web browsers, relied on hidden IP addresses, and used third-party crawling tools to avoid detection.
Publishers have argued that AI search tools like Perplexity send far less referral traffic to their websites than traditional search engines, cutting into a revenue source they depend on. A 2024 study found that AI search tools redirect significantly less traffic to original publishers compared to Google Search.
Perplexity has taken several steps to address publisher concerns:

- Launching a revenue-sharing Publishers' Program that compensates participating outlets when their content is used in answers
- Emphasizing prominent inline citations that link readers back to original sources
- Publishing information about its crawlers and stating that they honor robots.txt directives, a claim that some publishers dispute
The legal outcomes of these lawsuits will likely have significant implications for how AI search engines can use copyrighted web content and for the broader relationship between generative AI companies and content publishers.
Perplexity has expanded rapidly through 2025 and into 2026, adding new products and capabilities at a fast pace.
2025 milestones:

- February: introduction of the Sonar model family, built on Llama 3.3 70B and served on Cerebras inference infrastructure
- May: PayPal and Venmo added as checkout options for Buy with Pro; $500 million Series D extension at a $14 billion valuation
- July: announcement of Perplexity Max at $200 per month, including early access to the Comet browser
- Launch of the Comet browser for desktop (Mac and Windows) and Android
- September: Series D-2 round at a $20 billion valuation
- December: funding round at an approximately $21 billion valuation, with investors including Cristiano Ronaldo; The New York Times files its copyright lawsuit

2026 milestones:

- January: three-year, $750 million commitment with Microsoft Azure for GPU capacity
- February: advertising discontinued in favor of a subscription-first model
- March: Comet expands to iOS; Comet Enterprise launches with admin controls, MDM deployment support, and CrowdStrike Falcon integration