Perplexity has two distinct meanings in the field of artificial intelligence. In natural language processing and information theory, perplexity (often abbreviated PPL) is a fundamental evaluation metric that measures how well a probability model predicts a sample of text. Formally, it quantifies the average uncertainty a language model faces when predicting the next token in a sequence. A model with lower perplexity assigns higher probabilities to the actual observed words, indicating that it has captured the statistical patterns of the language more effectively. Since its introduction in 1977, perplexity has served as the standard intrinsic evaluation metric for language models, from early n-gram systems through modern large language models. In the commercial technology space, Perplexity AI is an American artificial intelligence company that operates an AI-powered search engine, often described as an "answer engine," that uses retrieval-augmented generation to synthesize responses from web sources with inline citations.
Imagine you are playing a guessing game where you have to predict the next word in a sentence. If you are really good at the game, you only have to think about a few possible words before guessing correctly. If you are bad at the game, you might have hundreds of words spinning around in your head and you are confused about which one comes next.
Perplexity is a number that tells us how confused a computer is when it plays this guessing game. A small number (like 10) means the computer only has to think about roughly 10 words before it guesses right. A big number (like 500) means it is much more confused. So when scientists build language models, they want the perplexity to be as low as possible, because that means the computer is good at guessing what word comes next.
The concept of perplexity has its roots in Claude Shannon's 1948 paper "A Mathematical Theory of Communication," which introduced entropy as a measure of information content and uncertainty in communication systems. Shannon estimated that English text contains approximately 1 to 1.5 bits of information per letter, far less than the theoretical maximum of about 4.7 bits if all letter sequences were equally likely. This gap reflects the deep statistical structure of natural language.
Perplexity itself was formally introduced in 1977 by Frederick Jelinek, Robert Mercer, Lalit Bahl, and James Baker at IBM's Thomas J. Watson Research Center. Their paper, "Perplexity: A Measure of the Difficulty of Speech Recognition Tasks," presented at the 94th meeting of the Acoustical Society of America, argued that vocabulary size and static branching factors were inadequate measures of speech recognition complexity. They proposed perplexity (whose logarithm is the familiar entropy) as a more appropriate measure of equivalent choice.
Throughout the 1980s and 1990s, perplexity served as the primary benchmark for n-gram language modeling. Researchers would train n-gram models (bigram, trigram, and higher-order) on text corpora and compare their perplexity scores on held-out test sets, using the metric to compare smoothing techniques (such as Kneser-Ney), backoff strategies, and model orders. When neural network-based language models emerged in the 2000s, starting with Bengio et al.'s feedforward neural language model (2003), perplexity remained the standard yardstick, allowing direct comparison between statistical and neural approaches. The Transformer architecture (2017) and subsequent large language models continued the tradition, reporting perplexity on established benchmarks such as Penn Treebank, WikiText-2, and WikiText-103, though the field has increasingly adopted task-specific evaluation alongside perplexity.
Perplexity is mathematically defined as the exponentiated average negative log-likelihood of a sequence. Given a test sequence W = w_1, w_2, ..., w_N, the perplexity of a language model is:
PPL(W) = exp( -(1/N) * sum from i=1 to N of ln P(w_i | w_1, ..., w_{i-1}) )
where N is the number of tokens in the test sequence and P(w_i | w_1, ..., w_{i-1}) is the probability the model assigns to token w_i given all preceding tokens.
Using base-2 logarithms, this can be written equivalently as:
PPL(W) = 2^( -(1/N) * sum from i=1 to N of log_2 P(w_i | w_1, ..., w_{i-1}) )
This is also equivalent to PPL(W) = 2^H(W), where H(W) is the cross-entropy of the model on the test sequence, measured in bits. In this form, H(W) is the average number of bits needed to encode each word using an optimal code based on the model's predicted distribution, and perplexity is 2 raised to that number of bits.
An equivalent product form, often used with n-gram models, is:
PPL(W) = ( product from i=1 to N of 1 / P(w_i | context) )^(1/N)
This is the inverse geometric mean of the token probabilities, and it is mathematically equivalent to the exponential form above. The intuition is clear: if a model assigns high probability to each observed token, the product of inverse probabilities is small, yielding a low perplexity. For a bigram model specifically, the context reduces to the previous word, giving PPL(W) = (product from i=1 to N of 1/P(w_i | w_{i-1}))^(1/N).
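In code, the product form is usually evaluated in log space to avoid numerical underflow on long sequences. The following minimal Python sketch uses made-up token probabilities purely for illustration; it also shows the branching-factor intuition discussed below.

```python
import math

def perplexity(token_probs):
    """Perplexity of a sequence given the model's probability for each observed token.

    Computed in log space as exp(-mean(log p)), which is numerically safer than
    multiplying many small probabilities directly.
    """
    n = len(token_probs)
    avg_neg_log_likelihood = -sum(math.log(p) for p in token_probs) / n
    return math.exp(avg_neg_log_likelihood)

# Toy example: a model that assigns probability 0.25 to every observed token
# behaves like a uniform choice among 4 options, so its perplexity is 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # 4.0

# Higher probabilities on the observed tokens give lower perplexity.
print(perplexity([0.9, 0.8, 0.95, 0.7]))     # ~1.20
```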
Perplexity is directly linked to cross-entropy. Cross-entropy measures the average number of bits required to identify an event from a set of possibilities, given a coding scheme optimized for a predicted probability distribution rather than the true distribution. The cross-entropy H(p, q) between the true distribution p and the model distribution q is:
H(p, q) = -(1/N) * sum from i=1 to N of log_2 P(w_i | context)
Perplexity is simply the exponentiation of the cross-entropy:
PPL = 2^H(p, q)
Or equivalently, using natural logarithms:
PPL = exp(H_nat(p, q))
This relationship is significant for neural network training. When a neural language model is trained by minimizing the cross-entropy loss (the standard practice), it is implicitly minimizing perplexity on the training data. The average negative log-likelihood computed during training is precisely the cross-entropy, so perplexity is simply the exponential of the training loss. A model that achieves lower cross-entropy loss during training will, by definition, also achieve lower perplexity on the same data.
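As a concrete illustration, the sketch below uses PyTorch's standard cross-entropy loss on random logits that stand in for a real model's outputs; the vocabulary size and sequence length are arbitrary. Exponentiating the mean loss (which is in nats) yields the perplexity.

```python
import torch
import torch.nn.functional as F

# Hypothetical setup: per-position logits from a language model over a small vocabulary.
vocab_size, seq_len = 100, 8
logits = torch.randn(seq_len, vocab_size)           # model outputs for each position
targets = torch.randint(0, vocab_size, (seq_len,))  # the observed next tokens

# Standard training loss: mean negative log-likelihood (cross-entropy) in nats.
loss = F.cross_entropy(logits, targets)

# Perplexity is simply the exponential of that loss.
ppl = torch.exp(loss)
print(f"cross-entropy: {loss.item():.3f} nats, perplexity: {ppl.item():.1f}")
```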
Perplexity is also related to entropy, which represents the theoretical lower bound on cross-entropy for a given language. While entropy H(p) measures the inherent uncertainty in the true distribution, cross-entropy H(p, q) is always greater than or equal to entropy. The difference between cross-entropy and entropy is the Kullback-Leibler divergence, which measures how much the model distribution q differs from the true distribution p. A perfect model (where q = p) would achieve cross-entropy equal to the true entropy, and its perplexity would represent the inherent unpredictability of the language itself.
Perplexity can be interpreted as the effective number of equally likely choices the model considers at each prediction step, or as the weighted branching factor of a language model. A model with a perplexity of 50 is, on average, as uncertain as if it had to choose uniformly among 50 equally likely tokens at each step. A perplexity of 1 would mean the model always predicts the correct next word with absolute certainty (a theoretical ideal that is unreachable in practice for natural language). This "weighted branching factor" interpretation gives the metric a concrete, intuitive meaning.
Key properties of perplexity include:
| Property | Description |
|---|---|
| Lower bound | Perplexity is always at least 1.0. A perplexity of 1 means the model predicts every token with 100% confidence and is always correct. |
| Upper bound | A uniform distribution over all tokens yields perplexity equal to the vocabulary size V; a model that systematically assigns low probability to the observed tokens can score even higher. |
| Lower is better | A lower perplexity indicates the model is less "surprised" by the test data and has learned better language patterns. |
| Same test set required | Perplexity scores are only meaningful when comparing models evaluated on the same test dataset. |
| Tokenizer dependence | Scores are not comparable across models with different tokenizers, because different segmentation changes both the number of predictions and the difficulty of each prediction. |
| Dataset dependence | Scores reflect the difficulty of the test data as much as the quality of the model; news, medical, and conversational text yield very different perplexities for the same model. |
Because different models use different tokenization schemes (word-level, subword, character-level, byte-level), raw perplexity scores are often not directly comparable. Several normalized metrics address this issue.
Bits per character (BPC) measures the average number of bits needed to encode one character of text under the model's predicted distribution. It is computed as the total cross-entropy loss divided by the number of characters. Good character-level language models typically achieve 1.0 to 1.5 BPC on English text. In 2019, Transformer-XL achieved a then state-of-the-art 0.99 BPC on the enwik8 benchmark.
Bits per byte (BPB) is similar but normalizes by raw bytes rather than characters. This metric is particularly useful for byte-level models and allows comparison across different character encodings.
Bits per word (BPW) normalizes by word count and is the direct logarithmic transform of word-level perplexity: BPW = log_2(PPL). For example, a trigram model achieving a perplexity of 247 on the Brown Corpus corresponds to approximately 7.95 bits per word, or about 1.75 bits per letter.
Because these metrics normalize by a fixed unit (character, byte, or word) rather than by the model's internal tokens, they enable fairer comparisons across models with different vocabularies and tokenization strategies. The total information content of a test set is the same regardless of how it is segmented, so cross-entropy can always be converted between different granularity levels.
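Because the total number of bits is invariant to how the text is segmented, converting between token-level perplexity and character-level metrics is a short calculation. The sketch below uses illustrative counts, not measurements from any particular model.

```python
import math

def bits_per_character(token_level_ppl, num_tokens, num_characters):
    """Convert token-level perplexity to bits per character.

    Total information (in bits) is the same however the text is segmented,
    so we compute total bits at the token level and re-normalize by characters.
    """
    total_bits = num_tokens * math.log2(token_level_ppl)
    return total_bits / num_characters

# Illustrative numbers: a subword model with perplexity 20 over 1,000 tokens
# of text that is 4,200 characters long.
print(bits_per_character(20.0, 1_000, 4_200))  # ~1.03 bits per character
```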
The choice of tokenization granularity significantly affects reported perplexity values, and direct comparison requires careful normalization.
Word-level perplexity counts each whitespace-delimited word as one prediction step. Early n-gram models and some recurrent models (such as AWD-LSTM) reported word-level perplexity on benchmarks like Penn Treebank and WikiText-103.
Subword-level perplexity counts each subword token (produced by BPE, WordPiece, or SentencePiece) as one prediction step. Models like GPT-2 (using BPE with ~50,000 tokens), BERT (using WordPiece with ~30,000 tokens), and LLaMA use subword tokenization. Because subword models split rare words into multiple tokens, they make more predictions per word, and each individual prediction tends to be easier. This results in lower per-token perplexity even if the model captures equivalent information.
Research has shown that tokenization differences can affect traditional perplexity measurements by up to 21.6%. To normalize across tokenization schemes, researchers can convert to bits per character or bits per byte, or report perplexity normalized by a tokenizer-independent unit such as the number of characters in the text rather than the number of model tokens.
Vocabulary size directly affects perplexity. A model with a larger vocabulary faces a harder prediction task at each step because it must distribute probability mass across more possible tokens. All else being equal, a model with a vocabulary of 50,000 tokens will tend to report higher perplexity than an otherwise identical model with a vocabulary of 10,000 tokens.
This is why perplexity comparisons should always be made between models using the same vocabulary and tokenizer. When comparing across different vocabulary sizes, bits per character or bits per byte provides a more equitable comparison, since these metrics are independent of how the text is segmented into tokens.
The following tables show perplexity scores for notable language models on standard benchmarks. These scores illustrate the progress in language modeling over the years. Lower scores indicate better performance.
| Model | Penn Treebank test perplexity | Year |
|---|---|---|
| Kneser-Ney 5-gram | 141.2 | 1995 |
| AWD-LSTM + dynamic eval | 51.1 | 2017 |
| Mogrifier LSTM + dynamic eval | 44.8 | 2019 |
| Mogrifier RLSTM + dynamic eval | 42.9 | 2022 |
| GPT-2 (1.5B parameters) | 35.76 | 2019 |
| GPT-3 (175B parameters) | 20.50 | 2020 |
| Model | WikiText-103 test perplexity | Year |
|---|---|---|
| LSTM (Grave et al., 2016) | 48.7 | 2016 |
| Transformer with tied adaptive embeddings | 20.5 | 2018 |
| Transformer-XL Large | 18.3 | 2018 |
| Compressive Transformer | 17.1 | 2019 |
| Transformer-XL + RMS dynamic eval | 16.4 | 2019 |
| Routing Transformer | 15.8 | 2020 |
| Model | Dataset | BPC | Year |
|---|---|---|---|
| PPM compression | enwik8 | 1.46 | 1996 |
| Transformer-XL | enwik8 | 0.99 | 2019 |
| Model | Dataset | Perplexity | Year |
|---|---|---|---|
| GPT-2 (1.5B) | WikiText-2 | 19.93 | 2019 |
| GPT-2 (1.5B) | LAMBADA | 8.63 | 2019 |
| GPT-3 (175B) | LAMBADA (zero-shot) | 3.00 | 2020 |
| GPT-3 (175B) | LAMBADA (few-shot) | 1.92 | 2020 |
These results demonstrate a clear trend: larger models, better architectures, and more training data consistently reduce perplexity. The transition from n-gram models to LSTMs to Transformers brought dramatic improvements, with perplexity on WikiText-103 dropping from roughly 48.7 to below 16 over just a few years. However, perplexity scores should not be compared across different datasets, since the difficulty of prediction varies with the domain, vocabulary, and text style.
A critical distinction in language model evaluation is between training perplexity and test set perplexity.
Training perplexity measures how well the model fits its training data. It decreases during training as the model learns to assign higher probability to the observed sequences. However, very low training perplexity does not necessarily indicate a good model; it may simply reflect memorization.
Test set perplexity (or held-out perplexity) measures how well the model generalizes to unseen data. This is the metric that matters for evaluation. A model that has memorized its training data will show a large gap between training and test perplexity, a classic sign of overfitting.
Monitoring both metrics during training is standard practice. If training perplexity continues to decrease while test perplexity plateaus or increases, the model is overfitting. Techniques such as dropout, weight decay, and early stopping are commonly used to address this gap.
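A minimal sketch of this monitoring loop is shown below; the loss history, patience setting, and stopping rule are all illustrative choices rather than a prescribed recipe.

```python
import math

def should_stop(val_losses, patience=3):
    """Simple early-stopping rule on validation perplexity.

    `val_losses` are mean cross-entropies in nats per epoch, so perplexity is
    exp(loss). Stops when validation perplexity has not improved for `patience`
    consecutive epochs.
    """
    val_ppls = [math.exp(loss) for loss in val_losses]
    best_epoch = min(range(len(val_ppls)), key=val_ppls.__getitem__)
    return len(val_ppls) - 1 - best_epoch >= patience

# Illustrative run: validation loss bottoms out, then rises as the model overfits.
history = [5.1, 4.6, 4.4, 4.35, 4.4, 4.5, 4.62]
print(should_stop(history))  # True: no improvement for 3 epochs
```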
Perplexity is used in a wide range of applications beyond direct language model comparison.
Language model comparison and selection. Perplexity provides a quick, task-agnostic way to compare language models. Before running expensive downstream evaluations, researchers often use perplexity as a first filter to identify the most promising model variants.
Speech recognition. Perplexity was originally developed to estimate how difficult a recognition task will be for the language model component of an automatic speech recognition (ASR) system. A language model with lower perplexity used in a speech recognizer's decoder typically leads to lower word error rates, because the model better constrains the search space of possible transcriptions.
Machine translation. Language model perplexity is used to evaluate the fluency component of machine translation systems. Although BLEU scores are preferred for end-to-end translation evaluation, the target-side language model's perplexity helps assess output fluency.
Topic modeling. In probabilistic topic models like Latent Dirichlet Allocation (LDA), held-out perplexity is used to determine the optimal number of topics. Lower perplexity on held-out documents suggests the model has found a better decomposition of the corpus.
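For example, scikit-learn's LDA implementation exposes a perplexity method that can be used to compare topic counts on held-out documents. The corpus below is a toy stand-in, and the candidate topic counts are arbitrary.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Toy corpus; in practice you would use a real document collection.
docs = [
    "the cat sat on the mat", "dogs and cats are pets",
    "stocks fell as markets closed", "the market rallied on earnings",
    "the dog chased the cat", "investors sold shares today",
] * 20

X = CountVectorizer().fit_transform(docs)
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

# Pick the number of topics with the lowest held-out perplexity.
for n_topics in (2, 5, 10):
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0).fit(X_train)
    print(n_topics, round(lda.perplexity(X_test), 1))
```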
Text quality and anomaly detection. High perplexity on a given text sample may indicate that the text is out of domain, poorly written, or otherwise anomalous relative to the training distribution; documents that score unusually high relative to a trained model may contain errors or unusual content. This property has applications in data cleaning and quality filtering for training corpora.
AI-generated text detection. Perplexity plays a central role in AI content detection systems such as GPTZero, ZeroGPT, and the OpenAI AI Text Classifier: text produced by a language model tends to have low perplexity when evaluated by a similar model, because it follows highly predictable statistical patterns, whereas human-written text typically exhibits higher perplexity due to greater variability in word choice and sentence structure. The strengths and weaknesses of this approach are discussed in the AI content detection section below.
Despite its widespread use, perplexity has several important limitations that researchers and practitioners should be aware of.
| Limitation | Explanation |
|---|---|
| No measure of coherence | A model can achieve low perplexity by predicting common word sequences well, while still producing incoherent or nonsensical text over longer spans. |
| No measure of factuality | Perplexity evaluates statistical fit, not whether the model's outputs are factually correct. A model might assign high probability to plausible-sounding but false statements. |
| No measure of usefulness | Low perplexity does not imply that a model is useful for downstream tasks like question answering, summarization, or instruction following. |
| Weak correlation with some downstream tasks | Tasks requiring semantic understanding or reasoning (question answering, sentiment analysis) show weak correlation with perplexity, because next-token prediction does not directly capture deeper comprehension. |
| Domain sensitivity | A model trained on news text may show excellent perplexity on news test sets but poor perplexity on medical or legal text. Cross-domain perplexity comparisons can be misleading. |
| Tokenizer dependence | Models with different tokenizers produce incomparable perplexity scores without normalization. |
| Sensitivity to small test sets | Small differences in perplexity (2 to 3 points) on small test sets may not be statistically significant and can reflect random variation rather than genuine model differences. |
| Repetitive text can score well | A model that produces repetitive, low-entropy text may achieve low perplexity despite generating poor-quality output. |
| Long-context averaging | Standard perplexity averages loss across all tokens equally, which can obscure a model's performance on tokens that genuinely require long-range context. Metrics like LongPPL have been proposed to address this by isolating tokens whose prediction leverages distant context. |
For these reasons, modern evaluation of large language models often supplements perplexity with task-specific benchmarks such as MMLU, HellaSwag, TruthfulQA, and HumanEval.
The relationship between perplexity and downstream task performance is nuanced and has been the subject of considerable research.
For tasks that directly involve predicting or generating text (speech recognition, machine translation, text generation), perplexity tends to correlate well with task performance. This makes sense, because these tasks depend on the same core capability that perplexity measures: accurate next-token prediction.
For tasks requiring higher-level understanding (question answering, natural language inference, commonsense reasoning), the correlation is weaker. A model can be excellent at predicting common word sequences yet struggle with reasoning tasks. Research by the XLNet authors noted that improved language model perplexity does not always lead to improvement on downstream tasks. Conversely, the RoBERTa work found that better perplexity on the masked language modeling objective did lead to better end-task accuracy on sentiment analysis and inference benchmarks.
This mixed evidence motivated the development of comprehensive benchmark suites such as GLUE, SuperGLUE, BIG-Bench, MMLU, and HumanEval, which evaluate language models on a diverse set of downstream tasks. Modern large language model evaluation typically supplements perplexity with these task-specific benchmarks, human evaluations, and safety assessments to provide a more complete picture of model capabilities.
See also: AI content detectors, burstiness
Perplexity plays a central role in AI content detection systems such as GPTZero, ZeroGPT, and the OpenAI AI Text Classifier. These tools use perplexity as a signal to distinguish human-written text from AI-generated text.
The core idea is that text produced by a language model tends to have low perplexity when evaluated by a similar model, because AI-generated text follows highly predictable patterns. Human-written text, by contrast, tends to have higher perplexity due to greater variability in word choice, sentence structure, and stylistic decisions. When an AI detector feeds a document through a language model and the resulting perplexity is unusually low, the system flags the text as likely AI-generated.
GPTZero, one of the first widely used AI detectors, pioneered the approach of combining perplexity with burstiness (a measure of how much the writing style varies throughout a document). AI-generated text typically exhibits both low perplexity and low burstiness, because language models produce text at a consistent level of predictability. Human writing, on the other hand, tends to show bursts of complexity followed by simpler passages, resulting in higher burstiness.
The following table summarizes how perplexity and burstiness interact in AI content detection:
| Metric | Human-Written Text | AI-Generated Text |
|---|---|---|
| Perplexity | Generally higher (more surprising, varied word choices) | Generally lower (predictable, follows statistical patterns) |
| Burstiness | Higher (variable sentence length and complexity) | Lower (uniform writing style throughout) |
| Combined Signal | High perplexity + high burstiness = likely human | Low perplexity + low burstiness = likely AI |
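The sketch below illustrates the general idea only. It assumes per-token log probabilities already produced by some external scoring model, approximates burstiness as the standard deviation of per-sentence perplexities, and uses arbitrary thresholds; none of this reflects the actual formulas used by GPTZero or any other real detector.

```python
import math
import statistics

def sentence_perplexity(sentence_log_probs):
    """Perplexity of one sentence from per-token log probabilities (natural log)."""
    return math.exp(-statistics.mean(sentence_log_probs))

def detect(document_sentences, ppl_threshold=40.0, burstiness_threshold=10.0):
    """Toy detector combining document-level perplexity with burstiness.

    `document_sentences` is a list of per-sentence lists of token log probabilities,
    as scored by some reference language model (not included here). The thresholds
    are illustrative placeholders.
    """
    sentence_ppls = [sentence_perplexity(s) for s in document_sentences]
    doc_ppl = statistics.mean(sentence_ppls)
    burstiness = statistics.stdev(sentence_ppls) if len(sentence_ppls) > 1 else 0.0
    likely_ai = doc_ppl < ppl_threshold and burstiness < burstiness_threshold
    return doc_ppl, burstiness, likely_ai
```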
However, perplexity-based detection has significant weaknesses:

- Paraphrasing and light editing can raise the perplexity of AI-generated text above detection thresholds.
- Domain-specific human writing, such as legal or technical prose, may have naturally low perplexity and be falsely flagged.
- Newer models fine-tuned for output variability can produce text whose perplexity resembles human writing, evading detection.
As a result, modern AI detection tools like GPTZero have evolved beyond relying solely on perplexity and burstiness, incorporating multilayered systems with seven or more components including neural classifiers trained on large datasets of human and AI-generated text.
Computing perplexity for modern large language models involves some practical considerations.
Fixed-length context. Most Transformer-based models have a fixed context window (for example, 1,024 tokens for GPT-2 and 2,048 for GPT-3). When evaluating on text longer than the context window, researchers use a sliding window approach, scoring overlapping segments and averaging the results; the Hugging Face Transformers library documents a standard implementation of this approach (a sketch appears at the end of this section).
Softmax probabilities. Perplexity is computed from the softmax output probabilities of the model. For each position in the test sequence, the model's predicted probability distribution over the vocabulary is compared to the actual next token, and the log probability of the correct token is accumulated.
Computational cost. Evaluating perplexity requires a full forward pass through the model for the entire test set, which can be expensive for very large models. Unlike many downstream benchmarks, perplexity evaluation does not require any fine-tuning or task-specific adaptation, making it relatively straightforward to compute.
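The sketch below loosely follows the sliding-window evaluation described in the Hugging Face documentation. The model choice ("gpt2"), the evaluation text, and the window and stride sizes are arbitrary examples, and the per-window bookkeeping is a simplification rather than a reference implementation.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

text = "Some long evaluation text ... " * 100  # placeholder evaluation text
encodings = tokenizer(text, return_tensors="pt")
total_len = encodings.input_ids.size(1)

max_length, stride = 1024, 512
nlls, n_scored, prev_end = [], 0, 0
for start in range(0, total_len, stride):
    end = min(start + max_length, total_len)
    target_len = end - prev_end                 # only score tokens not seen before
    input_ids = encodings.input_ids[:, start:end]
    labels = input_ids.clone()
    labels[:, :-target_len] = -100              # mask the overlapping context tokens
    with torch.no_grad():
        # Mean NLL over the unmasked labels (the first token of each window is
        # never predicted, so this is a close approximation, as in the HF docs).
        loss = model(input_ids, labels=labels).loss
    nlls.append(loss * target_len)
    n_scored += target_len
    prev_end = end
    if end == total_len:
        break

ppl = torch.exp(torch.stack(nlls).sum() / n_scored)
print(f"perplexity: {ppl.item():.2f}")
```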
Perplexity AI, Inc. is an American artificial intelligence company headquartered in San Francisco, California, that develops and operates an AI-powered search engine. Often referred to as an "answer engine," Perplexity processes natural language queries and returns synthesized, cited responses rather than a traditional list of blue links. The company was founded in August 2022 and launched its consumer search product on December 7, 2022.
As of early 2026, Perplexity has over 45 million monthly active users, processes an estimated 35 to 45 million queries per day, and attracts roughly 170 million global website visitors per month. The company is valued at approximately $21 billion and has around 1,494 employees.
Perplexity AI was co-founded by four engineers with backgrounds in back-end systems, artificial intelligence, and machine learning:
| Founder | Role | Background |
|---|---|---|
| Aravind Srinivas | CEO | Ph.D. in Computer Science from UC Berkeley. Former AI researcher at OpenAI, DeepMind, and Google Brain. B.S. and M.S. in Electrical Engineering from IIT Madras. |
| Denis Yarats | CTO | Ph.D. in Computer Science from NYU, focusing on reinforcement learning and NLP. Former AI Research Scientist at Meta AI (FAIR). Previously worked on Bing at Microsoft and as a Staff ML Engineer at Quora. |
| Johnny Ho | Chief Strategy Officer | Former engineer at Quora. Former quantitative trader at Tower Research Capital. Competitive programming world champion. |
| Andy Konwinski | President | Ph.D. in Computer Science. Co-founder of Databricks. Expert in distributed systems and big data processing through work on Apache Spark. |
Aravind Srinivas has been the public face of the company, articulating the vision of replacing traditional search engines with a system that directly answers questions with cited sources, rather than requiring users to sift through links.
Perplexity's search engine is built on retrieval-augmented generation (RAG), a technique that combines information retrieval with generative AI to produce grounded, cited responses. The system follows a multi-stage pipeline:
Query Intent Parsing. When a user submits a query, a language model analyzes the intent behind the question, going beyond simple keyword matching to achieve semantic understanding of context, nuance, and the user's underlying goal.
Web Retrieval. The system searches the web and its internal index to retrieve relevant documents and web pages. Perplexity uses Vespa.ai as its core search infrastructure, which integrates vector search for semantic understanding, lexical search for precision, structured filtering, and machine-learned ranking into a single engine.
Chunk-Level Extraction. Rather than passing entire documents to the language model, the system extracts the most relevant text spans (chunks) from retrieved documents. This chunk-level retrieval improves factual accuracy, reduces context length, and minimizes compute cost.
Answer Generation. The curated context is passed to a generative large language model, which synthesizes a natural-language response based strictly on the retrieved information. A core architectural principle is that the model should not say anything that was not retrieved from sources.
Citation Attachment. Inline citations are attached to the generated text, linking specific claims back to their source documents. This allows users to verify every piece of information and explore original sources.
Perplexity continuously ingests data and updates both text and vector indexes in real time without interrupting queries. Its distributed architecture balances data and computation across nodes, co-locating content, indexes, and ranking logic to eliminate bottlenecks.
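The stages above map onto a generic retrieval-augmented generation loop. The following sketch is purely schematic: the retrieval, ranking, and generation functions are hypothetical placeholders passed in by the caller, and nothing here reflects Perplexity's proprietary implementation.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source_url: str

def answer(query: str, search_web, rank_chunks, generate_with_llm, top_k: int = 5):
    """Schematic RAG answer pipeline; all callables are hypothetical placeholders."""
    # 1-2. Parse intent and retrieve candidate documents from a web index.
    documents = search_web(query)

    # 3. Chunk-level extraction: keep only the most relevant spans.
    chunks: list[Chunk] = rank_chunks(query, documents)[:top_k]

    # 4. Generation grounded strictly in the retrieved context.
    context = "\n\n".join(f"[{i + 1}] {c.text}" for i, c in enumerate(chunks))
    draft = generate_with_llm(
        "Answer the question using ONLY the sources below, citing them as [n].\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )

    # 5. Citation attachment: map [n] markers back to source URLs.
    citations = {i + 1: c.source_url for i, c in enumerate(chunks)}
    return draft, citations
```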
Perplexity offers a range of products and features that have expanded significantly since its 2022 launch.
The core product is the AI search engine, available at perplexity.ai and through mobile apps on iOS and Android. Users type a natural language question, and the system returns a synthesized answer with inline citations. Follow-up questions are supported in a conversational thread, allowing users to refine their queries. The free tier provides access to basic search with a limited number of Pro Search queries per day.
Perplexity Pro is the paid subscription tier, priced at $20 per month (or $200 per year). Pro subscribers receive:

- A substantially larger daily allowance of Pro Search queries than the free tier
- The ability to choose among more advanced underlying models
- File upload and document analysis
- Free shipping on Buy with Pro orders
- A $5 monthly credit toward the Sonar API
Announced in July 2025, Perplexity Max is a $200-per-month subscription plan aimed at power users. It includes unlimited access to Labs (a spreadsheet and report generation tool), early access to the Comet browser, and access to the most advanced model configurations.
Perplexity Enterprise Pro, priced at $40 per user per month, targets business teams with features including centralized billing, admin controls, data privacy protections, and team collaboration tools through Spaces.
Comet is Perplexity's AI-native web browser, initially launched in 2025 for desktop (Mac and Windows) and Android. In March 2026, Comet expanded to iOS, bringing its AI assistant to iPhone users. The browser integrates Perplexity's search and AI capabilities directly into the browsing experience, with features including:

- A built-in assistant that can summarize the current page and answer questions about on-screen content
- Perplexity's answer engine as the default search experience
- Agentic capabilities that carry out multi-step browsing tasks on the user's behalf
Max subscribers can select the model powering their browser agent, choosing from options such as Claude Opus 4.6 and Claude Sonnet 4.5.
Perplexity also launched Comet Enterprise in March 2026, a secure version for teams with granular admin controls, MDM deployment support, audit logs, and integration with CrowdStrike Falcon for threat detection.
Launched in November 2024 for Pro users in the United States, Buy with Pro is an AI-powered shopping feature that lets users research products, compare options, and check out directly within Perplexity. Pro users receive free shipping on Buy with Pro orders. In May 2025, Perplexity partnered with PayPal to add PayPal and Venmo as checkout options within the app.
Spaces is an organizational feature that allows users to create dedicated workspaces for different topics or projects, grouping related searches and conversations together. Perplexity Pages allows users to generate shareable, article-style content from their research queries, transforming AI search results into formatted, publishable pages.
Perplexity develops its own family of AI models under the Sonar brand, which power both the consumer search product and the developer API.
The Sonar models were introduced in February 2025, built on top of Meta's Llama 3.3 70B and further trained to enhance answer factuality and readability.
| Model | Context Window | Speed | Key Strength |
|---|---|---|---|
| Sonar | 127K tokens | ~1,200 tokens/sec | Fast, real-time answers for general queries |
| Sonar Pro | 200K tokens | ~144 tokens/sec | Higher accuracy; leads SimpleQA factuality benchmark (F-score 0.858) |
| Sonar Reasoning Pro | 200K tokens | Varies | Deep analytical and reasoning tasks |
| Sonar Deep Research | 200K tokens | Varies | Comprehensive multi-source research |
Sonar achieves its high throughput of approximately 1,200 tokens per second by running on Cerebras inference infrastructure. Sonar Pro, while slower, delivers meaningfully higher factual accuracy, scoring 0.858 on the SimpleQA benchmark compared to Sonar's 0.773.
The Sonar API allows developers to integrate Perplexity's web-grounded search and answer capabilities into their own applications. The API supports real-time web search, conversational answers with citations, and structured retrieval.
| Model | Input Price (per 1M tokens) | Output Price (per 1M tokens) | Notes |
|---|---|---|---|
| Sonar | $1.00 | $1.00 | Citation tokens no longer billed (as of 2026) |
| Sonar Pro | $3.00 | $15.00 | Citation tokens no longer billed (as of 2026) |
| Sonar Deep Research | $2.00 | $8.00 | Citation tokens: $2.00/1M; Search: $5.00/1K queries |
Additional request fees apply to Sonar, Sonar Pro, and Sonar Reasoning Pro based on search context size. Pro subscribers receive a $5 monthly API credit. The Search API, for applications needing raw web results without synthesized answers, is priced at $5 per 1,000 requests.
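A minimal sketch of calling the Sonar API is shown below. It assumes the OpenAI-style chat-completions endpoint and response fields that Perplexity documents, and that an API key is stored in the PERPLEXITY_API_KEY environment variable; consult the current API reference for exact parameters, models, and pricing before relying on this.

```python
import os
import requests

response = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['PERPLEXITY_API_KEY']}"},
    json={
        "model": "sonar",
        "messages": [
            {"role": "user", "content": "What is perplexity in language modeling?"}
        ],
    },
    timeout=60,
)
data = response.json()
print(data["choices"][0]["message"]["content"])  # synthesized answer
print(data.get("citations", []))                 # list of source URLs, if returned
```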
Perplexity has raised over $1.2 billion in total funding across multiple rounds, with its valuation growing rapidly from 2023 through 2026.
| Date | Round | Amount | Valuation | Notable Investors |
|---|---|---|---|---|
| 2023 | Series A | $26M | ~$150M | NEA, Databricks |
| January 2024 | Series B | $73.6M | $520M | IVP, NEA, Jeff Bezos, NVIDIA |
| April 2024 | Series B-1 | $63M | $1.04B | Daniel Gross, Bessemer Venture Partners |
| June 2024 | Series C | Undisclosed | $3B | SoftBank Vision Fund 2 |
| December 2024 | Series D | $500M | $9B | Accel, IVP, Jeff Bezos, NVIDIA |
| May 2025 | Series D extension | $500M | $14B | Institutional investors |
| July 2025 | Top-up | $100M | ~$16B | Various investors |
| September 2025 | Series D-2 | $200M | $20B | Institutional investors |
| December 2025 | Undisclosed | Undisclosed | ~$21B | Cristiano Ronaldo (among others) |
Notable individual investors include Amazon founder Jeff Bezos, Shopify CEO Tobias Lutke, Figma CEO Dylan Field, and professional footballer Cristiano Ronaldo.
Perplexity generates revenue primarily through subscriptions. The company reported an estimated annual recurring revenue (ARR) of $148 million in 2025, with projections reaching $656 million for 2026.
The company's revenue sources include:

- Consumer subscriptions (Perplexity Pro at $20 per month and Perplexity Max at $200 per month)
- Enterprise subscriptions (Enterprise Pro at $40 per user per month and Comet Enterprise)
- Usage-based fees for the Sonar and Search APIs
In early 2024, Perplexity experimented with AI-integrated advertising, but in February 2026, the company transitioned to a subscription-first model by discontinuing ads. In January 2026, Perplexity also signed a three-year, $750 million commitment with Microsoft Azure to secure GPU capacity for its Deep Research and Model Council features.
Perplexity positions itself as a direct alternative to traditional search engines, particularly Google Search. While Perplexity does not yet rival Google's overall market share (Google held approximately 79% of desktop searches as of early 2025, down from 87.6% in 2023), it has carved out a growing niche in the AI-powered search segment.
AI-powered search tools collectively captured an estimated 12 to 15% of global search market share by the end of 2025, up from roughly 5 to 6% at the start of that year. Within the AI chatbot and search market specifically, Perplexity holds approximately 6 to 8% market share as of early 2026, behind ChatGPT and Microsoft Copilot. The company has set a target of 15 to 20% AI chatbot market share within 18 months.
Perplexity differentiates itself from Google and other competitors through several strategic choices:

- Answer-first presentation: synthesized, cited responses rather than a ranked list of links
- A subscription-first business model rather than advertising (ads were discontinued in February 2026)
- Model flexibility: queries can be served by its own Sonar models or by third-party frontier models selected by subscribers
- Vertical integration into the browsing experience through the Comet browser
Other competitors in the AI search space include Google's Gemini-powered AI Overviews (integrated into Google Search), Microsoft Copilot (built on OpenAI models and integrated into Bing), and You.com. Each takes a different approach to combining traditional web indexing with generative AI capabilities.
Perplexity has faced significant criticism and legal action from publishers who allege that the company's web crawling and content summarization practices constitute copyright infringement.
| Date | Plaintiff | Key Allegations |
|---|---|---|
| 2024 | News Corp (Wall Street Journal, New York Post, Barron's) | Unauthorized scraping and reproduction of copyrighted articles |
| 2024 | Forbes | Content was reproduced with minimal attribution |
| August 2025 | Nikkei, Asahi Shimbun | Japanese publishers filed copyright claims |
| August 2025 | Encyclopaedia Britannica, Merriam-Webster | Reference content was scraped and summarized without permission |
| December 2025 | Chicago Tribune | Copyright infringement claims |
| December 2025 | The New York Times | Large-scale copying and distribution of NYT content to power commercial AI products |
The New York Times lawsuit, filed on December 5, 2025, is among the most prominent. The NYT alleged that Perplexity scrapes articles from nytimes.com and obtains them from third-party databases to build a private index feeding its RAG system. The complaint cited research claiming that Perplexity used undeclared user agents, disguised its crawlers to appear as ordinary web browsers, relied on hidden IP addresses, and used third-party crawling tools to avoid detection.
Publishers have argued that AI search tools like Perplexity send far less referral traffic to their websites than traditional search engines, cutting into a revenue source they depend on. A 2024 study found that AI search tools redirect significantly less traffic to original publishers compared to Google Search.
Perplexity has taken several steps to address publisher concerns:

- Launching a revenue-sharing Publishers' Program that compensates participating outlets when their content is used in answers
- Emphasizing prominent inline citations that link readers back to original sources
- Publishing information about its crawlers and stating that they honor robots.txt directives, a claim that some publishers dispute
The legal outcomes of these lawsuits will likely have significant implications for how AI search engines can use copyrighted web content and for the broader relationship between generative AI companies and content publishers.
Perplexity has expanded rapidly through 2025 and into 2026, adding new products and capabilities at a fast pace.
2025 milestones:

- February: introduction of the Sonar model family, built on Llama 3.3 70B and served on Cerebras inference infrastructure
- May: PayPal and Venmo added as checkout options for Buy with Pro; $500 million Series D extension at a $14 billion valuation
- July: announcement of Perplexity Max at $200 per month, including early access to the Comet browser
- Launch of the Comet browser for desktop (Mac and Windows) and Android
- September: Series D-2 round at a $20 billion valuation
- December: funding round at an approximately $21 billion valuation, with investors including Cristiano Ronaldo; The New York Times files its copyright lawsuit

2026 milestones:

- January: three-year, $750 million commitment with Microsoft Azure for GPU capacity
- February: advertising discontinued in favor of a subscription-first model
- March: Comet expands to iOS; Comet Enterprise launches with admin controls, MDM deployment support, and CrowdStrike Falcon integration