GPTZero
Last reviewed
May 1, 2026
Sources
26 citations
Review status
Source-backed
Revision
v1 · 3,992 words
GPTZero is a commercial [[ai_content_detector|AI content detector]] that estimates the probability that a given block of text was written by a [[llm|large language model]] such as [[chatgpt|ChatGPT]], [[gpt-4|GPT-4]], [[claude|Claude]], [[gemini|Gemini]], or Llama, rather than by a human. It was built in late December 2022 and posted publicly in early January 2023 by Edward Tian, then a senior at Princeton University, and it became the first AI text detector to attract mainstream attention. By 2024 GPTZero claimed roughly 4 million registered users, even as a growing body of academic research questioned whether AI text detection can be made reliable.
The product runs as a hosted web application at gptzero.me, with browser extensions, an API, integrations with learning management systems (Canvas, Google Classroom, Blackboard, Moodle), and a Google Docs writing-replay add-on called Origin. The company is based in New York and Toronto and raised about $13.5 million across a 2023 seed round and a 2024 Series A. Multiple peer-reviewed studies have shown GPTZero can be defeated by simple paraphrasing, exhibits bias against non-native English writers, and sometimes flags historical texts (including parts of the [https://senseient.com/ride-the-lightning/ai-detector-believes-the-u-s-constitution-was-written-by-ai/ United States Constitution] and the King James Bible) as machine-written.
GPTZero was developed by Edward Tian, an undergraduate at Princeton University concentrating in Computer Science and pursuing a certificate in Journalism. Tian wrote the first version during winter break of his senior year, working from a coffee shop in Toronto. He posted a beta on a [https://etedward-gptzero-main-zqgfwb.streamlit.app/ Streamlit web app] on January 2, 2023, and announced it on Twitter the next day. The tweet went viral and accumulated several million views in under a week.
NPR's Emma Bowman ran one of the earliest national stories on the tool on January 9, 2023, framing it as a student-built response to ChatGPT, which had launched only weeks earlier. The Washington Post followed on January 12, 2023, and within two weeks the tool had been covered by the New York Times, Wall Street Journal, BBC, CNN, the Daily Beast, Forbes, and CBC. In the first week, GPTZero received about 30,000 uses and crashed; Streamlit allocated additional server resources to keep the service online. The dedicated domain gptzero.me went live later in January 2023.
Tian had previously interned on synthetic data research at Microsoft AI and as an investigative journalism intern at the BBC. He co-founded the company with Alex Cui, a machine-learning engineer who joined as CTO, with Yazan Mimi as an early collaborator.
The company raised a $3.5 million seed round on May 16, 2023, co-led by Uncork Capital and Neo, with angel investors including Jack Altman (brother of [[openai|OpenAI]] CEO Sam Altman), Stability AI's Emad Mostaque, former Reuters CEO Tom Glocer, and former New York Times CEO Mark Thompson. GPTZero closed a $10 million Series A in June 2024 led by Footwork VC, with participation from Reach Capital, Uncork, Neo, and Alt Capital. Total disclosed funding stood at about $13.5 million as of mid-2024, with the company describing itself as profitable.
The original version of GPTZero ran two simple statistical checks against text submitted by the user, on the theory that autoregressive language models leave consistent statistical signatures in their output. These two signals still anchor the public-facing description of the system, although the company says the production model now layers several further analyses on top of them.
Perplexity measures how well a language model predicts a given sequence of tokens; in information-theoretic terms it is the exponential of the cross-entropy. Low perplexity means the next token is, on average, easy for the model to guess.
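The definition can be made concrete with a minimal sketch. It assumes the per-token log-probabilities (natural log) have already been obtained from some reference language model; the function name and the two example inputs are illustrative and are not taken from GPTZero.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(cross-entropy), where cross-entropy is the
    mean negative log-likelihood per token (natural-log inputs)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    cross_entropy = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(cross_entropy)

# Hypothetical log-probabilities, not output from any real model.
predictable = [math.log(0.5)] * 4    # each token was given probability 0.5
surprising = [math.log(0.05)] * 4    # each token was given probability 0.05

print(perplexity(predictable))  # about 2: the model is choosing among ~2 options
print(perplexity(surprising))   # about 20: the model is choosing among ~20 options
```

A perplexity of k can be read as the model being, on average, as uncertain as if it were picking uniformly among k tokens.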
GPTZero's argument is that text written by a large language model tends to have lower perplexity under another large language model than text written by a human. An autoregressive sampler picks high-probability continuations most of the time, so the resulting prose is more predictable than what a person would write. GPTZero computes per-sentence perplexities and aggregates them across the document.
Burstiness captures how much perplexity itself varies across a document. Human writing tends to be "bursty": a complicated sentence next to a short blunt one, an aside, then a longer paragraph. Language models, especially earlier ones, produce a more uniform stream of medium-length sentences. Higher burstiness suggests human authorship; flatter writing suggests a model.
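One simple way to quantify this, assuming burstiness is taken to be the variance of per-sentence perplexity, is sketched below. The inputs are hypothetical per-sentence lists of token log-probabilities; GPTZero's production formula is not public, so this is an illustration of the idea rather than the company's implementation.

```python
import math
import statistics

def sentence_perplexity(token_logprobs):
    """exp(mean negative log-likelihood) for one sentence."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def burstiness(per_sentence_logprobs):
    """Population variance of per-sentence perplexity across a document.
    ASSUMPTION: variance is one plausible reading of 'burstiness'."""
    ppls = [sentence_perplexity(lp) for lp in per_sentence_logprobs]
    return statistics.pvariance(ppls)

# Hypothetical documents: three sentences of five tokens each.
flat = [[math.log(0.25)] * 5] * 3            # every sentence equally predictable
varied = [[math.log(0.5)] * 5,               # an easy sentence
          [math.log(0.05)] * 5,              # a surprising one
          [math.log(0.25)] * 5]              # something in between

print(burstiness(flat))    # no variation between sentences
print(burstiness(varied))  # much larger: perplexity swings from sentence to sentence
```

Under the heuristic described above, the `varied` document would look more human than the `flat` one.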
GPTZero combines its per-sentence perplexity and burstiness signals into a document-level score and returns one of several classifications, typically "Human," "Mixed," "AI," or "All AI," along with a probability. The interface highlights specific sentences it considers most likely to have been generated, which is intended to help reviewers focus on suspect passages rather than the entire document. The company also exposes a numeric "perplexity" and "burstiness" pair on some interfaces.
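The aggregation step can be illustrated with a small sketch. Everything specific in it is an assumption made for illustration: the plain average, the cutoff values, and the `highlight_k` parameter are hypothetical and do not come from GPTZero, which has not published how its labels are derived.

```python
def classify_document(sentence_probs, highlight_k=2):
    """Turn per-sentence AI probabilities into a document score, a label,
    and the indices of the sentences most worth a reviewer's attention."""
    if not sentence_probs:
        raise ValueError("need at least one sentence")
    doc_score = sum(sentence_probs) / len(sentence_probs)

    # HYPOTHETICAL cutoffs, chosen only to make the example concrete.
    if doc_score < 0.25:
        label = "Human"
    elif doc_score < 0.60:
        label = "Mixed"
    elif doc_score < 0.95:
        label = "AI"
    else:
        label = "All AI"

    # Highlight the sentences with the highest AI probability.
    ranked = sorted(range(len(sentence_probs)),
                    key=lambda i: sentence_probs[i], reverse=True)
    return doc_score, label, ranked[:highlight_k]

score, label, highlights = classify_document([0.1, 0.9, 0.2, 0.8])
print(score, label, highlights)  # a mid-range score, "Mixed", sentences 1 and 3
```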
The production model has been retrained repeatedly to keep up with changes in the underlying language models. GPTZero says it currently supports detection across [[gpt-4|GPT-4]], GPT-4o, GPT-5, [[claude|Claude]] 3.5 and Claude 4, [[gemini|Gemini]] 2 and 3 family models, Llama 3 family models, and Mistral, in addition to legacy GPT-3.5 and ChatGPT outputs. English remains the strongest language, and results in other languages vary. The company itself notes that texts shorter than about 250 words are harder to classify reliably.
The following table summarizes the statistical signatures GPTZero relies on, in the form most often cited in the company's documentation.
| Signal | What it measures | What GPTZero infers |
|---|---|---|
| Perplexity | Exponential of the average per-token negative log-likelihood under a reference language model, computed per sentence | Lower average values suggest LLM authorship |
| Burstiness | Variance of per-sentence perplexity across the document | Lower variance (flatter) suggests LLM authorship |
| Sentence-level scores | Probability assigned to each individual sentence | Used to highlight suspect passages |
| Document-level score | Aggregated probability across all sentences | Returned as Human, Mixed, AI, or All AI |
| Origin typing analysis (optional) | Edit history, copy-paste counts, revision rate from Google Docs | Used as evidence of human authoring process |
GPTZero is offered as a freemium SaaS product. There is a free tier with a per-month word cap, paid individual tiers that lift the cap and unlock features such as plagiarism scanning, an Educator tier, and an Enterprise tier with API access, single sign-on, and admin dashboards.
Pricing is updated periodically; the rough structure as of 2025 is summarized below from the GPTZero pricing page and third-party reviews.
| Plan | Approximate cost | Approximate word allowance | Notable features |
|---|---|---|---|
| Free | $0 | About 10,000 words per month | Basic AI scan, document highlighting |
| Essential | About $10 to $15 per month | About 150,000 words per month | Higher word cap, source scanning |
| Premium | About $16 to $24 per month | About 300,000 words per month | Plagiarism check, deep scan, all Essential features |
| Professional | About $45 to $46 per month | About 500,000 words per month | Higher cap, additional analytics |
| Educator and Enterprise | Custom | Custom | LMS integrations, admin console, API, single sign-on |
In September 2024, GPTZero launched Origin, a Chrome extension and Google Docs add-on that records a user's writing process: copy-paste events, edit history, typing rate, and revision durations. Rather than detecting AI in the final text, Origin produces a "writing replay" that lets a teacher or editor scrub through how a document came together. A paper with a smooth, plausible edit history is more credible than one that appeared in a single paste event. Origin is integrated with the GPTZero scoring system, so teachers can view the AI score and the writing replay together.
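The kind of summary a writing replay makes possible can be sketched as follows. The event schema (`EditEvent` with `seconds`, `kind`, and `chars`) and the summary fields are hypothetical, invented for this example; Origin's actual data format is not documented in the sources used here.

```python
from dataclasses import dataclass

@dataclass
class EditEvent:
    seconds: float  # time since the document was opened
    kind: str       # "type", "paste", or "delete"
    chars: int      # characters added (or removed, for "delete")

def summarize_replay(events):
    """Summarize how a document came together from its edit events."""
    typed = sum(e.chars for e in events if e.kind == "type")
    pasted = sum(e.chars for e in events if e.kind == "paste")
    added = typed + pasted
    return {
        "paste_events": sum(1 for e in events if e.kind == "paste"),
        "pasted_fraction": pasted / added if added else 0.0,
        "elapsed_seconds": events[-1].seconds - events[0].seconds if events else 0.0,
    }

# A document that appeared in one paste versus one drafted over half an hour.
single_paste = [EditEvent(0, "paste", 3000)]
drafted = [EditEvent(0, "type", 400), EditEvent(600, "delete", 50),
           EditEvent(900, "paste", 100), EditEvent(1800, "type", 500)]

print(summarize_replay(single_paste))
print(summarize_replay(drafted))
```

A reviewer would treat a high pasted fraction as a prompt for a conversation rather than as proof, since pasting from one's own notes is ordinary writing behavior.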
GPTZero ships official integrations or partnerships with several systems used in schools and workplaces.
| Integration | Notes |
|---|---|
| Canvas (Instructure) | Scores appear inside SpeedGrader alongside student submissions |
| Google Classroom | Direct integration via Scaffold |
| Google Docs | Origin add-on for writing-process replay |
| Blackboard, Moodle, D2L, Schoology | Provided through partner K16 Solutions |
| API | Available on paid tiers for embedding scoring into custom workflows |
| Microsoft Word | Add-in for in-document scoring |
| Chrome extension | Scans webpages and documents in browser |
The company says its tools are used by more than 380,000 educators and over 100 organizations spanning education, publishing, hiring, and legal services.
GPTZero is used in a handful of distinct contexts, each with very different stakes.
Education. This is the largest segment by user count. K-12 teachers, college instructors, and writing centers run student work through GPTZero directly or via Canvas and Google Classroom. The American Federation of Teachers announced a partnership with GPTZero in October 2023.
Publishing and journalism. Editors at content-heavy publishers use GPTZero or competitors to vet submissions from freelance writers. The promise to "preserve human journalism" featured in GPTZero's seed-round press release.
Hiring. Recruiters scan cover letters, application essays, and writing samples for AI-generated content. TechCrunch noted in 2024 that hiring managers had become a notable user segment.
Compliance and forensics. Legal teams and internal investigators use AI detectors as one of many signals when reviewing documents whose provenance is in dispute. These users tend to be more cautious about treating any single detector score as evidence.
Research and journalism investigations. Researchers have run GPTZero on corpora ranging from Amazon book listings to political speeches; researchers at one Spanish university published a peer-reviewed study that ran GPTZero on elected politicians' speeches.
Since early 2023, several research groups have stress-tested GPTZero and similar detectors. The findings have been consistent: detection is fragile, easy to evade, and biased.
Vinu Sankar Sadasivan and colleagues at the University of Maryland published [https://arxiv.org/abs/2303.11156 "Can AI-Generated Text be Reliably Detected?"] (arXiv:2303.11156) in March 2023. The paper argued, both theoretically and empirically, that as language models approach human-quality output, the statistical distance between their text and human text shrinks toward zero, putting an upper bound on what any detector can achieve. The authors then introduced a recursive paraphrasing attack that degraded every detector tested, including GPTZero, OpenAI's classifier, and DetectGPT.
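The paper's upper bound ties the best achievable area under the ROC curve to the total variation distance TV between the machine and human text distributions: AUROC ≤ 1/2 + TV − TV²/2. The short sketch below simply evaluates that expression; the function name is illustrative, and the sample TV values are arbitrary inputs rather than measurements.

```python
def auroc_upper_bound(tv_distance):
    """Best possible AUROC for any detector, given the total variation
    distance between machine and human text (Sadasivan et al., 2023)."""
    if not 0.0 <= tv_distance <= 1.0:
        raise ValueError("total variation distance must lie in [0, 1]")
    return 0.5 + tv_distance - tv_distance ** 2 / 2

# Arbitrary illustrative distances, not empirical estimates.
for tv in (0.0, 0.1, 0.5, 1.0):
    print(tv, auroc_upper_bound(tv))
```

At TV = 0 the bound is 0.5, which is the performance of a coin flip; only when the two distributions are fully separable (TV = 1) does the bound reach 1.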
Kalpesh Krishna and collaborators presented [https://arxiv.org/abs/2303.13408 "Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense"] at NeurIPS 2023. They trained an 11-billion-parameter paraphrase model called DIPPER and showed that running AI-generated text through it dropped DetectGPT's detection accuracy from 70.3 percent to 4.6 percent at a fixed 1 percent false-positive rate, with large drops for GPTZero and OpenAI's classifier as well. Their proposed defense, retrieval against a database of past generations, requires cooperation from the LLM provider, which is not possible for open models.
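Reporting detection accuracy "at a fixed 1 percent false-positive rate" means choosing the score threshold from human-written text first, then measuring how much AI text still clears it. A minimal sketch of that metric follows; the scores are made-up numbers chosen to show the mechanics and do not reproduce the paper's data.

```python
def tpr_at_fpr(human_scores, ai_scores, target_fpr=0.01):
    """Share of AI texts flagged when the threshold is set so that at most
    `target_fpr` of the human texts are (wrongly) flagged."""
    allowed = int(target_fpr * len(human_scores))  # human texts we may flag
    ranked = sorted(human_scores, reverse=True)
    # Flag only scores strictly above this value; at most `allowed` humans exceed it.
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    flagged = sum(1 for s in ai_scores if s > threshold)
    return flagged / len(ai_scores)

# Hypothetical detector scores (higher = more AI-like).
human = [i / 100 for i in range(100)]       # 0.00, 0.01, ..., 0.99
ai_original = [0.99, 0.995, 0.97, 0.999]    # raw model output
ai_paraphrased = [0.5, 0.6, 0.985, 0.3]     # the same text after paraphrasing

print(tpr_at_fpr(human, ai_original))     # most AI texts clear the threshold
print(tpr_at_fpr(human, ai_paraphrased))  # far fewer do after paraphrasing
```

Fixing the false-positive rate first is what makes numbers comparable across detectors: a detector can always catch more AI text by flagging more human text as well.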
Weixin Liang and colleagues at Stanford published [https://arxiv.org/abs/2304.02819 "GPT detectors are biased against non-native English writers"] in Patterns in July 2023. They submitted essays from native and non-native English speakers (drawn in part from TOEFL test responses) to seven popular detectors, including GPTZero. The detectors classified more than half of TOEFL essays as AI-generated, with one tool flagging 98 percent. The same tools correctly classified more than 90 percent of essays written by US eighth-graders as human. The authors argued that detectors pick up on stylistic uniformity (a property both of LLMs and of writers using a constrained vocabulary) rather than on AI authorship per se.
Eric Mitchell and colleagues at Stanford released [https://arxiv.org/abs/2301.11305 DetectGPT] in January 2023, presented at ICML 2023. DetectGPT is a zero-shot detector that uses the curvature of the model's log-probability function under small perturbations of the text. Subsequent papers used DetectGPT as a baseline alongside GPTZero, showing that all of them degrade under paraphrasing attacks.
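The core quantity in DetectGPT, the perturbation discrepancy, compares the log-probability of a passage with the average log-probability of slightly perturbed versions of it. The sketch below shows only that comparison. The `log_prob` and `perturb` callables are placeholders supplied by the caller, and the toy versions used here are stand-ins invented for the example; the published method uses a real language model for scoring and a mask-filling model to generate perturbations.

```python
import statistics

def perturbation_discrepancy(text, log_prob, perturb, n_perturbations=20):
    """log p(text) minus the mean log p of perturbed copies.
    A clearly positive value means the text sits near a local maximum of
    the model's log-probability, which DetectGPT treats as a sign of
    machine generation."""
    original = log_prob(text)
    perturbed = [log_prob(perturb(text, i)) for i in range(n_perturbations)]
    return original - statistics.fmean(perturbed)

# TOY stand-ins so the example runs without a model (not part of DetectGPT).
def toy_log_prob(text):
    return -abs(len(text) - 10)      # pretend the model "prefers" length 10

def toy_perturb(text, seed):
    return text + "x" * (seed % 3 + 1)

at_peak = perturbation_discrepancy("a" * 10, toy_log_prob, toy_perturb, 6)
off_peak = perturbation_discrepancy("a" * 4, toy_log_prob, toy_perturb, 6)
print(at_peak, off_peak)  # positive at the peak, negative away from it
```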
Independent benchmarking studies have reported false-positive rates for GPTZero and peer tools ranging from a few percent on clean academic prose to 30 to 50 percent on writing by non-native English speakers, students with disabilities, students using formulaic genres, or texts that resemble model training data.
A series of high-profile incidents made the limits of detection visible to non-technical audiences.
The United States Constitution. In April 2023, screenshots circulated showing GPTZero classifying parts of the US Constitution as "likely entirely written by AI." Edward Tian explained to Ars Technica that this happens because the Constitution appears repeatedly in LLM training data, so models learn to produce text that resembles it, and detectors trained on those models therefore flag the Constitution as AI-like. ZeroGPT produced similar results, claiming 92 percent of the Constitution was AI-written.
Religious texts. Selections from the King James Bible have similarly been flagged as AI-generated by GPTZero and other detectors, for the same training-data-overlap reason.
Student accusations. Multiple US universities and high schools have publicly reported cases in which students were falsely accused of submitting AI-generated work. The University of Texas at Arlington advised faculty against using AI detectors as primary evidence in academic-integrity cases.
Vanderbilt and Pittsburgh disabling Turnitin's AI detector. On August 16, 2023, Vanderbilt University announced it was disabling Turnitin's AI writing detection feature, citing a 1 percent false-positive rate that, applied to the 75,000 papers Vanderbilt had submitted in 2022, would have wrongly flagged about 750 student papers. The University of Pittsburgh and others followed. These decisions reinforced broader institutional skepticism of all AI text detectors.
OpenAI shutting down its own classifier. On July 20, 2023, OpenAI removed its own AI Text Classifier, launched in January 2023, citing a "low rate of accuracy." OpenAI's own evaluation found that the classifier correctly identified only 26 percent of AI-written text and incorrectly flagged 9 percent of human-written text. The shutdown is often cited as evidence that even the company building the underlying language models cannot reliably detect their output.
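The figures quoted in the Vanderbilt and OpenAI incidents can be checked with a few lines of arithmetic. The sketch below reproduces the 750-paper estimate and then asks how often a flag from a classifier with OpenAI's reported rates would be correct. The 10 percent prevalence of AI-written submissions is an assumption made purely for illustration; none of the sources give such a figure.

```python
def expected_false_flags(papers, false_positive_rate):
    """Human-written papers expected to be wrongly flagged."""
    return papers * false_positive_rate

def flag_precision(prevalence, true_positive_rate, false_positive_rate):
    """Probability that a flagged text really is AI-written (Bayes' rule)."""
    true_flags = prevalence * true_positive_rate
    false_flags = (1 - prevalence) * false_positive_rate
    return true_flags / (true_flags + false_flags)

# Vanderbilt: 75,000 papers at a 1 percent false-positive rate.
vanderbilt = expected_false_flags(75_000, 0.01)
print(vanderbilt)  # about 750 papers

# OpenAI's classifier: 26 percent of AI text caught, 9 percent of human text flagged.
assumed_prevalence = 0.10  # ASSUMPTION: share of submissions that are AI-written
openai_precision = flag_precision(assumed_prevalence, 0.26, 0.09)
print(openai_precision)  # roughly one flag in four would be correct
```

The precision depends heavily on the assumed prevalence: the rarer AI-written text is in the pool being scanned, the larger the share of flags that land on human writers.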
The AI-detection space crowded quickly. Most competitors take a similar statistical-classifier approach; a few try alternative paths such as cryptographic watermarking.
| Tool | Vendor | First release | Free tier | Detection approach | Notes |
|---|---|---|---|---|---|
| GPTZero | GPTZero (Edward Tian) | January 2023 | Yes | Perplexity and burstiness, plus learned classifier | Originated the consumer-facing AI-detection category |
| OpenAI AI Text Classifier | [[openai\|OpenAI]] | January 2023 (discontinued July 2023) | Yes | Fine-tuned classifier | |
| Originality.AI | Originality | November 2022 | No (paid only at launch) | Classifier plus plagiarism check | Marketing-content focus |
| Copyleaks AI Detector | Copyleaks | 2023 | Limited | Classifier across 30+ languages | Plagiarism heritage |
| Turnitin AI Writing Detection | Turnitin | April 4, 2023 | Bundled with Turnitin | Classifier integrated with similarity check | Disabled by some universities in 2023 |
| Crossplag AI Detector | Crossplag | 2023 | Yes | Classifier | Bundled with plagiarism scanning |
| Sapling AI Detector | Sapling | 2023 | Yes | Classifier | Originally a writing-assistant company |
| Winston AI | Winston | 2023 | Limited | Classifier with claimed 99.98 percent accuracy | Heavy marketing claims, real-world numbers lower |
| ZeroGPT | ZeroGPT | 2023 | Yes | Classifier | Free, ad-supported; flagged the Constitution at 92 percent |
| Hugging Face GPT-2 Output Detector | Hugging Face/[[openai\|OpenAI]] | 2019 | Yes | RoBERTa-based classifier trained on GPT-2 outputs | |
| DetectGPT | Mitchell et al., Stanford | January 2023 | Research | Probability curvature, zero-shot | Academic baseline, not a product |
| SynthID-Text | [[google_deepmind\|Google DeepMind]] | May 2024 announcement, October 2024 Nature paper | Open source | Cryptographic watermark embedded at sampling time | |
GPTZero's strengths and weaknesses are mostly the strengths and weaknesses of statistical AI detection itself.
The free tier is genuinely usable, the interface is simple enough that a teacher with no technical background can use it, and the company has been aggressive about retraining the model as new LLMs are released. Per-sentence highlighting gives reviewers something to look at rather than a single opaque score, and Origin's writing-replay add-on is a meaningful step away from text-only classification, since edit history is much harder to fake than the final text. GPTZero also has the broadest set of LMS integrations of any AI detector aimed at education.
False positives are the central problem. They fall hardest on the people who can least afford them: non-native English writers, students with disabilities, and students who write in formulaic genres such as five-paragraph essays. Liang et al. (2023) showed that bias is measurable and large, not anecdotal.
Detection is also easy to evade. The Krishna and Sadasivan papers showed that paraphrasing AI-generated text, by hand or with another model, drops accuracy to near-random in many settings. A student who pastes ChatGPT output into another model and asks it to "rewrite this in my voice" will defeat most detectors with little effort.
Short texts are unreliable: under about 250 words, the perplexity and burstiness signals are noisy, and GPTZero itself notes lower confidence in this range. Long texts in unusual styles (academic, legal, literary, scriptural) are also problematic, as the US Constitution incident illustrated.
Finally, the underlying detection-vs-generation race is not in detectors' favor. Each new model release shifts the statistical signature, and detectors must be retrained to track it. As models close the gap with human writing, the signal-to-noise ratio of any classifier-based detector trends downward, which is the formal point Sadasivan et al. made.
A different line of work tries to bypass post-hoc detection by having the language model embed a hidden cryptographic watermark while generating text. Detection then becomes a verification problem rather than a classification problem.
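What "verification rather than classification" means can be shown with a toy keyed watermark. The scheme below is a generic "green-list" design of the kind discussed in the academic watermarking literature, swapped in because it is easy to state; it is not the algorithm of any deployed product, and the key, vocabulary, and function names are all hypothetical.

```python
import hashlib
import math
import random

GAMMA = 0.5  # fraction of tokens that count as "green" after any given token

def is_green(key, prev_token, token):
    """Keyed pseudo-random test: only the key holder can evaluate it."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] < 256 * GAMMA

def generate_watermarked(key, vocab, length, rng):
    """Toy generator that always picks a green token when one exists."""
    tokens = [rng.choice(vocab)]
    while len(tokens) < length:
        candidates = [t for t in vocab if is_green(key, tokens[-1], t)]
        tokens.append(rng.choice(candidates or vocab))
    return tokens

def watermark_z_score(key, tokens):
    """Standard deviations by which the green count exceeds chance."""
    n = len(tokens) - 1
    green = sum(is_green(key, a, b) for a, b in zip(tokens, tokens[1:]))
    return (green - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))

rng = random.Random(0)
vocab = [f"w{i}" for i in range(200)]
marked = generate_watermarked("secret", vocab, 101, rng)
plain = [rng.choice(vocab) for _ in range(101)]

print(watermark_z_score("secret", marked))  # far above chance
print(watermark_z_score("secret", plain))   # close to zero
```

Because unmarked text passes the keyed test only by chance, the verifier can set a z-score threshold with a known, very small false-positive rate, which is the property statistical classifiers lack. A real scheme biases the model's sampling gently instead of forcing green tokens, so that text quality is preserved.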
The most prominent deployed example is [[google_deepmind|Google DeepMind]]'s SynthID-Text, announced in May 2024 and described in a Nature paper ("Scalable watermarking for identifying large language model outputs," October 23, 2024). SynthID-Text modulates the sampling distribution at generation time so that watermarked text contains a statistical pattern a verifier can check without access to the model's logits. DeepMind reports it has been integrated into Gemini, evaluated against millions of chatbot responses, and made open source. The Nature paper finds users do not notice a quality difference, and the watermark survives mild paraphrasing and translation but degrades under thorough rewriting.
Watermarks have one strong advantage over GPTZero-style detection: when present, they can be verified with very low false-positive rates. They have one strong limitation: they require the model provider to embed them. They cannot cover open-weight models that did not opt in, models run by adversaries, or text that has been thoroughly rewritten. OpenAI has reportedly built but not publicly deployed a watermarking system for ChatGPT outputs. Watermarks will cover only a fraction of the LLM ecosystem, leaving statistical detectors like GPTZero to fill the rest, with the limitations the academic literature has documented.
GPTZero matters less because it solved AI detection (it did not, and probably no tool will fully solve it) and more because it crystallized a public conversation. In January 2023, ChatGPT had been out for about a month, schools were panicking about student cheating, and there was no widely available tool to even attempt detection. A 22-year-old senior at Princeton built a Streamlit app over winter break, posted it on Twitter, and within a week had reframed how educators, journalists, and policymakers talked about AI in writing.
The tool also created a counter-discourse. The academic critiques, the Vanderbilt and Pittsburgh decisions, the OpenAI classifier shutdown, and the false-positive screenshots all happened in part because GPTZero was visible enough to be tested and criticized. A growing institutional consensus, by 2024 and 2025, holds that AI detection is at best a soft signal and should not be used as the basis for high-stakes academic-integrity decisions.
GPTZero continued to evolve through 2024, 2025, and into 2026. The company says its model has been retrained for GPT-4o, GPT-5, Claude 3.5 and Claude 4, Gemini 2 and Gemini 3, and Llama 3 and later models. The Origin extension expanded into a broader Chrome extension that scans webpages, and the company added an institution-level analytics dashboard for schools. The Series A funded expansion into hiring and publishing as additional verticals.
At the same time, the institutional environment has hardened. Universities including Vanderbilt and Pittsburgh have continued to advise against using any AI detector as primary evidence in academic-integrity cases, and several major publishers have begun requiring author affidavits about AI use rather than relying on detection. GPTZero's own messaging has shifted toward "verifying human writing" through tools like Origin, rather than "detecting AI," reflecting both the academic critique and the practical limits of post-hoc statistical classification.