AI content detectors
Last reviewed
May 30, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v2 · 2,901 words
AI content detectors are software tools that try to estimate whether a piece of text, an image, or a video was produced by an artificial intelligence system rather than a human. The category grew quickly after the public release of ChatGPT in late 2022, when schools, publishers, and employers began looking for ways to tell machine writing apart from human writing. Most attention has focused on AI text detection, which tries to flag prose generated by a large language model such as ChatGPT, Claude, or Gemini.[1]
The central problem with these tools is that they are not reliable. Peer-reviewed studies have found that AI text detectors produce significant numbers of false positives, are biased against people who do not write English as a first language, and can be defeated by simple paraphrasing.[2][3][4] OpenAI, which builds some of the most widely used language models, launched its own text classifier in January 2023 and then withdrew it in July 2023, citing a "low rate of accuracy."[5] Several universities disabled the AI writing detector built into Turnitin for similar reasons.[6] In 2025 the United States Federal Trade Commission ordered one detection vendor to stop advertising an accuracy figure it could not support.[7] For these reasons, a broad institutional consensus by the mid-2020s holds that detector output is at best a weak signal and should not be used as proof in high-stakes decisions.
This article focuses on AI text detection, which is the largest and most contested part of the field, and covers image and deepfake detection more briefly.
An AI content detector takes a sample of content and returns a judgment about its origin, usually as a probability or a label such as "human," "AI," or "mixed." The label is a prediction, not a measurement, because there is no physical trace in ordinary text that marks it as machine-written. A detector is inferring authorship from statistical patterns, which is fundamentally different from a tool that reads a hidden signal deliberately placed in the content.
The field splits into three loosely related areas: AI text detection, AI image detection, and deepfake detection for synthetic video and audio.
All three share the same underlying difficulty. As generative models improve, the statistical gap between synthetic output and authentic content shrinks, so the signal a detector relies on gets weaker over time.[3]
Most AI text detectors fall into one of two technical families, and a separate approach based on watermarking sits apart from both because it requires cooperation from the model that produced the text.
The earliest consumer detectors, including the first version of GPTZero, were built on two statistical measures.[1]
Perplexity measures how predictable a passage is to a reference language model. In information-theoretic terms it is the exponential of the cross entropy, so low perplexity means the next word is, on average, easy to guess. The argument is that text sampled from a language model tends to have lower perplexity than human text, because the sampler usually picks high-probability continuations.
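The definition above can be made concrete with a short sketch. It assumes a reference model has already assigned a probability to each token in a passage; the function name `perplexity` and the two probability lists are hypothetical values chosen for illustration, not output from any real model or detector.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(cross entropy), i.e. exp of the mean negative log-probability."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    cross_entropy = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(cross_entropy)

# Hypothetical per-token probabilities from a reference model.
predictable = [0.9, 0.8, 0.9, 0.85]  # each next token is easy to guess
surprising = [0.2, 0.05, 0.3, 0.1]   # the model is often surprised

print(perplexity(predictable))
print(perplexity(surprising))
```

The first passage scores only a little above 1, while the second scores several times higher. A perplexity-based detector would treat the first as more model-like.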
Burstiness measures how much perplexity varies across a document. Human writing tends to mix long complex sentences with short blunt ones, which produces high variance, while earlier models produced a flatter, more uniform stream. Higher burstiness suggests a human author; flatter writing suggests a model.
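As a rough sketch, burstiness can be computed as the spread of per-sentence perplexity. The version below assumes the population standard deviation as the measure of spread, which is one plausible choice and not necessarily what any particular detector uses. The token probabilities are hypothetical.

```python
import math
import statistics

def sentence_perplexity(token_probs):
    """exp of the mean negative log-probability of one sentence's tokens."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))

def burstiness(sentences):
    """Population standard deviation of per-sentence perplexity (an assumed definition)."""
    return statistics.pstdev(sentence_perplexity(s) for s in sentences)

# Hypothetical token probabilities, one inner list per sentence.
uniform_doc = [[0.5, 0.5, 0.5], [0.5, 0.5], [0.5, 0.5, 0.5, 0.5]]
varied_doc = [[0.9, 0.9, 0.9], [0.1, 0.2], [0.5, 0.5, 0.5, 0.5]]

print(burstiness(uniform_doc))  # every sentence is equally predictable
print(burstiness(varied_doc))   # predictability swings between sentences
```

The uniform document yields a burstiness of essentially zero, while the varied one yields a clearly positive value.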
These signals are intuitive but fragile. They depend on the reference model matching the model that wrote the text, they get noisy on short passages, and they can flag human writing that happens to be uniform or formulaic.
The second family trains a machine-learning classifier on large collections of human-written and AI-written text, then applies it to new samples. An early public example was the RoBERTa-based detector that OpenAI and Hugging Face released for GPT-2 output in 2019. Modern commercial detectors generally use trained classifiers, often combined with the statistical signals above and with stylometric features such as sentence structure, vocabulary, and punctuation habits.[1]
Classifier detectors are only as good as their training data. A model trained mostly on one kind of writing, or on the output of one language model, can perform poorly on other genres or on newer models. The FTC's 2025 action against Workado turned on exactly this point: the underlying model had been trained on academic text and on ChatGPT output, yet the product was marketed as a general-purpose detector.[7]
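To illustrate the classifier approach in miniature, the sketch below trains a logistic regression on two stylometric features. Everything in it is an assumption made for illustration: the two features, the function names, and above all the eight training vectors, which are invented and far too few to mean anything. Production detectors use large neural classifiers, not this.

```python
import math
import re
import statistics

def stylometric_features(text):
    """Return [type-token ratio, sentence-length variation] for a passage."""
    sentences = [s.split() for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text must contain at least one sentence")
    words = [w.lower().strip(",;:\"'()") for s in sentences for w in s]
    lengths = [len(s) for s in sentences]
    type_token_ratio = len(set(words)) / len(words)
    length_variation = statistics.pstdev(lengths) / (sum(lengths) / len(lengths))
    return [type_token_ratio, length_variation]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on log loss; returns (weights, bias)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_ai_probability(model, features):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features)) + b)

# Invented training set: label 0 = human-written, 1 = AI-written.
TRAIN_X = [[0.70, 0.80], [0.65, 0.70], [0.75, 0.90], [0.60, 0.75],
           [0.55, 0.20], [0.60, 0.30], [0.50, 0.25], [0.58, 0.15]]
TRAIN_Y = [0, 0, 0, 0, 1, 1, 1, 1]

model = train_logistic(TRAIN_X, TRAIN_Y)
sample = "The cat sat. The cat sat."
print(stylometric_features(sample))
print(predict_ai_probability(model, stylometric_features(sample)))
```

Because the toy model only ever saw these eight vectors, it confidently labels any uniform, repetitive passage as AI-written, whoever wrote it. That is the training-data dependence described above.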
A detector tries to guess authorship after the fact. Watermarking takes a different route: the language model embeds a hidden statistical pattern into the text while it generates it, so verification later becomes a matter of checking for that pattern rather than guessing. AI text detection and watermarking are therefore distinct techniques that are often confused.
The most prominent deployed example is SynthID-Text from Google DeepMind, described in a paper published in Nature on 23 October 2024. SynthID-Text modulates the model's sampling distribution at generation time so that watermarked text carries a signal a verifier can detect without access to the model. DeepMind reported integrating it into Gemini, evaluating it across roughly 20 million chatbot responses without users noticing a quality difference, and releasing the code openly.[8] OpenAI has separately said it built a text watermarking method for ChatGPT that is "highly accurate" and even effective against paraphrasing, but as of August 2024 had not released it, citing concerns that it could be circumvented by translation or rewording, could stigmatize non-native English speakers, and was unpopular in user surveys.[9]
The key limitation of watermarking is that it only works when the model provider chooses to embed the watermark. It cannot cover open-weight models that did not opt in, text from systems run by people trying to avoid detection, or content that has been thoroughly rewritten. Watermarks will therefore cover only part of the AI text ecosystem, leaving the rest to unreliable statistical detectors.
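The embed-then-verify idea can be sketched with a toy "green list" scheme of the kind discussed in the academic watermarking literature. This is not SynthID-Text's algorithm and not any vendor's implementation. The 50-word vocabulary, the key, and the uniform-sampling "model" are all stand-ins invented for the example.

```python
import hashlib
import math
import random

VOCAB = [f"w{i}" for i in range(50)]  # hypothetical toy vocabulary
GREEN_FRACTION = 0.5

def green_list(prev_token, key="demo-key"):
    """Pick half the vocabulary pseudo-randomly, seeded by a secret key and the previous token."""
    digest = hashlib.sha256(f"{key}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(VOCAB, int(len(VOCAB) * GREEN_FRACTION)))

def generate(n_tokens, watermark, rng):
    """Stand-in for a language model: uniform sampling, restricted to green tokens if watermarking."""
    tokens = ["w0"]
    for _ in range(n_tokens):
        candidates = sorted(green_list(tokens[-1])) if watermark else VOCAB
        tokens.append(rng.choice(candidates))
    return tokens

def z_score(tokens):
    """Standard deviations by which the green-token count exceeds the unwatermarked expectation."""
    n = len(tokens) - 1
    hits = sum(tokens[i] in green_list(tokens[i - 1]) for i in range(1, len(tokens)))
    expected = n * GREEN_FRACTION
    std = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (hits - expected) / std

rng = random.Random(0)
marked = generate(200, watermark=True, rng=rng)
plain = generate(200, watermark=False, rng=rng)
print(z_score(marked))  # far above zero: the pattern is present
print(z_score(plain))   # near zero: no pattern to find
```

The verifier needs only the key, not the model. The sketch also shows both limits noted above: text from a generator that never applied the key looks like `plain`, and rewriting enough tokens pulls a marked text's score back toward zero.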
The AI text detection market filled rapidly through 2023. Most products take a classifier-based approach; vendor accuracy claims are typically measured in controlled tests on clearly AI-generated text and tend to be much higher than independent results on real-world writing. The figures below are attributed to their sources rather than presented as established fact.
| Tool | Vendor | Released | Approach | Notes on claims |
|---|---|---|---|---|
| GPTZero | GPTZero (Edward Tian) | January 2023 | Perplexity and burstiness, plus a learned classifier | First consumer AI detector to attract mainstream attention [1] |
| OpenAI AI Text Classifier | OpenAI | January 2023, withdrawn July 2023 | Fine-tuned classifier | OpenAI reported it correctly flagged only 26% of AI text and wrongly flagged 9% of human text on a challenge set [5][10] |
| Turnitin AI writing detection | Turnitin | April 2023 | Classifier integrated with similarity checking | Turnitin advertised 98% confidence and a false-positive rate under 1% from lab testing, then acknowledged higher real-world false positives [11][12] |
| Originality.ai | Originality.ai | November 2022 | Classifier plus plagiarism check | Vendor cites about 99% accuracy; independent tests report lower, with measurable false positives [13] |
| Copyleaks | Copyleaks | 2023 | Classifier across many languages | Vendor cites about 99% accuracy; a Bloomberg test reported a 1% to 2% false-positive rate on pre-AI essays [14] |
| Winston AI | Winston | 2023 | Classifier | Markets a 99%-plus accuracy figure that independent reviewers do not reproduce on general text [13] |
| Hugging Face GPT-2 Output Detector | Hugging Face and OpenAI | 2019 | RoBERTa-based classifier | Open source, mainly of historical interest |
| Workado AI Content Detector | Workado | early 2020s | Classifier | FTC settlement in 2025 over a 98% claim that independent testing put at about 53% on general content [7] |
GPTZero was built in late December 2022 by Edward Tian, then a senior at Princeton University, and became the first AI text detector to reach a wide audience.[1] OpenAI's classifier is the most cited cautionary tale: the company that builds the underlying models could not make a reliable detector for their output, and it pulled the tool within six months.[5]
The evidence that AI text detectors are unreliable is strong, comes from multiple independent groups, and has held up as the underlying models have changed.
A false positive, flagging human writing as AI, is the most damaging failure mode because it can lead to a wrongful accusation. Even small percentages matter at scale. A Bloomberg test of GPTZero and Copyleaks on 500 essays written before generative AI existed found false-positive rates of 1% to 2%; Bloomberg noted that applied across all student essays this would translate to large numbers of wrongful flags.[14] OpenAI's own classifier wrongly labeled 9% of human text as AI.[5] Turnitin claimed a false-positive rate under 1% at launch but acknowledged in mid-2023 that real-world false positives were higher than that figure.[12]
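A short worked calculation shows why these rates matter. It uses the 26% detection rate and 9% false-positive rate that OpenAI reported for its classifier. The batch size of 1,000 documents and the 10% share of AI-written documents are assumptions picked purely for illustration, as is the function name.

```python
def flagged_breakdown(n_docs, ai_share, true_positive_rate, false_positive_rate):
    """Split flagged documents into correct and wrongful flags, plus the share that is correct."""
    ai_docs = n_docs * ai_share
    human_docs = n_docs - ai_docs
    true_flags = ai_docs * true_positive_rate
    false_flags = human_docs * false_positive_rate
    total_flags = true_flags + false_flags
    precision = true_flags / total_flags if total_flags else 0.0
    return true_flags, false_flags, precision

# Assumed scenario: 1,000 submissions, 10% of them AI-written.
true_flags, false_flags, precision = flagged_breakdown(
    n_docs=1000, ai_share=0.10, true_positive_rate=0.26, false_positive_rate=0.09
)
print(true_flags, false_flags, precision)
```

Under these assumptions, about 26 flags are correct and about 81 are wrongful, so roughly three out of four flagged documents were written by a human. The result shifts with the assumed AI share. The general point is that when most submissions are human-written, even a single-digit false-positive rate can dominate the flags.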
The most rigorous evidence of bias comes from a Stanford study led by Weixin Liang, published in the journal Patterns in 2023. The authors ran essays through seven popular detectors. The detectors misclassified more than half of essays written by non-native English speakers (drawn from TOEFL test responses) as AI-generated, with an average false-positive rate of 61.22%, while they correctly classified essays by US eighth-graders as human almost every time. When the same TOEFL essays were edited to use more varied vocabulary, the false-positive rate fell sharply. The authors concluded that detectors largely react to low-complexity, low-perplexity writing rather than to AI authorship as such, which penalizes anyone with a more limited or constrained English style.[2]
Detectors are also easy to defeat on purpose. Vinu Sankar Sadasivan and colleagues at the University of Maryland argued in "Can AI-Generated Text be Reliably Detected?" (2023) that as a language model approaches human quality, even the best possible detector can do only marginally better than random guessing, and they showed a recursive paraphrasing attack that degraded every detector they tested.[3] In a NeurIPS 2023 paper, Kalpesh Krishna and collaborators trained an 11-billion-parameter paraphrase model called DIPPER and reported that running AI text through it dropped one detector's accuracy from 70.3% to 4.6% at a fixed 1% false-positive rate, while also evading GPTZero, OpenAI's classifier, and watermarking.[4] In practice, a person can paste model output into another model, ask it to rewrite the text, and defeat most detectors with little effort.
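The phrase "at a fixed 1% false-positive rate" describes an evaluation procedure. The threshold is set so that only 1% of known human texts are flagged, and the detector is then scored on how much AI text it still catches. The sketch below illustrates that procedure with invented scores. It is not the DIPPER authors' code, and the numbers are not theirs.

```python
import math

def threshold_at_fpr(human_scores, target_fpr):
    """Threshold such that at most target_fpr of human texts score strictly above it (ignoring ties)."""
    ranked = sorted(human_scores, reverse=True)
    allowed = math.floor(len(ranked) * target_fpr + 1e-9)  # human texts we may wrongly flag
    return ranked[allowed]

def detection_rate(ai_scores, threshold):
    """Share of AI texts whose detector score exceeds the threshold."""
    return sum(s > threshold for s in ai_scores) / len(ai_scores)

# Invented detector scores (higher = more AI-like).
human_scores = [i / 100 for i in range(100)]
ai_original = [0.995, 0.99, 0.985, 0.97, 0.999, 0.60, 0.992, 0.981, 0.99, 0.95]
ai_paraphrased = [0.70, 0.65, 0.99, 0.40, 0.55, 0.30, 0.62, 0.58, 0.45, 0.50]

threshold = threshold_at_fpr(human_scores, target_fpr=0.01)
print(threshold)
print(detection_rate(ai_original, threshold))
print(detection_rate(ai_paraphrased, threshold))
```

Holding the false-positive rate fixed forces a strict threshold, so even a modest downward shift in scores after paraphrasing removes most detections.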
The builders of the models have said as much. OpenAI withdrew its classifier in July 2023 because of its "low rate of accuracy."[5] When OpenAI later discussed watermarking, it framed text provenance as an unsolved, sensitive problem rather than a solved one.[9] The 2025 FTC settlement with Workado put a regulator on record that a detector advertised as 98% accurate performed at roughly 53% on general content, and required the company to hold reliable evidence before making such claims.[7]
Education is the largest use case for AI text detection and the setting where its weaknesses matter most, because a false positive can mean a student is accused of cheating.
Many institutions have decided that the tools are not dependable enough for that purpose. On 16 August 2023, Vanderbilt University announced it was disabling the AI writing detector built into Turnitin. The university noted that Turnitin's claimed 1% false-positive rate, applied to the roughly 75,000 papers Vanderbilt submitted in 2022, would have wrongly flagged about 750 papers; that Turnitin did not disclose how the detector worked; and that detectors are more likely to flag non-native English speakers. Vanderbilt stated plainly: "we do not believe that AI detection software is an effective tool that should be used."[6] The University of Pittsburgh's teaching center likewise disabled Turnitin's AI detection, and Southern Methodist University discontinued it at the end of 2023, among others.[15][16]
The broader academic-integrity debate has moved toward treating detector output as one weak signal rather than evidence. The reasons are the ones above: high-stakes consequences, documented bias against particular groups of students, trivial evasion, and a lack of transparency about how scores are produced. Guidance from many universities now advises against using a detector score as the basis for a misconduct finding, and recommends process-based approaches such as drafts, version history, and conversations with students instead.
Image and deepfake detection face the same core problem as text detection and arguably a harder version of it, because the generators improve quickly and adversaries actively try to evade detection.
A 2025 benchmark called Deepfake-Eval-2024, which collected deepfakes actually circulating online in 2024, found that open-source detection models that score well on older academic datasets performed far worse on real-world media: area under the curve dropped by about 50% for video, 48% for audio, and 45% for image models compared with earlier benchmarks. Even after fine-tuning on real-world data, peak accuracy reached roughly 75% for video, 86% for audio, and 69% for images, and the best commercial video detector reached about 78%, still below the roughly 90% accuracy estimated for expert human analysts.[17] These results underline that, as with text, laboratory accuracy figures overstate how well image and deepfake detectors work in the wild.
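For readers unfamiliar with the metric, area under the ROC curve (AUC) is the probability that a randomly chosen fake item receives a higher detector score than a randomly chosen real one. A perfect detector scores 1.0 and a coin flip scores 0.5. The sketch below computes it directly from that definition. The scores are invented to show the kind of drop described above and are not taken from the benchmark.

```python
def roc_auc(real_scores, fake_scores):
    """Probability that a random fake outscores a random real item (ties count half)."""
    wins = 0.0
    for fake in fake_scores:
        for real in real_scores:
            if fake > real:
                wins += 1.0
            elif fake == real:
                wins += 0.5
    return wins / (len(fake_scores) * len(real_scores))

# Invented detector scores (higher = more likely fake).
lab_real, lab_fake = [0.1, 0.2, 0.3], [0.7, 0.8, 0.9]    # clean academic-style data
wild_real, wild_fake = [0.2, 0.5, 0.7], [0.4, 0.6, 0.8]  # messier real-world data

print(roc_auc(lab_real, lab_fake))
print(roc_auc(wild_real, wild_fake))
```

On the clean set every fake outscores every real item, so the AUC is 1.0. On the messier set the scores overlap and the AUC falls to about 0.67.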
Cryptographic provenance is being pushed as a more durable answer here too. Watermarking schemes such as SynthID label AI-generated images, audio, and video at creation time, and content-provenance standards aim to attach signed metadata to media. As with text, these approaches only help when the generator participates, so they cannot cover content from systems that decline to mark their output.[8]
The near-term trajectory of AI content detection is shaped by a race the detectors are not well positioned to win. Each new generation of models narrows the statistical gap that post-hoc detectors depend on, which is the formal point of the impossibility argument in the academic literature.[3] Watermarking is the most promising technical direction because, when a watermark is present, it can be verified with low false-positive rates; but it only covers models whose providers opt in, and it degrades under heavy rewriting, so it will never cover the whole ecosystem.[8][9]
The practical conclusion drawn by educators, regulators, and many researchers is consistent: AI content detectors can be a soft signal that prompts a closer look, but they are not reliable enough to serve as proof, and using them that way risks real harm, especially to non-native English writers and other groups they flag disproportionately.[2][6][7]
[1] AI Wiki, "GPTZero." https://aiwiki.ai/wiki/gptzero Accessed 2026-05-31.
[2] Weixin Liang et al., "GPT detectors are biased against non-native English writers," Patterns, 2023. https://www.cell.com/patterns/fulltext/S2666-3899(23)00130-7 and arXiv:2304.02819 https://arxiv.org/abs/2304.02819 . Summary via Stanford HAI: https://hai.stanford.edu/news/ai-detectors-biased-against-non-native-english-writers Accessed 2026-05-31.
[3] Vinu Sankar Sadasivan, Aounon Kumar, Sriram Balasubramanian, Wenxiao Wang, Soheil Feizi, "Can AI-Generated Text be Reliably Detected?," arXiv:2303.11156, 2023. https://arxiv.org/abs/2303.11156 Accessed 2026-05-31.
[4] Kalpesh Krishna et al., "Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense," NeurIPS 2023, arXiv:2303.13408. https://arxiv.org/abs/2303.13408 Accessed 2026-05-31.
[5] TechCrunch, "OpenAI scuttles AI-written text detector over 'low rate of accuracy'," 25 July 2023. https://techcrunch.com/2023/07/25/openai-scuttles-ai-written-text-detector-over-low-rate-of-accuracy/ Accessed 2026-05-31.
[6] Vanderbilt University, "Guidance on AI detection and why we're disabling Turnitin's AI detector," 16 August 2023. https://www.vanderbilt.edu/brightspace/2023/08/16/guidance-on-ai-detection-and-why-were-disabling-turnitins-ai-detector/ Accessed 2026-05-31.
[7] U.S. Federal Trade Commission, "FTC order requires Workado to back up artificial intelligence detection claims," April 2025. https://www.ftc.gov/news-events/news/press-releases/2025/04/ftc-order-requires-workado-back-artificial-intelligence-detection-claims . Coverage with figures: CyberScoop, "Workado settles with FTC over allegations it inflated its AI detectors' capabilities," 2025. https://cyberscoop.com/ftc-workado-settlement-ai-detector-false-and-misleading-claims/ Accessed 2026-05-31.
[8] Sumanth Dathathri et al., "Scalable watermarking for identifying large language model outputs," Nature, 23 October 2024. https://www.nature.com/articles/s41586-024-08025-4 . Coverage: MIT Technology Review, "Google DeepMind is making its AI text watermark open source," 23 October 2024. https://www.technologyreview.com/2024/10/23/1106105/google-deepmind-is-making-its-ai-text-watermark-open-source/ Accessed 2026-05-31.
[9] TechCrunch, "OpenAI says it's taking a 'deliberate approach' to releasing tools that can detect writing from ChatGPT," 4 August 2024. https://techcrunch.com/2024/08/04/openai-says-its-taking-a-deliberate-approach-to-releasing-tools-that-can-detect-writing-from-chatgpt/ Accessed 2026-05-31.
[10] OpenAI, "New AI classifier for indicating AI-written text," 31 January 2023 (updated July 2023 to note discontinuation). https://openai.com/index/new-ai-classifier-for-indicating-ai-written-text/ Accessed 2026-05-31.
[11] Turnitin, "AI writing detection model" guide. https://guides.turnitin.com/hc/en-us/articles/28294949544717-AI-writing-detection-model Accessed 2026-05-31.
[12] K-12 Dive, "Turnitin admits there are some cases of higher false positives in AI writing detection tool," 2023. https://www.k12dive.com/news/turnitin-false-positives-AI-detector/652221/ Accessed 2026-05-31.
[13] Reporting on independent AI-detector benchmarks (Scribbr 2024 and others) summarizing that vendor accuracy claims of roughly 98% to 99% are not reproduced on general real-world text. https://medium.com/@jericho.w.blanco/most-accurate-ai-detectors-i-tried-them-all-2025-update-9cb863c0a9f9 Accessed 2026-05-31.
[14] Bloomberg test of GPTZero and Copyleaks on 500 pre-AI essays reporting a 1% to 2% false-positive rate, as summarized in Northern Illinois University Center for Innovative Teaching and Learning, "AI detectors: an ethical minefield," 12 December 2024. https://citl.news.niu.edu/2024/12/12/ai-detectors-an-ethical-minefield/ Accessed 2026-05-31.
[15] University of Pittsburgh, University Center for Teaching and Learning, "Encouraging academic integrity." https://teaching.pitt.edu/resources/encouraging-academic-integrity/ Accessed 2026-05-31.
[16] Southern Methodist University, OIT, "Changes to Turnitin AI detection tool at SMU," 13 December 2023. https://blog.smu.edu/itconnect/2023/12/13/discontinue-turnitin-ai-detection-tool/ Accessed 2026-05-31.
[17] Nuria Alina Chandra et al., "Deepfake-Eval-2024: A Multi-Modal In-the-Wild Benchmark of Deepfakes Circulated in 2024," arXiv:2503.02857, 2025. https://arxiv.org/abs/2503.02857 Accessed 2026-05-31.