GPT Detector
Last reviewed
May 9, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v4 · 2,253 words
| GPT Detector | |
|---|---|
| Information | |
| Name | GPT Detector |
| Platform | ChatGPT |
| Store | GPT Store |
| Model | GPT-4 |
| Category | ?????? |
| Description | ChatGPT Detector can identify AI-generated text produced by a wide range of large language models, including ChatGPT, Bard, and GPT-4. It is fast and easy to use! |
| Developer | chatgptbuddy.com |
| OpenAI URL | ?????? |
| Chats | 7K |
| Web Browsing | Yes |
| DALL·E Image Generation | Yes |
| Code Interpreter | Yes |
| Free | Yes |
| Available | No |
| Updated | 2024-01-24 |
GPT Detector was a Custom GPT for ChatGPT listed in the GPT Store. It was published by the developer chatgptbuddy.com and built on top of GPT-4. The chatbot positioned itself as a quick way to check whether a passage was written by a large language model such as ChatGPT, Google Bard, or GPT-4, returning a verdict inside the regular ChatGPT conversation interface rather than on a separate website. According to its store listing, it accumulated roughly 7,000 chats. The tool is no longer publicly accessible in the GPT Store.
GPT Detector belongs to a wave of small, single-purpose Custom GPTs that appeared after OpenAI released the GPT Builder in November 2023 and the GPT Store in January 2024. The store opened to ChatGPT Plus, Team, and Enterprise subscribers on January 10, 2024, by which point users had already published more than three million Custom GPTs.[^1] Many of these early entries wrapped a familiar workflow, in this case AI text detection, in a conversational shell so that users could run a check without leaving ChatGPT.
The tool's stated purpose was to flag AI-written passages from a wide range of generators. The store description names ChatGPT, Bard (Google's chatbot, later renamed Gemini), and GPT-4 as covered models, suggesting the detector was meant to recognize stylistic fingerprints common across modern instruction-tuned transformers. Because Custom GPTs run on the underlying ChatGPT model rather than a dedicated classifier, GPT Detector's verdicts ultimately reflected GPT-4's own judgment about the submitted text rather than the output of a separately trained discriminator.
The infobox lists the tool's enabled features:
| Feature | Status |
|---|---|
| Web Browsing | Yes |
| DALL·E Image Generation | Yes |
| Code Interpreter | Yes |
| Free to use | Yes |
| Underlying model | GPT-4 |
Web browsing meant the GPT could fetch external pages, useful for checking text that the user supplied as a URL rather than pasted directly. Code Interpreter (now called Advanced Data Analysis) allowed the GPT to run Python in a sandbox, which could be used to perform basic statistical checks on the text such as counting tokens or sentence lengths. DALL·E image generation is unusual for a detection tool and was probably enabled by default in the builder rather than because image creation was central to the workflow.
The GPT exposed a single conversation starter. Clicking it opened a session in which the user was expected to paste a passage of text. The GPT then returned a probability or verbal judgment about whether the passage was likely human-written or AI-generated.
GPT Detector is one entry in a much larger field of AI text detection. Most public detectors fall into one of three families.
The most widespread approach scores text on two metrics, perplexity and burstiness. Perplexity measures how surprised a reference language model is by a given passage; a model reads each token in turn and scores how unexpected it was given the prior context.[^2] Human writing tends to include more unusual word choices, idioms, and unexpected turns, producing higher perplexity. Output from large language models, which are trained to predict likely next tokens, tends to be smoother and more predictable, producing lower perplexity.
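As a minimal sketch of the perplexity idea, the snippet below turns per-token log-probabilities into a single perplexity value (the exponential of the mean negative log-probability). The token probabilities are invented for illustration; a real detector would obtain them from a reference language model.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-probability per token)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)

# Hypothetical probabilities a reference model might assign to each token.
# These numbers are made up; they are not measurements from any model.
predictable_passage = [math.log(p) for p in (0.9, 0.8, 0.85, 0.9)]
surprising_passage = [math.log(p) for p in (0.2, 0.05, 0.3, 0.1)]

print(perplexity(predictable_passage))  # lower: smoother, more model-like
print(perplexity(surprising_passage))   # higher: more unexpected choices
```

A passage in which every token has probability 0.5 gives a perplexity of 2, which is a handy sanity check when wiring this up to a real model.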
Burstiness measures variation in sentence-level perplexity and length across a document. Human writing usually mixes short, punchy sentences with longer, looping ones, producing high burstiness. Model output tends to settle into uniform sentence shapes, producing low burstiness.[^2] GPTZero, which uses perplexity and burstiness as the first layer of a seven-component model, is the best-known detector built on this idea.
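One simple proxy for burstiness is the coefficient of variation of sentence lengths. The sketch below assumes that proxy and a naive sentence splitter; production detectors such as GPTZero use their own, undisclosed definitions, so this is an illustration rather than their method.

```python
import re
import statistics

def sentence_lengths(text):
    """Split on ., ! or ? and count whitespace-separated words per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def burstiness(text):
    """Coefficient of variation of sentence length (std dev / mean)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # variation is undefined for a single sentence
    return statistics.pstdev(lengths) / statistics.mean(lengths)

uniform = "The cat sat down. The dog ran off. The bird flew up."
varied = "Stop. The rain kept falling over the empty harbor all night long. Why?"

print(burstiness(uniform))  # 0.0, every sentence has four words
print(burstiness(varied))   # clearly above zero, lengths are 1, 11 and 1
```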
A second family fine-tunes a discriminator model on labeled examples of human and AI text. The earliest example was OpenAI's GPT-2 Output Detector, released in 2019 alongside the 1.5B-parameter GPT-2 weights. It fine-tuned a RoBERTa-base model on outputs from GPT-2 and reported around 95 percent detection accuracy on that specific generator.[^3] OpenAI's later, broader AI Text Classifier, released on January 31, 2023, used a similar supervised approach trained on text from 34 generators across five organizations.[^4] Originality.ai, Copyleaks, and Turnitin's AI writing indicator all use proprietary fine-tuned classifiers in production today.
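The supervised idea can be illustrated at toy scale. The sketch below swaps in a plain logistic regression on two hand-made numeric features in place of a fine-tuned RoBERTa model, and the labeled examples are invented. It shows only the train-on-labels, predict-a-probability workflow, not any production detector.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on log loss."""
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict_ai_probability(x, weights, bias):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Made-up features in [0, 1]: (normalized perplexity, normalized burstiness).
# Label 1 = AI-written, 0 = human-written. Not real measurements.
FEATURES = [(0.9, 0.8), (0.7, 0.9), (0.8, 0.6), (0.2, 0.1), (0.3, 0.2), (0.1, 0.3)]
LABELS = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic(FEATURES, LABELS)
print(predict_ai_probability((0.15, 0.15), weights, bias))  # above 0.5
print(predict_ai_probability((0.85, 0.85), weights, bias))  # below 0.5
```

A classifier like this is only as good as its training distribution, which is one reason a detector tuned on one generator may not transfer to another.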
A third family does not require labeled training data. DetectGPT, introduced by Eric Mitchell and colleagues at Stanford in January 2023 (arXiv:2301.11305), observes that text sampled from an LLM tends to occupy negative-curvature regions of the model's log-probability function. The method perturbs a candidate passage with small rewrites from a generic mask-filling language model such as T5, scores both the original and the perturbations under a reference LLM, and compares the two. If the perturbations have substantially lower log probability than the original, the passage looks machine-generated. The original paper reported that DetectGPT improved AUROC on 20-billion-parameter GPT-NeoX text from 0.81 to 0.95 over the strongest zero-shot baseline.[^5]
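The perturb-and-compare step can be sketched as follows. The scoring function and the perturbation function here are toy stand-ins (a tiny unigram table and a random word swap), not a real LLM or T5. Only the shape of the computation, original log-probability minus the mean log-probability of the perturbations, follows the description above.

```python
import math
import random
import statistics

def perturbation_discrepancy(text, log_prob, perturb, n_perturbations=20, seed=0):
    """Original log-probability minus the mean over perturbed rewrites."""
    rng = random.Random(seed)
    original = log_prob(text)
    perturbed = [log_prob(perturb(text, rng)) for _ in range(n_perturbations)]
    return original - statistics.mean(perturbed)

# --- Toy stand-ins, assumptions for illustration only ---
COMMON_WORD_PROBS = {"the": 0.2, "cat": 0.1, "sat": 0.1, "on": 0.1, "mat": 0.1}
RARE_LOG_PROB = math.log(1e-4)
RARE_WORDS = ["zephyr", "quokka", "obelisk"]

def toy_log_prob(text):
    """Unigram score: known words use the table, everything else is rare."""
    return sum(
        math.log(COMMON_WORD_PROBS[w]) if w in COMMON_WORD_PROBS else RARE_LOG_PROB
        for w in text.split()
    )

def toy_perturb(text, rng):
    """Replace one randomly chosen word with a random rare word."""
    words = text.split()
    words[rng.randrange(len(words))] = rng.choice(RARE_WORDS)
    return " ".join(words)

high_probability_text = "the cat sat on the mat"
low_probability_text = "zephyr quokka obelisk zephyr quokka obelisk"

# A large positive gap means perturbing the text made it much less likely.
print(perturbation_discrepancy(high_probability_text, toy_log_prob, toy_perturb))
print(perturbation_discrepancy(low_probability_text, toy_log_prob, toy_perturb))
```

In this toy setup the first passage sits at a peak of the scoring function, so every rewrite lowers its score, while the second passage is no more likely than its rewrites and the gap is zero.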
A later refinement, Binoculars (Hans et al., arXiv:2401.12070, ICML 2024), scores text by contrasting two closely related language models. The authors reported detecting more than 90 percent of ChatGPT samples at a false positive rate of 0.01 percent across a wide range of document types, without any training on ChatGPT data.[^6]
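A rough sketch of the two-model contrast is shown below, assuming the score is the ratio of one model's log-perplexity on the text to the cross-entropy between the two models' next-token distributions. That formulation, the role assigned to each model and the tiny three-token vocabulary are all assumptions made for illustration, and real next-token distributions would depend on the preceding tokens.

```python
import math

def log_perplexity(tokens, dists):
    """Mean negative log-probability of the observed tokens under one model."""
    return -sum(math.log(d[t]) for t, d in zip(tokens, dists)) / len(tokens)

def log_cross_perplexity(dists_a, dists_b):
    """Mean cross-entropy between the two models' next-token distributions."""
    total = 0.0
    for da, db in zip(dists_a, dists_b):
        total -= sum(pa * math.log(pb) for pa, pb in zip(da, db))
    return total / len(dists_a)

def two_model_score(tokens, dists_a, dists_b):
    """Lower values mean the text is unsurprising relative to what the
    two models expect of each other, which reads as more machine-like."""
    return log_perplexity(tokens, dists_a) / log_cross_perplexity(dists_a, dists_b)

# Hypothetical next-token distributions over a 3-token vocabulary at two
# positions. For simplicity both toy passages share these distributions.
MODEL_A = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]
MODEL_B = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1]]

likely_tokens = [0, 0]    # the tokens both models favor
unlikely_tokens = [2, 2]  # the tokens both models consider improbable

print(two_model_score(likely_tokens, MODEL_A, MODEL_B))
print(two_model_score(unlikely_tokens, MODEL_A, MODEL_B))
```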
A fourth approach moves the burden from detection to generation. In "A Watermark for Large Language Models" (arXiv:2301.10226), John Kirchenbauer and co-authors proposed selecting a randomized set of "green" tokens before each word is generated and softly promoting their use during sampling. The watermark is invisible to humans but detectable from a short span of tokens using a statistical test that produces interpretable p-values. Detection does not require access to the model's API or parameters, and follow-up work showed that the signal often survives human or machine paraphrasing because n-grams from the original tend to leak through.[^7]
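The statistical test can be sketched in a few lines. The version below is a simplification: it decides green membership by hashing the previous token together with the candidate token, uses a hard rule that always emits a green token, and assumes a green fraction `gamma` of 0.5. The hashing scheme, vocabulary and function names are illustrative assumptions, not the paper's reference implementation.

```python
import hashlib
import math

def is_green(prev_token, token, gamma=0.5):
    """Pseudo-randomly assign `token` to the green list, seeded by `prev_token`."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma

def watermark_z_score(tokens, gamma=0.5):
    """z-score of the green-token count against the no-watermark expectation."""
    scored = len(tokens) - 1  # the first token has no predecessor
    if scored < 1:
        raise ValueError("need at least two tokens")
    green = sum(is_green(a, b, gamma) for a, b in zip(tokens, tokens[1:]))
    return (green - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))

def one_sided_p_value(z):
    """Probability of a z-score at least this large with no watermark."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def generate_hard_watermarked(vocab, length, start, gamma=0.5):
    """Toy generator: always emit the first green candidate for each step."""
    tokens = [start]
    for _ in range(length):
        green = [w for w in vocab if is_green(tokens[-1], w, gamma)]
        tokens.append(green[0] if green else vocab[0])
    return tokens

VOCAB = [f"w{i}" for i in range(50)]  # hypothetical vocabulary
marked = generate_hard_watermarked(VOCAB, length=50, start="w0")
z = watermark_z_score(marked)
print(z, one_sided_p_value(z))
```

Because only the hashing rule is needed to recompute the green lists, the check runs on the text alone, which matches the point above that detection does not require access to the model's parameters.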
GPT Detector launched into a market already dominated by purpose-built websites. The major reference points include:
| Tool | Launched | Notes |
|---|---|---|
| GPT-2 Output Detector | 2019 | Fine-tuned RoBERTa, released by OpenAI alongside GPT-2's 1.5B weights. Reported around 95% detection on GPT-2 output.[^3] |
| Originality.ai | November 2022 | Founded by Jon Gillham and Conor Watt to flag GPT-3 content for web publishers, before ChatGPT was released.[^8] |
| GPTZero | January 2023 | Built by Princeton senior Edward Tian over a few days in a Toronto coffee shop. The launch tweet on January 2, 2023 amassed millions of views; the company later raised over $3.5 million in seed funding.[^9] |
| OpenAI AI Text Classifier | January 31, 2023 | Reported 26% true positive rate on AI text and 9% false positive rate on human text. Withdrawn on July 20, 2023 "due to its low rate of accuracy."[^4][^10] |
| Turnitin AI writing detection | April 2023 | Integrated into the existing plagiarism platform that millions of educators already used. Turnitin claimed 98% accuracy at launch with under 1% false positives, while opting to miss some AI rather than risk false positives.[^11] |
| Copyleaks AI Content Detector | 2023 | Claims over 99% accuracy and 0.2% false positive rate. A July 2023 study on arXiv ranked it as the most accurate detector among those tested.[^12] |
| ZeroGPT | 2023 | Free web tool aimed at general users; commonly cited alongside GPTZero in lists of accessible detectors. |
No current detector reaches 100 percent accuracy. Detectors perform best on long, unedited model output, and accuracy usually falls when the text is paraphrased, lightly edited, or written in a domain-specific register.
OpenAI's own AI Text Classifier is the most public admission of how hard the problem is. The launch announcement noted that the classifier correctly flagged 26 percent of AI-written text as "likely AI-written" and incorrectly flagged 9 percent of human text the same way, both numbers far below what a deployable tool needs.[^4] Less than six months later, on July 20, 2023, OpenAI quietly added a notice to the launch post saying the classifier was "no longer available due to its low rate of accuracy" and committed to working on provenance methods for audio and visual content instead.[^10]
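To see what those two rates imply in practice, the sketch below applies Bayes' rule to estimate how often a flagged passage is actually AI-written. The 26 percent and 9 percent figures come from the announcement; the share of AI-written text in the pool being checked is a hypothetical input that the announcement does not supply.

```python
def share_of_flags_that_are_ai(true_positive_rate, false_positive_rate, ai_share):
    """P(AI-written | flagged), given the share of AI text in the pool."""
    flagged_ai = true_positive_rate * ai_share
    flagged_human = false_positive_rate * (1 - ai_share)
    return flagged_ai / (flagged_ai + flagged_human)

TPR = 0.26  # reported true positive rate
FPR = 0.09  # reported false positive rate

# Hypothetical pools: half the documents are AI-written, or only one in ten.
print(round(share_of_flags_that_are_ai(TPR, FPR, ai_share=0.5), 2))  # 0.74
print(round(share_of_flags_that_are_ai(TPR, FPR, ai_share=0.1), 2))  # 0.24
```

Under the second assumption roughly three out of four flags would land on human-written text, which illustrates why a 9 percent false positive rate is hard to deploy where most submissions are human-written.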
A July 2023 study by James Zou and colleagues at Stanford, published in the journal Patterns and posted to arXiv as 2304.02819, ran seven popular detectors over 91 TOEFL essays written by Chinese teenagers and a control set of essays by U.S. eighth graders.[^13] The detectors performed nearly perfectly on the U.S. essays but flagged about 61 percent of the TOEFL essays as AI-generated. All seven detectors unanimously labeled 19 percent of the TOEFL essays as AI text, and at least one detector flagged roughly 97 percent of them. The authors traced the bias to perplexity-based scoring, which penalizes the simpler vocabulary and sentence patterns common in non-native English writing. The Markup separately documented cases of international students at U.S. universities being accused of cheating on the basis of these detectors.[^14]
GPTZero published a response in 2024 reporting that retraining on educational data had reduced its false positive rate on the same TOEFL set to 1.1 percent, but the underlying tradeoff between perplexity-style features and writing style remains a known weakness of the detection field.[^15]
GPT Detector and similar Custom GPTs are not separate models. They route the user's text through GPT-4 with a system prompt asking the model to judge whether the passage is AI-generated. That puts them in a strictly weaker position than dedicated detectors.
For casual checks where the user just wants a second opinion, a Custom GPT can still be useful. For high-stakes decisions, especially in academic settings, the literature strongly recommends purpose-built detectors and human judgment together rather than a chatbot wrapper.
The infobox records GPT Detector as no longer available, with a last update date of January 24, 2024, just two weeks after the GPT Store opened. Many small Custom GPTs from this period have since been removed from the store, either by their developers or because they did not meet OpenAI's review policies. The store has steadily consolidated around larger, branded entries from publishers and platform companies, and standalone detection tools have largely migrated to dedicated websites.
Users looking for an in-ChatGPT detection workflow today can still ask GPT-4 directly to evaluate a passage, but the same caveats apply. For workflows that need defensible numbers, dedicated detectors such as GPTZero, Originality.ai, Turnitin AI writing detection, or Copyleaks remain the more common reference points.