See also: AI text, AI content, and AI content detectors
Burstiness (also called text burstiness, word burstiness, or simply burstiness) is a statistical property of a sequence that measures how unevenly events, tokens, or sentence lengths are distributed across that sequence. In writing, it captures the extent to which short and long sentences alternate, how often a word reappears in clumps once introduced, and how much variation exists from one passage to the next. The concept moved from corpus linguistics and information theory into the mainstream when GPTZero, launched in January 2023 by Princeton undergraduate Edward Tian, used burstiness together with perplexity as the two headline signals of its AI detection model. Since then, burstiness has become one of the most widely cited heuristics for distinguishing human writing from text produced by a large language model.
The core intuition is simple. Human prose tends to be uneven. People mix a clipped four-word sentence with a winding thirty-word one, repeat a topical noun several times in one paragraph and then drop it for pages, and shift register without warning. Models trained on next-token prediction tend to regress toward the average. Their sentences cluster around a typical length, their vocabulary use is smoother, and the per-token surprisal varies less from one sentence to the next. Burstiness tries to put a number on that difference.
In the AI detection setting, burstiness is usually framed as the variance of perplexity (or sentence length) across the document, while perplexity itself is the average per-token unpredictability of the text under a reference language model. A text can have high average perplexity but still look machine-written if every sentence has roughly the same perplexity, because that flatness is itself a fingerprint of statistical generation. GPTZero summarises the relationship by saying that burstiness compares perplexity across sentences and that human text is more discontinuous than model output.
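As a rough illustration of that framing, the sketch below scores each sentence with GPT-2 (via the Hugging Face transformers library) as the reference model and then takes the variance across sentences. The naive sentence splitter and the example text are placeholders, and this is a minimal illustration of the idea, not GPTZero's published pipeline.

```python
# Sketch: per-sentence perplexity under a reference model, plus the
# variance across sentences used as a burstiness proxy.
# Requires: pip install torch transformers
import math
import statistics

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_perplexity(sentence: str) -> float:
    """exp(mean per-token negative log-likelihood) under GPT-2."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the model return the mean cross-entropy loss.
        out = model(**enc, labels=enc["input_ids"])
    return math.exp(out.loss.item())

text = ("The meeting ran long. Afterwards, exhausted and slightly dazed, "
        "we wandered out into the rain without saying much. Coffee helped.")
# Naive splitter; a real detector would use a proper sentence segmenter.
sentences = [s.strip() + "." for s in text.split(".") if s.strip()]

ppls = [sentence_perplexity(s) for s in sentences]
print("mean perplexity:", statistics.mean(ppls))
print("variance across sentences (burstiness proxy):", statistics.pvariance(ppls))
```

A flat document gives a small variance even if its mean perplexity is high, which is exactly the "high average but machine-looking" case described above.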
A loose ranking helps:

1. High burstiness: unedited human prose, which swings between short and long sentences and shows a wide spread of per-sentence perplexity.
2. Low burstiness: raw model output, in which sentence lengths and per-token probabilities stay within a narrow band.
3. Intermediate burstiness: hybrid text, such as an AI draft edited by a human or a human draft polished by a model.
The second category is what detectors are looking for. A typical untouched ChatGPT reply tends to land there because the decoding process favours statistically average continuations, and average continuations produce sentences of similar length and similar surprisal.
The statistical idea is older than the AI detection use case by several decades. It first appears in corpus linguistics around the question of why simple bag-of-words models fail. If words were sprinkled independently across documents, their counts would follow a Poisson distribution. They do not. Topical content words such as boycott, pope, or kennedy arrive in clumps: once a word appears in a document, the probability of seeing it again jumps far above the baseline rate. Kenneth Church and William Gale formalised this in their 1995 paper Poisson Mixtures and in companion work on inverse document frequency, where they showed that word counts are much better captured by a negative binomial distribution, equivalent to a Poisson with a Gamma-distributed rate parameter, than by a single Poisson.
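A small simulation makes the overdispersion point concrete. The sketch below, assuming NumPy and with arbitrary rate parameters, compares counts from a single fixed-rate Poisson with counts from a Gamma-mixed Poisson, the negative binomial of Church and Gale's analysis:

```python
# Sketch: why a single Poisson underfits word counts.
# A Gamma-mixed Poisson (equivalently, a negative binomial) lets the
# per-document rate vary, producing the clumpy counts Church and Gale observed.
import numpy as np

rng = np.random.default_rng(0)
n_docs = 100_000
mean_rate = 0.5  # average occurrences of some topical word per document

# Model A: one fixed rate for every document (pure Poisson).
poisson_counts = rng.poisson(mean_rate, size=n_docs)

# Model B: each document draws its own rate from a Gamma distribution
# (shape k, scale mean_rate / k, so the mean rate is unchanged),
# then counts are Poisson at that per-document rate.
k = 0.2  # small shape -> heavy-tailed rates -> bursty counts
rates = rng.gamma(k, mean_rate / k, size=n_docs)
mixed_counts = rng.poisson(rates)

for name, c in [("Poisson", poisson_counts), ("Gamma-Poisson", mixed_counts)]:
    fano = c.var() / c.mean()  # index of dispersion; 1 for a pure Poisson
    print(f"{name}: mean={c.mean():.3f} variance={c.var():.3f} Fano={fano:.2f}")
```

Both models produce the same mean count, but the mixture's variance is several times its mean, mirroring the clumped arrivals of topical words.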
This observation drives a lot of practical natural language processing. Term-frequency weighting, inverse document frequency, and topic models all have to cope with the fact that words are bursty. In 2005 Rasmus Madsen, David Kauchak, and Charles Elkan introduced the Dirichlet compound multinomial (DCM), also called the multivariate Polya distribution, as an alternative to the multinomial in text classification and clustering. The DCM adds one degree of freedom that lets a model say "this word, once seen, is more likely to appear again in this document," which corrects much of the burstiness problem and produces measurably better perplexity on standard document collections.
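The DCM itself is straightforward to evaluate. Below is a minimal implementation of its log-likelihood, assuming SciPy; the function and variable names are mine, not from the Madsen, Kauchak, and Elkan paper.

```python
# Sketch: log-likelihood of a bag-of-words count vector under the
# Dirichlet compound multinomial (multivariate Polya) distribution.
import numpy as np
from scipy.special import gammaln

def dcm_log_likelihood(counts: np.ndarray, alpha: np.ndarray) -> float:
    """log P(counts | alpha) for the DCM.

    Small alpha values encode strong burstiness: once a word appears,
    repeat appearances become much more likely. As alpha grows large,
    the DCM collapses to an ordinary multinomial.
    """
    n = counts.sum()
    A = alpha.sum()
    log_coeff = gammaln(n + 1) - gammaln(counts + 1).sum()  # n! / prod(x_i!)
    log_polya = gammaln(A) - gammaln(A + n)                 # Gamma(A) / Gamma(A + n)
    log_terms = (gammaln(alpha + counts) - gammaln(alpha)).sum()
    return log_coeff + log_polya + log_terms

# A repeated ("bursty") count vector is more plausible under small alpha.
x = np.array([5, 0, 0, 1])                       # one word repeated five times
print(dcm_log_likelihood(x, np.full(4, 0.1)))    # bursty prior: higher log-likelihood
print(dcm_log_likelihood(x, np.full(4, 50.0)))   # near-multinomial prior: lower
```

The single extra degree of freedom in alpha is what lets the model reward repetition of an already-seen word, which is the correction the paragraph above describes.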
A parallel line of work in physics and complex systems studies burstiness as a property of event sequences in time. Albert-László Barabási's 2005 Nature paper The origin of bursts and heavy tails in human dynamics argued that the timing of human activity, in things like email, library visits, and print jobs, follows non-Poisson statistics, with short stretches of intense activity separated by long quiet periods. Three years later, K.-I. Goh and Barabási proposed a compact way to quantify that pattern, now known as the Goh-Barabási burstiness parameter B. Although it was developed for inter-event times rather than text, the same parameter is sometimes adapted to sentence-length distributions in NLP because it shares the same intuitive scale.
There is no single canonical formula for burstiness. Different fields use different measures, and AI detection tools usually do not publish their exact equations. The most common formulations are summarised below.
| Measure | Formula (σ = standard deviation, μ = mean) | Range / interpretation | Typical use |
|---|---|---|---|
| Goh-Barabási parameter B | (σ - μ) / (σ + μ) | -1 (perfectly regular) to +1 (extremely bursty); 0 corresponds to a Poisson process | Inter-event times, adapted to sentence lengths |
| Index of dispersion (Fano factor) | σ² / μ | = 1 for a Poisson process; > 1 bursty / overdispersed; < 1 regular | Counting processes, word counts |
| Coefficient of variation (CV) | σ / μ | 0 for a constant series; rises as variance grows | General variability of sentence lengths or perplexities |
| Variance of per-sentence perplexity | Var(PPL_i) over sentences i | No fixed scale; higher means more bursty | AI text detection heuristics |
| Inverse participation ratio | Σ p_i² over normalised sentence lengths | Low = uniform, high = concentrated | Concentration measure borrowed from physics |
In practice, most AI detection contexts collapse all of this into one rough question: how much do per-sentence statistics fluctuate compared to their mean? A document where every sentence is around twenty tokens with similar average log-probability scores will look low-burstiness regardless of which formula is plugged in. A document that swings between a five-word fragment and a forty-word complex sentence with a wide spread of per-token probabilities will look high-burstiness on every measure.
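A short sketch shows how little machinery this requires. The function below applies three of the measures from the table to per-sentence token counts, using a deliberately naive sentence splitter:

```python
# Sketch: common burstiness measures applied to per-sentence token counts.
import statistics

def burstiness_measures(text: str) -> dict:
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    mu = statistics.mean(lengths)
    sigma = statistics.pstdev(lengths)
    return {
        "goh_barabasi_B": (sigma - mu) / (sigma + mu),  # -1 regular .. +1 bursty
        "fano_factor": sigma**2 / mu,                   # 1 for Poisson-like counts
        "coefficient_of_variation": sigma / mu,         # 0 for constant lengths
    }

flat = "The cat sat on the mat. The dog lay on the rug. The bird sat on a perch."
bursty = ("Rain. It fell for three days straight, flooding the basement "
          "and the road out of town. We waited.")
print(burstiness_measures(flat))    # every measure at or near its minimum
print(burstiness_measures(bursty))  # every measure noticeably higher
```

The two toy inputs land at opposite ends of every measure, which is the agreement-across-formulas point made above.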
Decoder-only transformer models such as the GPT family generate text by sampling one token at a time from a probability distribution conditioned on what has been written so far. Even when temperature, top-p, and other decoding parameters are tuned to encourage diversity, the underlying objective is still next-token prediction, and it tends to regress toward statistically average behaviour at the sentence and paragraph level. Three reinforcing effects show up in the output:

- Sentence lengths cluster tightly around a typical value instead of swinging between fragments and long, winding constructions.
- Vocabulary is smoothed toward high-probability words, so lexical choices repeat more evenly than in human prose.
- Per-token surprisal stays within a narrow band, so per-sentence perplexity varies little across the document.
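To make the decoding step concrete, here is a minimal sketch of temperature scaling and nucleus (top-p) sampling over a vector of logits, assuming PyTorch; production decoders layer repetition penalties and other machinery on top of this.

```python
# Sketch: temperature + nucleus (top-p) sampling of one token from logits.
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 1.0,
                      top_p: float = 0.9) -> torch.Tensor:
    # Temperature scaling: values below 1 sharpen the distribution,
    # concentrating probability mass on the most likely tokens.
    probs = torch.softmax(logits / temperature, dim=-1)
    # Nucleus sampling: keep the smallest set of tokens whose cumulative
    # probability reaches top_p, then renormalise and sample from it.
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    keep = cumulative - sorted_probs < top_p  # always keeps at least one token
    sorted_probs[~keep] = 0.0
    sorted_probs /= sorted_probs.sum()
    choice = torch.multinomial(sorted_probs, 1)
    return sorted_idx[choice]

logits = torch.tensor([2.0, 1.5, 0.3, -1.0, -3.0])  # toy vocabulary of 5 tokens
print(sample_next_token(logits, temperature=0.7, top_p=0.9))
```

Lowering temperature or top_p concentrates sampling on the most probable continuations, which is precisely the behaviour that flattens sentence-level statistics.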
Reinforcement learning from human feedback and instruction tuning amplify the smoothing. Human raters tend to prefer clear, well-structured, on-topic answers, so models that have been fine-tuned on those preferences become even more uniform in tone and rhythm. The result is text that reads as competent but monochrome, which is exactly what burstiness measures are designed to flag.
GPTZero is the tool most associated with the term in the public conversation. Edward Tian launched a prototype on 2 January 2023, weeks after ChatGPT's public release, and the system gained millions of users almost overnight as schools and universities scrambled to respond to AI-written assignments. GPTZero's documentation describes a statistical layer combining perplexity and burstiness as the first stage of detection, with subsequent layers added as the product matured. The service raised 3.5 million dollars in seed funding in May 2023 and a 10 million dollar Series A in 2024.
Other detectors use related but not identical signals. The table below summarises how a few well-known tools relate to burstiness. Exact algorithms are proprietary, so descriptions reflect public statements rather than published source code.
| Detector | Burstiness role | Other primary signals | Notes |
|---|---|---|---|
| GPTZero | Headline statistic, computed alongside perplexity | Sentence-level classifier, paragraph scoring | Most widely cited use of the term |
| Originality.AI | Used as one feature among many | Supervised classifier trained on AI vs human text | Markets to publishers and SEO professionals |
| OpenAI AI Text Classifier | Implicitly captured in classifier features | Fine-tuned GPT classifier | Released January 2023, withdrawn 20 July 2023 due to low accuracy |
| Turnitin AI detection | Internal statistical features | Sentence classification model | Targets education market |
| Winston AI, Copyleaks, ZeroGPT | Variable, often perplexity plus heuristics | Mixed classifier and statistical approaches | Quality and transparency vary |
Most commercial detectors no longer rely on burstiness alone. The dominant approach today is a supervised classifier trained on large corpora of paired AI and human text, with statistical features such as perplexity and burstiness fed in as inputs alongside richer linguistic signals.
Burstiness is a useful heuristic, not a reliable test. There are several well-documented failure modes.
False positives on simple, technical, or repetitive writing. Documents written for clarity, such as legal text, instructions, or formal reports, naturally tend toward uniform sentence length and predictable vocabulary, and they often score as low-burstiness even when no model was involved. Ars Technica and other outlets have shown that detectors flag the United States Constitution and similar canonical documents as AI-generated.
Bias against non-native English writers. The most cited critique is the 2023 study by Weixin Liang and colleagues at Stanford, GPT detectors are biased against non-native English writers, published in Patterns. The team tested several major detectors on TOEFL essays written by non-native English speakers and on academic writing samples. Detectors classified more than half of the non-native essays as AI-generated, while almost all native-speaker samples were correctly identified as human. The authors traced the effect to lower lexical and syntactic variety in non-native writing, which produces lower perplexity and flatter burstiness in the same way model output does.
Low standalone accuracy. When OpenAI shut down its own AI Text Classifier on 20 July 2023, it cited a true positive rate of only 26 percent for AI-written text and a false positive rate of 9 percent for human writing. A University of Maryland study and several follow-up evaluations have reached similar conclusions: no detector is reliable enough to be used as the sole basis for an accusation of academic dishonesty.
Easy evasion. Because burstiness measures variance, a writer can raise it artificially by mixing in short and long sentences, splicing fragments between long ones, and varying word choice. The Liang paper showed that simple prompting strategies, such as asking the model to rewrite the text "with literary flourish," pushed AI output past most detectors. Paraphrasing tools and so-called "humanisers" exploit the same weakness.
Sensitivity to length and editing. Burstiness scores are noisier on short passages, and they shift sharply when a human edits a model-generated draft or when a model polishes a human draft. The longer the document and the cleaner its provenance, the more reliable the signal; short or mixed-authorship texts give the noisiest scores.
These limitations explain why teachers, journals, and platforms have generally moved away from treating any detector score as proof of authorship. The signal is real, but it is statistical and it can be wrong on either side.
The AI detection use case is the most public, but burstiness shows up in several other corners of natural language processing.
Information retrieval and term weighting. Standard TF-IDF schemes implicitly assume that document term counts follow a Poisson-like distribution. The DCM and similar bursty models give better scores when they replace the multinomial assumption inside language models for retrieval.
Topic modelling. Latent Dirichlet allocation and similar models inherit the multinomial assumption and therefore underestimate how often a topical word repeats. Researchers including Gabriel Doyle and Charles Elkan have proposed accounting for burstiness directly inside topic models to improve held-out perplexity.
Named-entity bursts in news streams. In streaming-text settings, the sudden burst of mentions of a name or place is itself a feature used to detect breaking events, trending topics, and emerging entities; a toy version of this windowed test is sketched below.
Speech and disfluency analysis. Bursty patterns in pause length, filler words, and repetition appear in spoken-language analysis and have been used in clinical NLP for studies of speech disorders.
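As a toy version of the streaming case mentioned above, the sketch below flags time windows whose mention count far exceeds a median baseline; the window granularity, ratio threshold, and data are illustrative choices rather than a published algorithm.

```python
# Sketch: flagging a burst of entity mentions in a news stream.
import statistics

def find_bursts(mention_counts: list[int], ratio: float = 5.0) -> list[int]:
    """Return indices of windows whose count exceeds `ratio` times the median.

    The median is a burst-resistant baseline: unlike the mean, it is not
    dragged upward by the very spike we are trying to detect.
    """
    baseline = statistics.median(mention_counts) or 1  # guard: all-zero streams
    return [i for i, c in enumerate(mention_counts) if c >= ratio * baseline]

# Hourly mentions of a name: quiet baseline, then a spike as a story breaks.
stream = [2, 1, 3, 2, 2, 1, 2, 40, 55, 30, 4, 2]
print(find_bursts(stream))  # -> [7, 8, 9]
```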
Writers who want their drafts to score as more human, whether they are revising AI output or simply trying to liven up flat prose, generally aim for the same handful of moves:

- Vary sentence length deliberately, setting short fragments against long, complex sentences.
- Vary word choice and sentence openings rather than reusing the same constructions.
- Break the rhythm occasionally with an aside, a question, or an abrupt shift in register.
None of this guarantees that a detector will classify the result as human, but it does push the statistical fingerprint closer to typical human writing.
Imagine you are listening to two friends tell you about their day. One friend talks in bursts. They say a lot all at once, then a quick "yeah" or "anyway," then a long story, then a very short joke. The other friend talks in steady, even sentences that all sound about the same length, like they are reading from a list. The bursty friend feels more like a person. The steady one feels a bit like a robot. Burstiness is just a way of measuring that difference in writing instead of speaking. Tools like GPTZero look at a piece of text, count how much the sentences bounce around, and use that to guess whether a person or an AI wrote it. The trick is that the test is not perfect: some people, especially when they are writing in a language that is not their first, also write in even sentences and get mistaken for robots.