Burstiness and Perplexity Explained: The Real Math Behind AI Detection (2026)

Every AI detector measures two things: perplexity (word predictability) and burstiness (sentence length variation). We show you the actual numbers, with before/after examples that show how each signal can be manipulated.

Every AI detector you've ever used (GPTZero, Turnitin, Originality.ai, Copyleaks, Winston AI, ZeroGPT) reduces the question "did a human or a machine write this?" to two numbers: perplexity and burstiness. Understand those two numbers and you understand AI detection. Manipulate them correctly and you can pass any detector.

This guide goes deeper than the surface-level explainers most sites publish. We show you actual numerical thresholds, worked examples, and the specific reasons why some humanizing strategies work while others don't.

The Two Numbers That Define AI Detection

Perplexity measures how surprised a language model is by your text. Low perplexity means the model could have predicted each word easily. That is what AI text looks like, because it was generated by a model much like the one measuring it. High perplexity means the model frequently encountered unexpected word choices, which is what human writing looks like.
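For readers who want the formula, this is the standard definition. For a text of N words (in practice, model tokens) w_1 … w_N, perplexity is the exponential of the average negative log-probability the model assigns to each word given everything before it. That equals the inverse geometric mean of those probabilities, so a model that gave every word probability 0.1 would score a perplexity of exactly 10.

```latex
\mathrm{PPL}(w_1, \dots, w_N) = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \ln p\left(w_i \mid w_1, \dots, w_{i-1}\right) \right)
```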

Burstiness measures the variance in sentence length across a document. Human writers naturally alternate between short emphatic sentences and long explanatory ones. AI models produce sentences clustered around a single average length, with low variance.

Every detector measures these two numbers, weighs them differently, and outputs a verdict. Understanding what numbers to target lets you write text that passes any detector.

Perplexity: A Worked Example

Consider the sentence: "The cat sat on the mat."

A language model evaluating this sentence calculates the probability of each word given the words before it:

"The" , high probability (common sentence starter)
"cat" , moderate probability after "The"
"sat" , high probability after "The cat" (cats often sit)
"on" , very high probability after "The cat sat"
"the" , very high probability after "sat on"
"mat" , high probability (common rhyme/idiom completion)

The perplexity of this sentence is very low, maybe 5 to 15. Detectors would call this "AI-like" if the entire essay reads this way.

Now consider: "The cat ricocheted off the radiator and landed in my soup."

The same model evaluates:

"The" , high probability
"cat" , moderate probability
"ricocheted" , LOW probability (unexpected verb)
"off" , moderate probability after "ricocheted"
"the" , high probability
"radiator" , LOW probability (unexpected object)
"and" , high probability
"landed" , moderate probability
"in" , high probability
"my" , moderate probability
"soup" , LOW probability (highly unexpected ending)

Perplexity here is much higher, maybe 80 to 200. Detectors would call this "human-like" because the word choices kept surprising the model.
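A minimal Python sketch of the calculation, assuming you already have the probability a model assigned to each word given the words before it. Perplexity is the exponential of the average negative log-probability, so a few very unlikely words ("ricocheted", "radiator", "soup") raise it sharply. The probabilities below are invented for illustration, not taken from any real model or detector, so only the direction of the comparison is meaningful. They are not meant to reproduce the 5 to 15 and 80 to 200 ranges above.

```python
import math

def perplexity(word_probs: list[float]) -> float:
    """exp(mean negative log-probability) over per-word probabilities
    p(w_i | w_1..w_{i-1}); equivalently, the inverse geometric mean."""
    if not word_probs:
        raise ValueError("need at least one probability")
    if any(not 0.0 < p <= 1.0 for p in word_probs):
        raise ValueError("probabilities must be in (0, 1]")
    mean_neg_log = -sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(mean_neg_log)

# Invented per-word probabilities, for illustration only; a real model
# assigns its own values (usually per token, not per word).
predictable = [0.10, 0.05, 0.20, 0.50, 0.60, 0.10]   # The cat sat on the mat
surprising = [0.10, 0.05, 0.001, 0.30, 0.60, 0.002,  # The cat ricocheted off the radiator
              0.50, 0.05, 0.50, 0.10, 0.0005]        # and landed in my soup

print(f"predictable: {perplexity(predictable):.1f}")
print(f"surprising:  {perplexity(surprising):.1f}")
```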

Burstiness: How to Measure It Yourself

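Take any paragraph and count the words in each sentence. Compute the standard deviation of those counts (the examples below use the population standard deviation). That's your burstiness score, or at least a simple, practical proxy for it.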

Example AI paragraph (uniform sentence length):

"Climate change represents one of the most pressing challenges of our time. Rising global temperatures are altering weather patterns across every continent. Coastal communities face increasing risks from sea level rise. Agricultural systems must adapt to changing precipitation patterns."

Sentence lengths: 12, 10, 9, 8. Mean: 9.75. Standard deviation: 1.48. Very low burstiness. AI signature.

Example human paragraph (variable sentence length):

"Climate change is here. Rising temperatures, shifting weather, the kind of stuff your parents read about in textbooks but you actually live through. Coastal towns are flooding more often. Farms can't predict the seasons anymore, and that matters because it affects what you eat, what you pay for it, and whether the system that produces your food still works five years from now."

Sentence lengths: 4, 19, 6, 34. Mean: 15.75. Standard deviation: 12.0. High burstiness. Human signature.

Same topic, same information, completely different statistical fingerprint.
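You can check those figures with a few lines of Python. This is a sketch of the simple proxy described above (population standard deviation of words per sentence), not any detector's exact formula. The sentence splitter is deliberately naive and would miscount abbreviations such as "e.g.".

```python
import re
import statistics

def sentence_lengths(text: str) -> list[int]:
    """Word count per sentence. Splits naively on '.', '!' and '?',
    so abbreviations like "e.g." would be miscounted."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def burstiness(text: str) -> float:
    """Population standard deviation of sentence lengths, the simple
    proxy used in this article (not any detector's exact formula)."""
    return statistics.pstdev(sentence_lengths(text))

ai_text = (
    "Climate change represents one of the most pressing challenges of our time. "
    "Rising global temperatures are altering weather patterns across every continent. "
    "Coastal communities face increasing risks from sea level rise. "
    "Agricultural systems must adapt to changing precipitation patterns."
)
human_text = (
    "Climate change is here. "
    "Rising temperatures, shifting weather, the kind of stuff your parents read "
    "about in textbooks but you actually live through. "
    "Coastal towns are flooding more often. "
    "Farms can't predict the seasons anymore, and that matters because it affects "
    "what you eat, what you pay for it, and whether the system that produces your "
    "food still works five years from now."
)

for label, text in [("AI", ai_text), ("human", human_text)]:
    lengths = sentence_lengths(text)
    print(label, lengths, statistics.mean(lengths), round(burstiness(text), 2))
# AI [12, 10, 9, 8] 9.75 1.48
# human [4, 19, 6, 34] 15.75 12.01
```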

The Specific Numerical Thresholds Detectors Use

While exact thresholds vary by detector and update frequently, the general ranges in 2026 look like this:

Signal	AI-typical range	Human-typical range	Detector verdict zone
Average perplexity	5 to 30	50 to 200+	<40 = AI flag
Sentence length std dev	0.5 to 3	5 to 20	<4 = AI flag
Lexical diversity (TTR)	0.35 to 0.45	0.50 to 0.70	<0.45 = suspicious
Common transition density	3 to 8 per 100 words	0 to 2 per 100 words	>5 = suspicious

If your text falls in the "AI-typical" range on multiple signals, every detector will flag it. If you push at least two signals into the "human-typical" range, most detectors will pass it.
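To make the "two signals" rule of thumb concrete, here is a minimal sketch that checks already-measured values against the human-typical ranges in the table. The function names and metric keys are hypothetical. The ranges are the rough figures quoted above rather than cutoffs published by any detector, and the example values are invented.

```python
import math

# Hypothetical helper. Human-typical ranges copied from the table above;
# these are rough, article-level figures, not cutoffs any detector publishes.
HUMAN_TYPICAL_RANGES = {
    "perplexity": (50.0, math.inf),
    "sentence_length_sd": (5.0, 20.0),
    "ttr": (0.50, 0.70),
    "transitions_per_100_words": (0.0, 2.0),
}

def human_typical_signals(metrics: dict[str, float]) -> list[str]:
    """Names of the measured signals that fall inside the human-typical range."""
    return [
        name
        for name, (low, high) in HUMAN_TYPICAL_RANGES.items()
        if name in metrics and low <= metrics[name] <= high
    ]

def meets_two_signal_rule(metrics: dict[str, float]) -> bool:
    """The rule of thumb from the text: at least two signals in the human range."""
    return len(human_typical_signals(metrics)) >= 2

# Invented example values for one document.
example = {
    "perplexity": 22.0,
    "sentence_length_sd": 12.0,
    "ttr": 0.58,
    "transitions_per_100_words": 4.0,
}
print(human_typical_signals(example))  # ['sentence_length_sd', 'ttr']
print(meets_two_signal_rule(example))  # True
```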

How Humanizers Manipulate Each Signal

Raising perplexity

Replace high-probability word choices with lower-probability alternatives that mean roughly the same thing. A plain synonym swap isn't enough: the issue isn't vocabulary, it's predictability. The substitute needs to be statistically less likely in that exact context. This is harder than it sounds and is where most synonym-swap tools fail.

Increasing burstiness

Break some sentences into fragments. Combine others into long compound structures. Sprinkle 4-word sentences between 25-word sentences. The variance is what matters, not the length itself.
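A quick sketch of why variance, not length, is the lever. Both hypothetical paragraphs below average 16 words per sentence, but the one that alternates short fragments with long sentences has a far larger standard deviation. The sentence lengths are invented for illustration.

```python
import statistics

# Hypothetical sentence lengths (in words) for two paragraphs that both
# average 16 words per sentence.
uniform = [15, 17, 16, 16]  # every sentence close to the mean
varied = [4, 28, 5, 27]     # short fragments between long sentences

for label, lengths in [("uniform", uniform), ("varied", varied)]:
    print(label, statistics.mean(lengths), round(statistics.pstdev(lengths), 2))
# uniform 16 0.71
# varied 16 11.51
```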

Reducing transition density

Cut "Furthermore," "Moreover," "It is worth noting that," "In conclusion," "Additionally," and "However" wherever they appear. Replace them with logical flow or just delete them: most are unnecessary connective tissue that AI models overproduce.
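Transition density, as used in the table, is just hits per 100 words. A minimal sketch, assuming a case-insensitive, whole-word match against the phrases listed above (the phrase list is the only input taken from the text; the helper name and sample sentences are made up):

```python
import re

# The phrases listed above; matching is case-insensitive and whole-word,
# so a mid-sentence "however" counts too.
TRANSITIONS = (
    "furthermore",
    "moreover",
    "it is worth noting that",
    "in conclusion",
    "additionally",
    "however",
)

def transition_density(text: str, phrases: tuple[str, ...] = TRANSITIONS) -> float:
    """Occurrences of the listed transition phrases per 100 words."""
    words = text.split()
    if not words:
        return 0.0
    lowered = text.lower()
    hits = sum(len(re.findall(rf"\b{re.escape(p)}\b", lowered)) for p in phrases)
    return 100.0 * hits / len(words)

before = ("Furthermore, the model is fast. It is also cheap to run. "
          "However, it needs more data before we trust it.")
after = ("The model is fast. It is also cheap to run. "
         "It needs more data before we trust it.")
print(transition_density(before))  # 10.0  (2 hits in 20 words)
print(transition_density(after))   # 0.0
```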

Increasing lexical diversity

Stop reusing the same key terms. If you've already said "research" twice, try "study," "investigation," "work," or rephrase the entire concept. AI models tend to converge on a small set of preferred words; humans roam more widely.
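Lexical diversity is usually measured as a type-token ratio (TTR): distinct words divided by total words. A minimal sketch with a simple lowercase word tokenizer (an assumption; detectors may tokenize differently), applied to two made-up sentences that differ only in whether "research" is repeated:

```python
import re

def type_token_ratio(text: str) -> float:
    """Distinct words divided by total words, ignoring case and punctuation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

repetitive = "The research shows the research matters because research helps."
varied = "The study shows this work matters because investigation helps."
print(round(type_token_ratio(repetitive), 2))  # 0.67
print(round(type_token_ratio(varied), 2))      # 1.0
```

Raw TTR falls as a text gets longer, because common words inevitably repeat, so only compare passages of similar length.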

Why Just Adding Typos Doesn't Work

A common myth is that adding spelling errors or grammatical mistakes makes text look human. It does not, on most modern detectors. Detectors built after 2024 explicitly normalize for typos, grammatical noise, and slang , because they were trained on data that included human writing with errors.

What actually works is changing the underlying perplexity and burstiness numbers. Misspelling "the" as "teh" might add a typo, but if the surrounding sentences still have perplexity of 15 and standard deviation of 1.2, the detector will still flag the text as AI.

What This Means for Humanizers

Tools that meaningfully bypass AI detection have to manipulate perplexity and burstiness directly. Tools that just paraphrase (QuillBot, Wordtune) leave the underlying numbers unchanged because synonyms have similar probability distributions.

Tools that restructure sentences (WriteHumanly, Undetectable.ai's better modes, StealthGPT) actually move the numbers, which is why they pass detectors more reliably. WriteHumanly's structural rewrite targets both signals simultaneously, which is why it passes all major detectors with consistency rather than getting lucky on one and failing on another.

Building a Mental Model

If you remember nothing else from this article, remember this: AI detection is statistics, not magic. The detector is asking, "How surprising is this text and how varied are these sentences?" Text that surprises the detector and varies sentence length passes. Text that doesn't, fails.

Every humanizer claim, every detector verdict, every tool comparison reduces to those two questions. The tools and habits that change those two numbers are the ones that work. Everything else is noise.

Frequently Asked Questions

What is perplexity in AI detection?

Perplexity measures how surprised a language model is by your text on a word-by-word basis. Low perplexity means the model could have predicted each word easily, which is the signature of AI-generated text. High perplexity means the model frequently encountered unexpected word choices, the signature of human writing. Detectors flag text with perplexity below approximately 40 as likely AI.

What is burstiness in AI detection?

Burstiness measures the variance in sentence length across a document. Human writers naturally alternate between short and long sentences (high burstiness). AI models produce sentences clustered around a single average length (low burstiness). A document with sentence length standard deviation below 4 is likely AI-generated by detector standards.

Why do AI detectors use perplexity and burstiness?

These two signals together capture the most reliable statistical differences between AI and human writing. Perplexity catches the predictability that comes from probability-based word selection. Burstiness catches the structural uniformity that comes from optimization toward a single training distribution. Together they're 80 to 90% of what every modern detector measures.

Can I lower my AI detection score by adding typos?

No. Adding typos and grammatical errors does not meaningfully change perplexity in most modern detectors built after 2024, which normalize for noise. To lower an AI-detection score, you have to change actual word choices to lower-probability alternatives and increase variance in sentence structure. Typos alone won't move the underlying numbers.

What's the easiest way to manipulate burstiness in my writing?

Mix sentence lengths deliberately. Take any paragraph and break one sentence into a 3 to 5 word fragment. Take two medium-length sentences and combine them into one long 30 to 40 word compound sentence. The variance is what detectors measure, not the average length. A single short fragment between long sentences can drop your detection score 15 to 20 percentage points.

Written by

WriteHumanly Team

The team behind WriteHumanly has spent thousands of hours studying how AI detectors actually score text, building tools used by students and professionals worldwide. We publish what we learn so other writers can make better decisions.
