
How AI Detectors Actually Work

The statistics, the assumptions, and the fundamental limitations - explained for writers


You have probably heard the terms: perplexity, burstiness, token probability, zero-shot classification. They appear in news articles about AI detection, in university policy documents, and in the marketing materials of detection companies. But what do they actually mean? And more importantly, what do they tell you about why these tools get it wrong?

This guide explains the statistical foundations of AI detection in plain language. You do not need a computer science degree to understand it. You do need to understand it if you want to know why a machine might decide your writing isn't human.

The Core Idea

AI language models work by predicting the next word in a sequence (strictly, the next token, which may be a whole word or a fragment of one). Given the phrase "the cat sat on the," a model assigns probabilities to every possible next word. "Mat" gets a high probability. "Refrigerator" gets a low one. "Quantum" gets a very low one. When a model generates text, it selects words based on these probabilities, which is why AI text tends to flow smoothly - each word is, by design, a statistically likely successor to the words before it.
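To make the idea concrete, here is a minimal sketch in Python. The probabilities are invented for illustration; a real model scores tens of thousands of candidate tokens, and the names `next_word_probs`, `most_likely`, and `sample` are hypothetical, not part of any detector or model.

```python
import random

# Hypothetical probabilities for the word after "the cat sat on the".
# These numbers are made up for illustration; they sum to 1.
next_word_probs = {
    "mat": 0.60,
    "sofa": 0.25,
    "floor": 0.10,
    "refrigerator": 0.04,
    "quantum": 0.01,
}

def most_likely(probs):
    """Return the single highest-probability word."""
    return max(probs, key=probs.get)

def sample(probs, rng):
    """Pick one word at random, weighted by its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so repeated runs give the same picks
print(most_likely(next_word_probs))  # mat
print([sample(next_word_probs, rng) for _ in range(5)])
```

Because the weighted picks land on "mat" far more often than on "quantum," text generated this way rarely strays from the expected path.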

AI detectors exploit this predictability. They run a piece of text through a language model and measure how "surprised" the model is by each word. If the model is rarely surprised - if the text closely follows the patterns it would itself produce - the detector assigns a high probability that the text is AI-generated. If the model is frequently surprised - if the text contains unexpected word choices, unusual constructions, or creative departures from statistical norms - the detector assigns a low probability.
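"Surprise" has a simple arithmetic form: the lower the probability the model gave a word, the higher the surprise. The sketch below uses the standard information-theory definition (the negative base-2 logarithm of the probability). The two probability lists are hypothetical stand-ins for what a model might assign to each word of a predictable sentence and a surprising one.

```python
import math

def surprisal_bits(probability):
    """Surprise, in bits, at seeing a word the model gave this probability."""
    return -math.log2(probability)

def mean_surprisal(probs):
    """Average surprise across all the words in a passage."""
    return sum(surprisal_bits(p) for p in probs) / len(probs)

# Hypothetical per-word probabilities, invented for illustration.
predictable = [0.5, 0.5, 0.25, 0.5]
surprising = [0.5, 0.0625, 0.125, 0.03125]

print(mean_surprisal(predictable))  # 1.25
print(mean_surprisal(surprising))   # 3.25
```

A detector built on this idea would treat the first passage as more machine-like than the second, purely because the model found it easier to predict.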

Perplexity: The Surprise Metric

Perplexity is the formal measure of how surprised a language model is by a piece of text, averaged over every word in it. Low perplexity means the text is predictable - each word follows naturally from the ones before it. High perplexity means the text contains surprises.
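For readers who want the arithmetic: perplexity is conventionally computed as the exponential of the average negative log-probability of each token. The sketch below follows that definition. The probability lists are hypothetical; in a real detector they would come from running the text through a language model.

```python
import math

def perplexity(token_probs):
    """Exponentiated average negative log-probability of the tokens."""
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# Hypothetical per-token probabilities, invented for illustration.
predictable = [0.5, 0.5, 0.25, 0.5]
surprising = [0.5, 0.0625, 0.125, 0.03125]

# If every token had probability 1/4, perplexity is approximately 4:
# the model is as unsure as if it were choosing among 4 equally likely words.
print(perplexity([0.25, 0.25, 0.25, 0.25]))

print(perplexity(predictable) < perplexity(surprising))  # True
```

One way to read the number: a perplexity of 4 means the model was, on average, about as uncertain as if it had four equally plausible words to choose from at each step.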

AI-generated text tends to have low perplexity because it was produced by the same kind of statistical process the detector is using to evaluate it. Human text tends to have higher perplexity because humans make creative choices that don't follow statistical norms.

The problem: some human writers naturally produce low-perplexity text. Technical writers, journalists trained in clarity, non-native English speakers writing carefully in their second language, and academic writers following disciplinary conventions all tend to produce prose that is clear, predictable, and statistically similar to machine output. These writers can be flagged not because their writing is artificial, but because it is precise.

The tools do not measure whether a human wrote something. They measure whether the text surprises a statistical model.

Burstiness: The Rhythm Metric

Burstiness measures variation in sentence complexity. Human writing tends to be "bursty" - a long, complex sentence followed by a short, punchy one. A fragment. Then another flowing passage. This rhythmic variation is a natural consequence of human thought, which doesn't proceed at a constant pace.

AI-generated text tends toward uniformity. Sentences are similar in length and complexity, producing a flat, even rhythm. Detectors use low burstiness as a signal that text may be machine-generated.
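There is no single agreed formula for burstiness, and commercial detectors do not publish theirs. As a rough illustration only, the sketch below assumes one simple stand-in: the standard deviation of sentence length (in words) divided by the mean length. Real tools may instead measure variation in per-sentence perplexity. The function names and sample texts are hypothetical.

```python
import re
import statistics

def sentence_lengths(text):
    """Split on ., ! or ? and count the words in each sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def burstiness(text):
    """Assumed proxy: std. deviation of sentence length over mean length."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # variation is undefined for a single sentence
    return statistics.pstdev(lengths) / statistics.mean(lengths)

varied = ("I waited. The train, delayed again by a signal failure somewhere "
          "up the line, finally crawled into the station. Late.")
even = ("The train was late this morning. The delay was caused by a signal. "
        "The train arrived at the station.")

print(sentence_lengths(varied))  # [2, 17, 1]
print(sentence_lengths(even))    # [6, 7, 6]
print(burstiness(varied) > burstiness(even))  # True
```

Note that the second passage is perfectly ordinary human prose. Its low score reflects an even style, not a machine author.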

Again, the problem is specificity: low burstiness is not unique to machines. Some human writers - particularly those trained in technical or academic disciplines - write with consistent sentence structures. Their prose is not flat because it is artificial; it is consistent because their training emphasized clarity and uniformity.

Zero-Shot vs. Trained Classifiers

There are two broad approaches to detection. Zero-shot detectors use the statistical properties described above - perplexity and burstiness - to make their assessments without being trained on labeled examples of human and AI text. Trained classifiers, by contrast, are machine learning models that have been trained on datasets of known human and known AI text, learning to recognize patterns that distinguish the two.
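A zero-shot decision can be as simple as comparing scores against fixed cutoffs. The sketch below is a hypothetical illustration, not any vendor's method: the function name `zero_shot_flag` and both threshold values are invented, and real tools combine their signals in ways they do not disclose.

```python
def zero_shot_flag(perplexity, burstiness,
                   max_perplexity=20.0, max_burstiness=0.3):
    """Flag text as 'possibly AI' when both scores fall below fixed cutoffs.

    The cutoffs are hypothetical values chosen for illustration.
    """
    return perplexity < max_perplexity and burstiness < max_burstiness

# Hypothetical scores for two pieces of writing.
print(zero_shot_flag(perplexity=12.0, burstiness=0.1))  # True
print(zero_shot_flag(perplexity=45.0, burstiness=0.8))  # False
```

Nothing in this rule knows who wrote the text. A careful human writer whose scores happen to fall under both cutoffs is flagged exactly as a machine would be. A trained classifier differs mainly in that it learns its decision boundary from labeled examples rather than having cutoffs set by hand.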

Trained classifiers tend to be more accurate in controlled settings. But they have a critical weakness: they can only detect AI text that resembles their training data. When a new model is released with different output characteristics, trained classifiers must be retrained. In the gap between release and retraining, accuracy drops.

Why They Fail

AI detectors fail for reasons that are fundamental, not incidental. They are measuring statistical properties of text, not the process that created it. A human writer who happens to produce statistically "normal" text can be flagged. An AI output that has been lightly edited by a human - introducing enough surprises to raise perplexity - can pass. The tools measure the wrong thing, and they measure it imperfectly.

This doesn't mean detection is useless. Used carefully, as one input among many in a human review process, detection tools can identify text that warrants closer examination. The danger is in treating their output as conclusive - as evidence rather than indication. That distinction is the difference between a tool and an oracle, and it is the distinction that too many institutions have failed to make.

For a deeper look at specific detectors, see our detector comparison or our analysis of how Turnitin's AI detection works. For the other side of the equation - the tools designed to evade these detectors - see our best AI humanizer tools review.



Dr. Elena Vasquez

Dr. Elena Vasquez bridges the gap between technical AI research and public understanding. She consults with universities on fair use policies and writes accessible guides for non-technical audiences.
