What It Means
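The basic unit of text that AI models process. A token is typically a whole word or a piece of one: a common word such as "writing" is often a single token, while a longer word such as "unbelievable" might be split into "un," "believ," and "able." The exact split depends on the model's tokenizer. AI models read and generate text in tokens, and AI providers measure usage and set prices in tokens rather than words.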
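To make the idea of splitting words into pieces concrete, here is a minimal sketch of a toy subword tokenizer. It assumes a tiny, hand-picked vocabulary (`TOY_VOCAB`, hypothetical) and uses greedy longest-match splitting. Real tokenizers, such as those based on byte-pair encoding, learn much larger vocabularies from data and may split the same words differently.

```python
# A toy subword tokenizer: greedy longest-match against a tiny, made-up vocabulary.
# TOY_VOCAB is hypothetical; real tokenizers learn their vocabularies from data
# and may split these words differently.
TOY_VOCAB = {"writing", "write", "ing", "un", "believ", "able"}


def toy_tokenize(word: str, vocab: set[str] = TOY_VOCAB) -> list[str]:
    """Split a word into the longest vocabulary pieces, scanning left to right."""
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest possible piece first, then shrink it one character at a time.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                tokens.append(piece)
                start = end
                break
        else:
            # No vocabulary piece matches here: fall back to a single character.
            tokens.append(word[start])
            start += 1
    return tokens


print(toy_tokenize("writing"))       # ['writing']
print(toy_tokenize("unbelievable"))  # ['un', 'believ', 'able']
```

The single-character fallback mirrors, in simplified form, how real tokenizers can still represent words they have never seen, at the cost of using more tokens for them.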
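Why writers should care: When someone says a model "can handle 100,000 tokens," they're describing how much text it can work with at once (often called its context window). By the common rule of thumb of about 0.75 English words per token, that is roughly 75,000 words, though the real ratio varies with the tokenizer, the language, and the kind of text.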
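The token-to-word arithmetic can be sketched as a pair of helper functions. The 0.75 words-per-token ratio is an assumption taken from the rule of thumb above, not a fixed property of any model, and the function names are hypothetical.

```python
# Rule-of-thumb conversion between tokens and English words.
# WORDS_PER_TOKEN = 0.75 is an approximation: the real ratio depends on the
# tokenizer, the language, and the kind of text.
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a given number of tokens."""
    return round(tokens * words_per_token)


def words_to_tokens(words: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many tokens a draft of a given word count will use."""
    return round(words / words_per_token)


print(tokens_to_words(100_000))  # 75000
print(words_to_tokens(1_500))    # 2000
```

For budgeting, treat these numbers as estimates; a model provider's own tokenizer gives the exact count.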
In Context
When an AI writes, it generates one token at a time, choosing each new token based on all the tokens that came before it. A single article might involve thousands of these sequential predictions. Understanding this token-by-token process helps writers grasp why detection tools focus on statistical patterns: each token is drawn from a probability distribution, and across thousands of choices those probabilities can add up to a subtle statistical signature that detectors try to read.
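The one-token-at-a-time loop can be sketched with a toy stand-in for a model. Everything here is hypothetical: `TOY_MODEL` is a small hand-written table keyed on the full context (a real model computes these probabilities with a neural network), and `<end>` is an assumed stop marker. The sketch records the probability of each chosen token, which is the kind of trace the paragraph describes; a detector generally cannot see the generator's own probabilities and has to estimate them with a model of its own.

```python
import random

# Hypothetical stand-in for a language model: maps the FULL sequence of tokens
# so far to a probability for each candidate next token.
TOY_MODEL = {
    ("The",): {"cat": 0.6, "dog": 0.4},
    ("The", "cat"): {"sat": 0.7, "slept": 0.3},
    ("The", "dog"): {"ran": 0.8, "sat": 0.2},
}


def next_token_distribution(context: list[str]) -> dict[str, float]:
    """Return next-token probabilities given every token so far."""
    # Contexts the toy table doesn't know end the text.
    return TOY_MODEL.get(tuple(context), {"<end>": 1.0})


def generate(prompt: list[str], rng: random.Random, max_tokens: int = 10):
    """Sample one token at a time, recording the probability of each choice."""
    tokens = list(prompt)
    chosen_probs = []
    for _ in range(max_tokens):
        dist = next_token_distribution(tokens)
        candidates = list(dist)
        weights = [dist[c] for c in candidates]
        token = rng.choices(candidates, weights=weights, k=1)[0]
        if token == "<end>":
            break
        chosen_probs.append(dist[token])  # the statistical "trace" of this choice
        tokens.append(token)
    return tokens, chosen_probs


tokens, probs = generate(["The"], random.Random(42))
print(tokens, probs)
```

Text made mostly of high-probability choices tends to look more predictable, which is one reason some detectors score text with probability-based measures such as perplexity; this toy does not implement any real detector.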
Related Terms
- AI Detection - Software that attempts to determine whether a piece of text was written by a human or generated by an artificial intelligence.
- Algorithmic Bias - Systematic errors in AI systems that produce unfair outcomes for certain groups.
- Burstiness - A measure of how much variation exists in the complexity and length of sentences within a piece of writing.
- C2PA - The Coalition for Content Provenance and Authenticity, which publishes an open technical standard for certifying the origin and history of digital content.
- Content Provenance - The documented history of a piece of content from its creation through every edit, save, and publication.