12 June 2026 · 7 min read
AI writing checker: what to look for and what to ignore

An AI writing checker promises something appealing: a reliable way to tell whether a piece of text was written by a person or generated by a machine. What most of them deliver is something slightly different - a score that reflects how statistically predictable the text is, which correlates with AI generation but isn't the same as detecting it. Understanding that distinction changes how you should interpret and use the output.
What the score actually measures
Most AI writing detection tools (GPTZero, Originality.ai, Copyleaks, Winston AI and others) rely, at least in part, on two statistical signals: perplexity (how surprising each word choice is relative to what a language model would predict) and burstiness (whether that predictability varies from sentence to sentence). AI-generated text tends to be low-perplexity and low-burstiness: the model chooses statistically expected words consistently throughout.
Human writing shows different patterns: occasional unexpected word choices, more variation in sentence entropy, natural unpredictability that emerges from genuine thought. The checker flags text as likely AI-generated when it looks too predictable, too consistent. The score isn't 'this was written by ChatGPT' - it's 'this text looks like the kind of language a model would produce'.
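To make the two measures concrete, here is a minimal sketch in Python. It assumes you already have per-token log-probabilities from some language model; the numbers below are invented for illustration, and the function names are hypothetical. Commercial checkers do not publish their exact formulas, so burstiness is modelled here simply as the spread of per-sentence perplexity.

```python
import math
from statistics import pstdev


def perplexity(token_logprobs):
    """Perplexity = exp of the average negative log-probability per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def burstiness(sentences):
    """Spread of per-sentence perplexity (one possible definition, assumed here)."""
    return pstdev([perplexity(s) for s in sentences])


# Illustrative, made-up log-probabilities: one list per sentence.
uniform_text = [[-1.0, -1.0, -1.0], [-1.0, -1.0, -1.0]]  # every word equally expected
varied_text = [[-0.5, -0.5], [-3.0, -3.0]]               # one plain sentence, one surprising one

print(burstiness(uniform_text))  # no variation between sentences
print(burstiness(varied_text))   # large variation between sentences
```

Under these assumptions, the uniform text scores zero burstiness and would look more "model-like" to a checker, regardless of who actually wrote it.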
The false positive problem
The core issue with perplexity-based checking is that formal, careful writing often looks similar to AI-generated text. A student who has learned to write well in a formal academic register - using standard transitional phrases, conventional sentence structures, appropriate vocabulary - is going to trigger higher AI-likelihood scores than a student who writes more loosely.
This problem is especially acute for non-native English speakers. A language learner who has studied academic English specifically to use formal conventions correctly produces text that pattern-matches closely with AI output. Published evaluations have reported false positive rates of roughly 50-60% for this population on some commonly used tools. If your class includes international students or strong ESL learners, treat a high checker score as highly uncertain.
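A rough base-rate calculation shows why a false positive rate in that range matters so much. The sketch below applies Bayes' rule; the prevalence (10% of submissions actually AI-generated) and the detection rate (90%) are assumptions chosen purely for illustration, not measured values, and the function name is hypothetical.

```python
def prob_ai_given_flag(prevalence, true_positive_rate, false_positive_rate):
    """Bayes' rule: the share of flagged submissions that really are AI-generated."""
    flagged_ai = true_positive_rate * prevalence
    flagged_human = false_positive_rate * (1 - prevalence)
    return flagged_ai / (flagged_ai + flagged_human)


# Assumed for illustration: 10% of submissions are AI-generated, 90% of those get flagged.
high_fp = prob_ai_given_flag(0.10, 0.90, 0.50)  # false positive rate of 50%
low_fp = prob_ai_given_flag(0.10, 0.90, 0.01)   # false positive rate of 1%

print(round(high_fp, 3))  # 0.167
print(round(low_fp, 3))   # 0.909
```

With these assumed inputs and a 50% false positive rate, only about one flagged submission in six would actually be AI-generated. The same checker with a 1% false positive rate would be right about nine times in ten.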
Signals worth paying attention to
A high AI-likelihood score from a writing checker, on its own, isn't sufficient grounds for any formal action. What makes it more meaningful is the combination with other signals. The same score on a submission from a student who typically produces rough, colloquial, mistake-filled writing raises questions worth exploring. On an ESL student's carefully constructed essay, it's a different matter entirely.
The contextual signals that matter most aren't in the checker's output: they're in your knowledge of the student's typical performance, the plausibility of the submission given the time available, whether the work engages specifically with your course content or reads generically, and - where you have it - data on how the work was produced.
Signals to ignore
The absolute score is often less meaningful than the tool's framing suggests. A '78% AI probability' score looks precise but carries substantial uncertainty that the tool doesn't display. Specific 'AI-generated sentences' highlighted in the output should be treated sceptically too - they come from the same classifier, with the same accuracy limits, applied to far less text.
Checker tools that claim near-perfect accuracy should be treated with caution. Independent evaluations consistently find accuracy rates lower than vendor claims, particularly on mixed-origin content and non-native writer text. Vendors have an obvious incentive to headline their best-case accuracy; independent results from real educational contexts are a more reliable guide.
Using writing checkers well
The most productive use of an AI writing checker is as a triage tool, not a verdict tool. Run submissions through the checker after you've already identified those you want to look at more carefully. Use the output to focus your attention rather than to make decisions. If you flag a submission for follow-up based partly on a checker score, the follow-up should involve asking the student to explain their process - not presenting them with the score as evidence.
For schools building systematic integrity processes, combining a writing checker with a process-based tool produces considerably better evidence than either alone. Process data - writing session duration, paste behaviour, typing patterns - is language-neutral and factual. Text analysis is probabilistic and language-sensitive. Used together, they provide complementary signals that support better-informed professional judgement.
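One way to picture how the signals combine is a small triage rule, sketched below in Python. Everything in it is hypothetical: the field names, the 0.8 threshold and the rule itself are illustrative assumptions, not the behaviour of any particular product. The point it encodes is that a checker score alone never triggers follow-up, and that the only outcome is a conversation, never a verdict.

```python
from dataclasses import dataclass

FOLLOW_UP = "ask the student to explain their process"
NO_ACTION = "no follow-up"


@dataclass
class Submission:
    checker_score: float       # 0.0-1.0 AI-likelihood from a text checker
    matches_usual_style: bool  # teacher's judgement against the student's typical work
    large_pastes: int          # large paste events from a process tool (hypothetical field)


def triage(sub, score_threshold=0.8):
    """Suggest a next step. The threshold is an arbitrary illustrative value."""
    high_score = sub.checker_score >= score_threshold
    process_concern = sub.large_pastes > 0
    style_concern = not sub.matches_usual_style
    # Require at least two independent signals before suggesting any follow-up.
    if sum([high_score, process_concern, style_concern]) >= 2:
        return FOLLOW_UP
    return NO_ACTION


print(triage(Submission(0.95, True, 0)))   # high score only
print(triage(Submission(0.95, False, 0)))  # high score plus a style mismatch
```

In this sketch, a carefully written essay with a high score but no other concern results in no follow-up, which matches the advice above about ESL students' work.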