9 June 2026 · 7 min read

How to check if text is AI-generated: what actually works


The question sounds straightforward but the answer isn't. Can you reliably tell whether a piece of text was written by AI? Not with the confidence you'd need for a formal accusation. But you can do better than guessing, and understanding what the available tools actually measure helps you use them appropriately rather than over-confidently.

Why this is harder than it looks

Large language models are trained on enormous amounts of human-written text. Their output is, by design, intended to resemble human writing. As models have improved, the resemblance has become closer - which means the distinguishing signals have become weaker. A tool that worked reasonably well against early model output may struggle significantly with more recent generations.

There's also an asymmetry in the error costs. Missing an AI-generated submission is bad; wrongly accusing a student who wrote their work honestly can be considerably worse. Any tool for checking AI-generated text needs to be evaluated not just for its detection rate but for its false positive rate - and in practice, these two figures trade off against each other in ways vendors don't always communicate clearly.
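To make the trade-off concrete, here is a minimal sketch in Python. The scores are invented for illustration and do not come from any real detector; the only point is that a single threshold sets both the detection rate and the false positive rate at once.

```python
# Hypothetical detector scores: 0 = looks human, 1 = looks AI.
# These numbers are invented for illustration only.
human_scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.55, 0.70]
ai_scores = [0.40, 0.50, 0.60, 0.65, 0.75, 0.80, 0.90, 0.95]


def rates(threshold):
    """Return (detection_rate, false_positive_rate) when scores >= threshold are flagged."""
    detected = sum(s >= threshold for s in ai_scores) / len(ai_scores)
    false_positives = sum(s >= threshold for s in human_scores) / len(human_scores)
    return detected, false_positives


for threshold in (0.3, 0.5, 0.7):
    detected, false_positives = rates(threshold)
    print(f"threshold {threshold}: detects {detected:.1%} of AI texts, "
          f"wrongly flags {false_positives:.1%} of human texts")
```

With these invented scores, raising the threshold from 0.5 to 0.7 cuts the false positive rate from 25% to 12.5%, but the detection rate falls from 87.5% to 50%. A vendor quoting only one of those two figures is telling half the story.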

How text-based detection tools work

Most AI content detectors you'll encounter rely on a measure called perplexity: how surprising each word choice is to a language model. AI output tends to be more predictable (lower perplexity); human writing tends to vary more. Some tools add a second measure called burstiness - how much that predictability varies across different parts of the text. These metrics provide a statistical signal, but not a definitive one.
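As a rough sketch of what these two numbers are, the Python below computes them from per-token probabilities. It assumes some scoring model has already assigned a probability to each token; those probabilities are made up here. Burstiness is taken to be the standard deviation of per-sentence perplexity, which is one plausible definition rather than the formula any particular tool uses.

```python
import math
from statistics import pstdev


def perplexity(token_probs):
    """Exponential of the average negative log-probability; lower means more predictable."""
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


def burstiness(sentences):
    """Assumed definition: spread (population std dev) of per-sentence perplexity."""
    return pstdev([perplexity(s) for s in sentences])


# Hypothetical per-token probabilities, one inner list per sentence.
steady = [[0.5, 0.5], [0.5, 0.5]]   # every sentence equally predictable
varied = [[0.5, 0.5], [0.1, 0.1]]   # one predictable sentence, one surprising one

print(perplexity(steady[0]), burstiness(steady))
print(perplexity(varied[1]), burstiness(varied))
```

In this toy data the steady text has a perplexity of about 2 in every sentence and a burstiness of 0, while the varied text swings between roughly 2 and 10. A detector built on these signals would lean towards calling the first one machine-like, which is also why a careful, even-toned human writer can be flagged by mistake.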

A separate approach involves AI watermarking, where the generating model embeds a detectable statistical signature in its output. This is more reliable when it works, but it requires the generating model to support watermarking, and the signature can be weakened or erased by post-processing that paraphrases the output. It's not yet widely deployed in the tools teachers encounter.
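For intuition only, here is a toy Python sketch of one family of watermarking ideas, often described as a "green list" scheme: the generator quietly favours a pseudo-random half of the vocabulary at each step, and the detector counts how often that half appears. The vocabulary, hash rule, and toy generator are all hypothetical, and this is not a description of how any specific product works.

```python
import hashlib
import math

VOCAB = [f"w{i}" for i in range(64)]  # hypothetical toy vocabulary


def is_green(prev_token, token):
    """Pseudo-randomly mark about half of all tokens 'green', depending on the previous token."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] < 128


def generate_watermarked(length, start="w0"):
    """Toy stand-in for a model: always emit the first green token it finds."""
    tokens = [start]
    for _ in range(length):
        prev = tokens[-1]
        tokens.append(next((t for t in VOCAB if is_green(prev, t)), VOCAB[0]))
    return tokens


def z_score(tokens):
    """How far the green-token count sits above the 50% expected by chance."""
    pairs = list(zip(tokens, tokens[1:]))
    green = sum(is_green(prev, tok) for prev, tok in pairs)
    n = len(pairs)
    return (green - 0.5 * n) / math.sqrt(0.25 * n)


marked = generate_watermarked(40)
plain = [VOCAB[(i * 7) % 64] for i in range(41)]  # text chosen without the green rule

print(f"watermarked z = {z_score(marked):.2f}, unmarked z = {z_score(plain):.2f}")
```

A high z-score means far more green tokens than chance would produce. Paraphrasing swaps tokens without regard to the green list, which pulls the count back towards 50% and is why rewording undermines this kind of signature.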

What free tools are actually good for

Several free AI detection tools exist for individual use - GPTZero, various browser extensions, and free tiers of commercial products among them. These are useful for a quick first pass, but carry important caveats. Published accuracy claims typically reflect performance on controlled benchmark datasets, not on the real-world mixed-effort content teachers encounter. For non-native English writers, false positive rates are substantially higher than headline figures suggest.

For occasional, low-stakes checks, these tools are a reasonable starting point. For anything that could inform a formal judgement about a student, they are not sufficient evidence on their own. The output of any text-based detector should be treated as one input into a broader assessment, never as a verdict.

Process-based detection: the more reliable approach for education

For educational contexts specifically, the most reliable form of AI text detection doesn't read the text at all. It examines how the text was produced: the session timeline, paste events, typing cadence, and revision patterns that together constitute the writing process. This process fingerprint is language-neutral, harder to fake, and produces evidence that's considerably more defensible in formal proceedings.

Tools that capture this process data do so during the submission itself. The teacher sees a timeline - when typing happened, when paste events occurred and how large they were, when focus was lost - without ever seeing what the student actually typed. It's a different category of information from text analysis, and in educational contexts, a more useful one.
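To show the kind of record this produces, here is a minimal Python sketch. The event format and field names are hypothetical and not taken from any real tool; the point is that the summary can be built from timings and character counts alone, without storing the text.

```python
from dataclasses import dataclass


@dataclass
class Event:
    seconds: int    # time since the session started
    kind: str       # "typing", "paste", or "focus_lost" (hypothetical labels)
    chars: int = 0  # characters added; the text itself is never stored


def summarise(events):
    """Reduce a session timeline to a few content-free process signals."""
    typed = sum(e.chars for e in events if e.kind == "typing")
    pastes = [e.chars for e in events if e.kind == "paste"]
    pasted = sum(pastes)
    total = typed + pasted
    return {
        "typed_chars": typed,
        "pasted_chars": pasted,
        "largest_paste": max(pastes, default=0),
        "focus_lost": sum(e.kind == "focus_lost" for e in events),
        "pasted_share": pasted / total if total else 0.0,
    }


# Invented example session: a little typing, then two large pastes after leaving the page.
session = [
    Event(0, "typing", 120),
    Event(95, "focus_lost"),
    Event(160, "paste", 900),
    Event(170, "typing", 30),
    Event(400, "focus_lost"),
    Event(430, "paste", 450),
]

summary = summarise(session)
print(summary)
```

In this invented session, 90% of the characters arrived through two pastes, each shortly after the student left the page. That pattern proves nothing by itself, since a student may have drafted elsewhere, but it is a specific, checkable thing to ask about.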

Using the signals you have

Whatever approach you use, the appropriate way to use AI detection signals is as a starting point for enquiry, not a basis for automatic action. A high score from a text-based tool, or an anomalous process record, means: ask the student about their process. It does not mean: conclude that misconduct occurred.

Start with the signals, end with the conversation. A conversation in which the student cannot account for their own work, combined with documented process anomalies, is far stronger evidence than a detector score alone. The strength of any misconduct case depends on the quality of the evidence.
