11 June 2026 · 9 min read
How to detect AI writing in student work: a practical guide

Detecting AI writing has become one of the most contested problems in education. Every term, more teachers describe the same experience: a polished submission that arrives too easily, reads too cleanly, and bears no visible signs of the struggle that real thinking produces. The instinct to check is reasonable. The tools most people reach for first are not.
The two categories of detection signal
Output signals - things you can observe in the finished text - are what most AI detection tools measure. Does the prose read fluently? Are sentences predictably structured? Is the vocabulary a little formal for the apparent writer? These questions have value as starting points, but as evidence they're fragile. Language models are trained to produce text that reads like human writing, and they're getting better at it. A tool that scores prose for AI-likeness is chasing a moving target.
Process signals are different. They describe how work was created: whether it was typed or pasted, how long the session lasted, whether there were pauses consistent with thinking, whether content appeared in bursts or arrived all at once. These signals are not about the words - they're about the behaviour that produced them. That distinction matters enormously, both for accuracy and for fairness.
Why text-based AI detectors fall short
The most widely used text-based detectors work by measuring perplexity - roughly, how surprising each word choice is relative to what a language model would predict. Human writers make unpredictable choices; AI output tends toward statistical expectation. In principle, this should work. In practice, it doesn't work reliably enough for high-stakes educational decisions.
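To make the idea concrete, here is a minimal sketch of how perplexity is calculated from the probabilities a language model assigns to each token. The function name and the probability values are invented for illustration; commercial detectors use their own models and thresholds, which this does not reproduce.

```python
import math

def perplexity(token_probs):
    """Perplexity of a text, given the probability a model gave each token.

    It is the exponential of the average negative log-probability:
    lower means more predictable, higher means more surprising.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# Hypothetical per-token probabilities, invented for illustration only.
predictable = [0.9, 0.8, 0.85, 0.9]   # the model guessed most words easily
surprising = [0.2, 0.05, 0.4, 0.1]    # the model was often caught off guard

print(perplexity(predictable) < perplexity(surprising))  # True
```

A detector built on this idea treats low perplexity as a sign of AI output, which is exactly why plain, predictable human writing gets caught in the same net.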
A Stanford study published in 2023 tested seven popular detectors against essays written by non-native English speakers and found an average false positive rate of about 61%. The same tools flagged native speakers far less often. The reason isn't mysterious: ESL and EFL learners often use simpler, more predictable vocabulary and more conservative sentence structures - not because they're using AI, but because they're still building fluency. The very patterns that flag AI-generated text also flag careful, formal, non-native writing.
Several UK universities quietly stopped using text-based detectors after receiving complaints from international students who had been wrongly flagged. The Equality Act 2010 creates real liability for institutions whose processes produce discriminatory outcomes. A detection tool that flags your ESL cohort at significantly elevated rates is exactly the kind of tool that produces such outcomes.
What process detection looks at
Behavioural AI detection tools record the timeline of writing events during a session - not the text itself, just the events. A keystroke happened. A paste event of 340 characters occurred at minute two. The window lost focus for eleven minutes. Typing resumed at a rate consistent with composing, not transcribing.
This process fingerprint is more reliable than text analysis for several reasons. It's language-neutral: the rhythm of genuine composing looks the same whether you're writing in English, Spanish, or Mandarin, so the ESL false-positive problem doesn't arise. It's harder to fake: replicating the natural variance of human typing - the pauses, the corrections, the uneven cadence - requires active effort and technical knowledge most students don't have. And it produces a different kind of evidence. Not 'this prose reads like AI output' (a probabilistic claim), but 'this work arrived in a 2,400-character paste at minute three of a six-minute session' (a timestamped record of what actually happened).
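As a rough illustration of what such a timeline might contain, the sketch below models a session as a list of events and reduces it to a few summary figures. The `WritingEvent` type, the event names, and the `summarise` function are assumptions made for this example, not the format of any particular tool.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WritingEvent:
    t: float        # seconds since the session started
    kind: str       # "key", "delete", "paste", "blur" or "focus"
    chars: int = 0  # characters affected; the text itself is never stored

def summarise(events):
    """Reduce an event timeline to a handful of process figures."""
    events = sorted(events, key=lambda e: e.t)
    pastes = [e.chars for e in events if e.kind == "paste"]
    away = 0.0
    blur_at = None
    for e in events:
        if e.kind == "blur":
            blur_at = e.t
        elif e.kind == "focus" and blur_at is not None:
            away += e.t - blur_at  # time the window was out of focus
            blur_at = None
    return {
        "duration_s": events[-1].t - events[0].t if events else 0.0,
        "typed_chars": sum(e.chars for e in events if e.kind == "key"),
        "deleted_chars": sum(e.chars for e in events if e.kind == "delete"),
        "pasted_chars": sum(pastes),
        "largest_paste": max(pastes, default=0),
        "away_s": away,
    }

# Hypothetical session: a 340-character paste at minute two,
# then eleven minutes with the window out of focus.
log = [
    WritingEvent(0, "key", 1),
    WritingEvent(120, "paste", 340),
    WritingEvent(130, "blur"),
    WritingEvent(790, "focus"),
    WritingEvent(800, "key", 1),
]
summary = summarise(log)
```

Because each event carries only a timestamp, a type, and a character count, a summary like this can be produced without ever reading what the student wrote.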
What genuine writing looks like
Genuine writing has recognisable characteristics. Sessions run long relative to word count: a 500-word essay typically takes thirty-five to sixty minutes for a student who's actually thinking. Typing comes in bursts with pauses - moments where input stops whilst the writer thinks or rereads. Content grows incrementally. Deletions and edits appear throughout, not only at the end.
AI-assisted shortcuts often look different. Common patterns: a session that lasts only a few minutes for a lengthy submission; a single large paste event that accounts for most of the word count; typing that shows an unusually even rhythm, consistent with someone copying text out from another source rather than composing; and minimal revision or deletion activity. None of these signals is definitive on its own - a student who drafted in a separate app will show a large paste event. But together they identify submissions that warrant a closer look.
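The "together, not alone" principle can be sketched as a simple check that collects reasons and only suggests a closer look when several coincide. Every threshold below is a placeholder chosen for illustration, not a validated cut-off, and the function names are hypothetical.

```python
from statistics import mean, pstdev

def review_reasons(duration_min, word_count, pasted_chars, total_chars,
                   deleted_chars, key_intervals_s):
    """List the process patterns a session shows. Thresholds are illustrative."""
    reasons = []
    # Under roughly ten minutes per 500 words.
    if word_count > 0 and duration_min / word_count < 0.02:
        reasons.append("short session for length")
    # Pasted content makes up most of the submission.
    if total_chars > 0 and pasted_chars / total_chars > 0.8:
        reasons.append("mostly pasted")
    # Almost nothing was deleted or revised.
    if total_chars > 0 and deleted_chars / total_chars < 0.02:
        reasons.append("little revision")
    # Gaps between keystrokes vary very little (low coefficient of variation).
    if len(key_intervals_s) >= 20:
        avg = mean(key_intervals_s)
        if avg > 0 and pstdev(key_intervals_s) / avg < 0.3:
            reasons.append("very even typing rhythm")
    return reasons

def warrants_closer_look(reasons, minimum=2):
    """A single signal is never enough; require several together."""
    return len(reasons) >= minimum

# Hypothetical: a 2,400-character paste in a six-minute session.
pasted_quickly = review_reasons(6, 400, 2400, 2450, 0, [])

# Hypothetical: drafted elsewhere, pasted in, then revised for 45 minutes.
drafted_elsewhere = review_reasons(45, 500, 2900, 3000, 200, [])

print(warrants_closer_look(pasted_quickly))     # True
print(warrants_closer_look(drafted_elsewhere))  # False
```

The second case shows why the combination matters: a large paste alone does not push a submission over the line, and the output is a prompt for review rather than a verdict.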
Building a practical detection approach
The most defensible approach combines process collection with conversation. Set homework through a tool that captures writing behaviour - this gives you process data before you need it. Review submissions that show the most unusual patterns. Use what you find as the basis for a process-focused conversation with the student: asking them to walk you through how they approached the task is almost always more revealing than any detector score.
When students know their writing process is captured, their behaviour changes. Most who would otherwise have used a shortcut choose not to, because the shortcut is now visible. This preventive effect is as valuable as the detection function itself.
What to do with what you find
If process data shows strong anomalies - a very short session, a single large paste, minimal revision - it's enough to prompt a conversation, not enough to trigger formal proceedings on its own. Use it as one input among several: is the submission inconsistent with the student's previous work? Can they explain their process in specifics?
If they can walk you through the reasoning, the choices they made, the parts that were hard - even if the session data looked unusual - you probably don't have a misconduct case. If the session data is anomalous and the student cannot reconstruct the work at all, you have stronger grounds. Document what you have before escalating. A process log is far better evidence than a detector score, and in any formal context, that difference matters enormously.