11 June 2026 · 9 min read

How to detect AI writing in student work: a practical guide


Detecting AI writing has become one of the most contested problems in education. Every term, more teachers describe the same experience: a polished submission that arrives too easily, reads too cleanly, and bears no visible signs of the struggle that real thinking produces. The instinct to check is reasonable. The tools most people reach for first are not.

The two categories of detection signal

Output signals - things you can observe in the finished text - are what most AI detection tools measure. Does the prose read fluently? Are sentences predictably structured? Is the vocabulary a little formal for the apparent writer? These questions have value as starting points, but as evidence they're fragile. Language models are trained to produce text that reads like human writing, and they're getting better at it. A tool that scores prose for AI-likeness is chasing a moving target.

Process signals are different. They describe how work was created: whether it was typed or pasted, how long the session lasted, whether there were pauses consistent with thinking, whether content appeared in bursts or arrived all at once. These signals are not about the words - they're about the behaviour that produced them. That distinction matters enormously, both for accuracy and for fairness.

Why text-based AI detectors fall short

The most widely used text-based detectors work by measuring perplexity - roughly, how surprising each word choice is relative to what a language model would predict. Human writers make unpredictable choices; AI output tends toward statistical expectation. In principle, this should work. In practice, it doesn't work reliably enough for high-stakes educational decisions.
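To make the idea concrete, here is a minimal Python sketch of a perplexity calculation. The per-word probabilities are invented for illustration; a real detector would get them from a language model, and nothing here reflects how any particular product scores text.

```python
import math


def perplexity(token_probs):
    """Perplexity: exp of the average negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


# Hypothetical probabilities a language model assigned to each word.
predictable = [0.5, 0.5, 0.5, 0.5]    # every word was an easy guess
surprising = [0.5, 0.05, 0.5, 0.05]   # some words were unexpected

print(perplexity(predictable))
print(perplexity(surprising))
```

The predictable sequence scores about 2.0 and the surprising one about 6.3. A detector built on this idea treats low perplexity as a sign of machine output, which is why plain, predictable human writing can be caught by the same rule.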

A Stanford study published in 2023 tested seven popular detectors against essays written by non-native English speakers and found that, on average, the detectors wrongly flagged about 61% of them as AI-generated. The same tools flagged native speakers far less often. The reason isn't mysterious: ESL and EFL learners often use simpler, more predictable vocabulary and more conservative sentence structures - not because they're using AI, but because they're still building fluency. The very patterns that flag AI-generated text also flag careful, formal, non-native writing.

Several UK universities quietly stopped using text-based detectors after receiving complaints from international students who had been wrongly flagged. The Equality Act 2010 creates real liability for institutions whose processes produce discriminatory outcomes. A detection tool that flags your ESL cohort at significantly elevated rates is exactly the kind of tool that produces such outcomes.

What process detection looks at

Behavioural AI detection tools record the timeline of writing events during a session - not the text itself, just the events. A keystroke happened. A paste event of 340 characters occurred at minute two. The window lost focus for eleven minutes. Typing resumed at a rate consistent with composing, not transcribing.
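As a rough illustration, an event timeline like the one described above could be modelled as follows. This is a sketch under assumptions: the `WritingEvent` fields, the event names and the `summarise` helper are hypothetical, not the format of any real tool. Note that no text content is stored, only what happened and when.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WritingEvent:
    seconds: float  # time since the session started
    kind: str       # "keystroke", "paste", "blur" (window lost focus) or "focus"
    chars: int = 0  # characters added; 0 for focus changes


def summarise(events):
    """Total typed and pasted characters, plus seconds spent away from the window."""
    typed = sum(e.chars for e in events if e.kind == "keystroke")
    pasted = sum(e.chars for e in events if e.kind == "paste")
    away = 0.0
    blur_at = None
    for e in sorted(events, key=lambda e: e.seconds):
        if e.kind == "blur" and blur_at is None:
            blur_at = e.seconds
        elif e.kind == "focus" and blur_at is not None:
            away += e.seconds - blur_at
            blur_at = None
    return {"typed_chars": typed, "pasted_chars": pasted, "seconds_away": away}


# Hypothetical session echoing the events described above.
log = [
    WritingEvent(5.0, "keystroke", 1),
    WritingEvent(6.2, "keystroke", 1),
    WritingEvent(120.0, "paste", 340),  # 340-character paste at minute two
    WritingEvent(130.0, "blur"),        # window loses focus
    WritingEvent(790.0, "focus"),       # back after eleven minutes
    WritingEvent(795.0, "keystroke", 1),
]
summary = summarise(log)
print(summary)
```

Keeping the log to event types, sizes and timestamps is a deliberate choice in this sketch: it is enough to reconstruct how the work arrived without retaining what the student wrote.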

This process fingerprint is more reliable than text analysis for several reasons. It's language-neutral: the rhythm of genuine composing looks the same whether you're writing in English, Spanish, or Mandarin, so the ESL false-positive problem doesn't arise. It's harder to fake: replicating the natural variance of human typing - the pauses, the corrections, the uneven cadence - requires active effort and technical knowledge most students don't have. And it produces a different kind of evidence. Not 'this prose reads like AI output' (a probabilistic claim), but 'this work arrived in a 2,400-character paste at minute three of a six-minute session' (a timestamped record of what actually happened).
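One simple way to put a number on "uneven cadence" is the coefficient of variation of the gaps between keystrokes. The sketch below is an assumption for illustration, not a description of how any detection product measures typing; the function name and the sample timestamps are made up.

```python
import statistics


def cadence_variation(keystroke_ms):
    """Coefficient of variation of the gaps between consecutive keystrokes."""
    gaps = [b - a for a, b in zip(keystroke_ms, keystroke_ms[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three keystrokes")
    mean_gap = statistics.mean(gaps)
    if mean_gap == 0:
        raise ValueError("keystrokes must not all share one timestamp")
    return statistics.pstdev(gaps) / mean_gap


# Hypothetical keystroke timestamps in milliseconds.
steady = [0, 200, 400, 600, 800, 1000]   # metronome-like, as in transcribing
uneven = [0, 150, 300, 4300, 4450, 9000]  # short bursts broken by pauses

print(cadence_variation(steady))
print(cadence_variation(uneven))
```

The steady sequence scores zero and the uneven one scores above one. A measure like this only describes rhythm; on its own it says nothing about where the words came from.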

What genuine writing looks like vs AI-assisted shortcuts

Genuine writing has a recognisable process fingerprint. Understanding the contrast is easier with a direct comparison:

Behaviour signal	Genuine writing	AI-assisted shortcut
Session duration for 500 words	35-60 minutes typical	Often under 10 minutes
Typing pattern	Irregular bursts with pauses (thinking time)	Even cadence consistent with transcribing pre-written text
Paste events	Occasional small pastes (name, quotes)	One large paste covering most of the word count
Revision activity	Edits scattered throughout the session	Minimal deletions; near-final text arrives early
Content growth	Gradual and incremental	Rapid jump to near-final length after paste
Focus events	Some tab-switches to research	Long gaps or no activity outside the submission

None of these signals is definitive alone - a student who drafted in a separate document will show a large paste event, and that's a legitimate workflow. It's the combination of multiple anomalous signals in the same submission that identifies work worth a closer look.
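The "combination, not any single signal" rule can be sketched in a few lines of Python. Everything here is hypothetical: the `SessionSummary` fields, the thresholds (loosely based on the table above and only sensible for a task of around 500 words) and the two-signal cut-off are assumptions you would need to tune for your own classes.

```python
from dataclasses import dataclass


@dataclass
class SessionSummary:
    minutes: float            # total session length
    final_chars: int          # length of the submitted text
    largest_paste_chars: int  # size of the biggest single paste
    deleted_chars: int        # characters removed during the session


def anomalous_signals(s):
    """List the signals that look unusual; thresholds are illustrative only."""
    signals = []
    if s.minutes < 10:
        signals.append("short session")
    if s.final_chars and s.largest_paste_chars / s.final_chars > 0.5:
        signals.append("one paste covers most of the text")
    if s.final_chars and s.deleted_chars / s.final_chars < 0.02:
        signals.append("minimal revision")
    return signals


def worth_a_closer_look(s, min_signals=2):
    """Flag only when several anomalous signals appear together."""
    return len(anomalous_signals(s)) >= min_signals


# Drafted elsewhere, pasted in, then revised for most of an hour.
drafted_elsewhere = SessionSummary(45, 3000, 2800, 200)
# Six-minute session, one big paste, almost no edits.
shortcut = SessionSummary(6, 2500, 2400, 10)

print(anomalous_signals(drafted_elsewhere), worth_a_closer_look(drafted_elsewhere))
print(anomalous_signals(shortcut), worth_a_closer_look(shortcut))
```

The student who drafted elsewhere trips only the paste signal and is not flagged. The second session trips all three, which makes it worth a closer look and nothing more than that.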

Building a practical detection approach

The most defensible approach combines process collection with conversation. Set homework through a tool that captures writing behaviour - this gives you process data before you need it. Review submissions that show the most unusual patterns. Use what you find as the basis for a process-focused conversation with the student: asking them to walk you through how they approached the task is almost always more revealing than any detector score.

When students know their writing process is captured, their behaviour changes. Most who would otherwise have used a shortcut choose not to, because the shortcut is now visible. This preventive effect is as valuable as the detection function itself.

What to do with what you find

If process data shows strong anomalies - a very short session, a single large paste, minimal revision - it's enough to prompt a conversation, not enough to trigger formal proceedings on its own. Use it as one input among several: is the submission inconsistent with the student's previous work? Can they explain their process in specifics?

If they can walk you through the reasoning, the choices they made, the parts that were hard - even if the session data looked unusual - you probably don't have a misconduct case. If the session data is anomalous and the student cannot reconstruct the work at all, you have stronger grounds. Document what you have before escalating. A process log is far better evidence than a detector score, and in any formal context, that difference matters enormously.
