11 June 2026
Why AI detectors flag ESL students as cheaters - and how teachers can avoid it
If you teach students whose first language isn't English, you've inherited a problem you didn't ask for. Text-based AI detectors - tools that score how 'AI-like' a piece of writing reads - are disproportionately likely to flag the work of non-native and ESL writers as machine-generated. The gap isn't marginal, and it doesn't disappear with better prompt engineering or student guidance.
Why text-based detectors fail non-native writers
These tools work by measuring predictability. Large language models tend to choose the most statistically expected next word; human writers deviate from that pattern in ways that feel natural and varied. But not all human writers deviate the same way. ESL and EFL learners often use more predictable vocabulary, simpler sentence structures, and conservative phrasing - not because they're using AI, but because they're still building fluency in the language.
Learners writing in a formally learned register face an even sharper mismatch. Academic writing conventions - topic sentences, transitional phrases, formal vocabulary - are precisely the patterns these detectors associate with generated text. A student who has carefully studied academic English is, by design, writing in the style that flags highest.
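To make the predictability idea concrete, here is a minimal sketch in Python. It is a toy: commercial detectors rely on large neural language models, not word counts, and the reference corpus, the sample sentences and the function names below are all invented for illustration. It scores a text by its perplexity, an average measure of how surprising each word is under a model, where lower means more predictable.

```python
import math
from collections import Counter

def train_unigram(corpus_tokens):
    """Count word frequencies in a reference corpus (a toy stand-in for a language model)."""
    counts = Counter(corpus_tokens)
    return counts, sum(counts.values())

def perplexity(tokens, counts, total):
    """Average surprise of a text under the model; lower means more predictable."""
    vocab_size = len(counts) + 1  # +1 reserves probability mass for unseen words
    log_prob = 0.0
    for tok in tokens:
        p = (counts[tok] + 1) / (total + vocab_size)  # add-one smoothing
        log_prob += math.log(p)
    return math.exp(-log_prob / len(tokens))

# Invented reference text standing in for "what the model expects".
reference = ("the study shows that the results are important and the study "
             "shows that the method is important").split()
counts, total = train_unigram(reference)

# Two human-written sentences: one built from common, expected words,
# one built from words the toy model has never seen.
plain = "the study shows that the results are important".split()
varied = "findings hint at quirks nobody anticipated beforehand".split()

print("plain :", round(perplexity(plain, counts, total), 1))
print("varied:", round(perplexity(varied, counts, total), 1))
```

The plain sentence scores as far more predictable than the varied one, even though a person wrote both. A detector that reads low perplexity as a sign of generation will lean the same way against any writer who sticks to safe, common wording.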
What the research actually says
A widely cited 2023 study from Stanford tested seven popular AI detectors and found that they flagged essays by non-native English speakers as AI-generated about 61% of the time on average. Essays written by native English speakers were flagged far less often. Subsequent research has replicated the finding across different detectors, language backgrounds, and proficiency levels.
The implication is stark: in a mixed class, text-based AI detection is systematically more likely to flag your ESL students than your native-English students - regardless of who actually used AI.
The fairness and legal exposure
A false accusation of academic dishonesty can carry serious consequences: a failed mark, a formal record, or in some cases expulsion. For international students whose visa status is tied to their academic standing, the stakes are especially high. Several UK and US universities have quietly suspended AI detector use after receiving complaints from non-native students, and formal legal challenges are beginning to appear.
In the UK, the Equality Act 2010 places a duty on schools and universities to ensure their processes don't produce discriminatory outcomes. A tool that flags ESL students at a markedly higher rate than native speakers creates exactly the liability that duty is designed to prevent.
Read the process, not the prose
The root problem is a category error. Text detectors ask 'does this prose look AI-written?' - but the question teachers actually need answered is 'did this student write it?' Those are not the same question, and prose style is a poor proxy for the second one.
A behavioural approach asks instead: how was this work composed? Genuine drafting has a recognisable rhythm - typing in bursts, pausing to think, deleting and revising, activity spread plausibly over time. AI-assisted shortcuts look different: a large block pasted in at once, a 600-word essay completed in five minutes, a robotically even typing cadence consistent with transcription.
These process signals are language-neutral. A student writing in their fourth language, choosing simpler words and shorter sentences, produces the same process fingerprint as any other genuine writer. Their prose may look different; the way they composed it doesn't. This is why Learnaway analyses timing events and paste behaviour rather than reading the words at all - the analysis can't produce an English-proficiency bias because it never sees the text.
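As a rough illustration of what process signals might look like in code, here is a minimal Python sketch. It is not Learnaway's implementation: the event format, the field names and the choice of signals are all assumptions made for this example. The function only ever receives timestamps and character counts, never the text itself, and it assumes the events arrive sorted by time.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

@dataclass
class Event:
    t: float        # seconds since the session started
    kind: str       # "key" or "paste" (hypothetical event types)
    chars: int = 1  # characters added by this event

def process_signals(events):
    """Summarise how a piece of work was composed, without reading its text."""
    keys = [e.t for e in events if e.kind == "key"]
    gaps = [b - a for a, b in zip(keys, keys[1:])]
    total_chars = sum(e.chars for e in events)
    pasted = sum(e.chars for e in events if e.kind == "paste")
    minutes = (events[-1].t - events[0].t) / 60 if len(events) > 1 else 0.0
    return {
        "paste_share": pasted / total_chars if total_chars else 0.0,
        "largest_paste": max((e.chars for e in events if e.kind == "paste"), default=0),
        "chars_per_minute": total_chars / minutes if minutes else 0.0,
        # Coefficient of variation of the gaps between keystrokes:
        # close to 0 means a very even cadence, larger means bursts and pauses.
        "cadence_cv": pstdev(gaps) / mean(gaps) if len(gaps) > 1 and mean(gaps) > 0 else None,
    }

# Invented sessions for illustration.
drafted = [Event(t, "key") for t in (0.0, 0.2, 0.5, 0.7, 6.0, 6.3, 6.4, 12.0)]
even = [Event(float(t), "key") for t in range(5)]
pasted = [Event(0.0, "key"), Event(1.0, "paste", 3000), Event(2.0, "key")]

for name, session in (("drafted", drafted), ("even", even), ("pasted", pasted)):
    print(name, process_signals(session))
```

In this sketch the drafted session shows an uneven cadence, the evenly spaced session has a cadence variation of zero, and the pasted session is almost entirely one large paste. None of these numbers is a verdict on its own. They are prompts for a conversation, and where to draw any line is a judgement call this example does not attempt to make.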
What to do in practice
If you're still using a text-based detector, treat any flag as a starting point for a conversation, not a verdict. Ask the student to talk you through their process: which parts were hard, where they got stuck, what they changed. Genuine engagement with the work almost always surfaces quickly.
If you're evaluating tools, ask vendors directly about false positive rates for non-native writers. A tool that can't answer that question - or that only cites headline accuracy on English-first benchmark datasets - is a tool that hasn't grappled with the problem.
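If a vendor does share evaluation data, or you have your own set of essays you know were written without AI, the comparison that matters is simple to run. The Python sketch below computes a false positive rate per group; the record format and the group labels are assumptions, and the sample numbers are invented purely to show the arithmetic.

```python
from collections import defaultdict

def false_positive_rates(records):
    """records: iterable of (group, used_ai, flagged) tuples.

    Returns the false positive rate per group, computed only over
    work known to be human-written.
    """
    flagged = defaultdict(int)
    human = defaultdict(int)
    for group, used_ai, was_flagged in records:
        if not used_ai:
            human[group] += 1
            flagged[group] += int(was_flagged)
    return {g: flagged[g] / human[g] for g in human}

# Invented sample: 10 human-written essays per group, none using AI.
records = (
    [("native", False, True)] * 1 + [("native", False, False)] * 9 +
    [("esl", False, True)] * 6 + [("esl", False, False)] * 4
)

rates = false_positive_rates(records)
print(rates)
print("ratio:", rates["esl"] / rates["native"])
```

A single headline accuracy figure can hide exactly this kind of gap, so ask for the rates broken down by language background, not just the overall number.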
For new assignments, consider collecting work through a process-aware tool from the start. The evidence you'd actually want for a fair judgement - the writing timeline, paste events, time-on-task - exists before the question of AI use ever arises, and it's the same evidence regardless of the student's language background.