Originality.ai built its reputation detecting GPT-3.5 and GPT-4 output. But Claude (Anthropic's flagship model) writes differently than GPT, and detector accuracy doesn't always transfer between LLM families. If you're using Claude for writing assistance, the practical question is: will Originality.ai flag it?
We ran a controlled test in April 2026 to find out. The short answer is yes, but with significant gaps that are worth understanding. Here's exactly what we found.
## The Test Setup
We generated 20 essays of approximately 500 words each (10 per model) across 5 topics:
- Climate policy
- Personal narrative (childhood memory)
- Technical explanation (how blockchain works)
- Argumentative essay (universal basic income)
- Business case study
For each topic, we generated two essays with Claude Sonnet and two with GPT-4o, using identical prompts across models. We then submitted all 20 essays to Originality.ai's Lite Web Tool with the latest detector model and recorded the AI confidence score for each; the table below reports per-topic averages.
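If you want to replicate this against a detector API, the comparison loop looks roughly like the sketch below. `submit_to_detector` is a placeholder, not a real Originality.ai client — our scores came from the web tool, submitted manually.

```python
# Sketch of the comparison loop. submit_to_detector() is a stub --
# replace it with whatever detector client or manual process you use.
def submit_to_detector(essay: str) -> float:
    """Stub: return an AI-confidence score in [0, 100]."""
    return 0.0  # a real implementation would call a detector here

TOPICS = ["climate policy", "personal narrative", "technical explanation",
          "argumentative essay", "business case study"]

def run_comparison(essays: dict[str, dict[str, str]]) -> dict[str, float]:
    """essays maps topic -> {"gpt": text, "claude": text}.
    Returns the per-topic detection gap (GPT score minus Claude score)."""
    gaps = {}
    for topic in TOPICS:
        gpt_score = submit_to_detector(essays[topic]["gpt"])
        claude_score = submit_to_detector(essays[topic]["claude"])
        gaps[topic] = gpt_score - claude_score
    return gaps
```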
## Results: Originality.ai on Claude vs GPT-4o
| Topic | GPT-4o score | Claude Sonnet score | Detection gap |
|---|---|---|---|
| Climate policy | 100% AI | 87% AI | 13 points |
| Personal narrative | 96% AI | 52% AI | 44 points |
| Technical explanation | 100% AI | 91% AI | 9 points |
| Argumentative essay | 99% AI | 78% AI | 21 points |
| Business case study | 100% AI | 83% AI | 17 points |
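A quick sanity check on the summary numbers, computed directly from the per-topic scores in the table above:

```python
# Per-topic Originality.ai scores from the table (percent AI confidence),
# as (GPT-4o, Claude Sonnet) pairs.
scores = {
    "Climate policy":        (100, 87),
    "Personal narrative":    (96, 52),
    "Technical explanation": (100, 91),
    "Argumentative essay":   (99, 78),
    "Business case study":   (100, 83),
}

gpt_avg = sum(g for g, _ in scores.values()) / len(scores)     # 99.0
claude_avg = sum(c for _, c in scores.values()) / len(scores)  # 78.2
gap = gpt_avg - claude_avg                                     # ~20.8

print(f"GPT-4o avg: {gpt_avg:.1f}%  Claude avg: {claude_avg:.1f}%  gap: {gap:.1f} pts")
```

These averages are where the 99% / 78% figures quoted later in this article come from.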
Originality.ai catches Claude output, but with consistently lower confidence than GPT-4o. The gap is widest on personal narrative content (44 percentage points), where Claude's slightly different stylistic patterns appear to confuse the detector enough to drop the score below the typical flagging threshold.
## Why Claude Scores Lower on Originality.ai
Three structural reasons explain the consistent gap:
### 1. Different training data and architecture
Claude is trained on different data than GPT-4o, and Anthropic uses constitutional AI methods that produce slightly different output patterns. Originality.ai's detector was trained primarily on GPT outputs (the dominant model when the detector was built in 2022 to 2023), so its statistical fingerprints for "AI text" lean toward GPT patterns.
### 2. Slightly higher perplexity
Claude tends to produce text with marginally higher perplexity than GPT-4o on the same prompt. The word choices are slightly less predictable, which is exactly what AI detectors interpret as a "more human" signal.
### 3. More variable sentence structure
In our testing, Claude essays had higher burstiness (sentence length variance) than GPT-4o essays on the same prompt. The variance gap was small but consistent across all 5 topics, and burstiness is one of the two core signals every modern detector measures.
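Burstiness in this sense is just sentence-length variance, which you can measure directly. A minimal sketch (the naive sentence splitter is an assumption; production tools segment text more carefully):

```python
import statistics

def burstiness(text: str) -> float:
    """Sentence-length variance as a rough burstiness proxy:
    higher variance reads as more human-like rhythm to detectors."""
    # Naive split on terminal punctuation -- fine for a demo.
    sentences = [s.strip()
                 for s in text.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return statistics.pvariance(lengths) if len(lengths) > 1 else 0.0

uniform = "This is a sentence. Here is another one. Now a third one. And a fourth here."
bursty = "Short. This one runs considerably longer and wanders through several clauses before stopping. Tiny. Then a medium-length closer follows it."
print(burstiness(uniform) < burstiness(bursty))  # True
```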
## Does This Mean Claude Is "Safer" Than GPT?
Sort of. Claude output gets caught less reliably by Originality.ai, but "less reliably" doesn't mean "not at all." Confidence scores in the 78 to 91% range still trigger detector flags in most workflows. If you're submitting Claude output through any AI detection layer, expect it to be caught most of the time.
The practical implication is that Claude output requires less aggressive humanization to pass detectors than GPT output, but both still benefit from humanization for high-stakes submissions where false positives or missed flags carry real consequences.
## How to Make Claude Output Pass Originality.ai
The same techniques that work for GPT output also work for Claude:
- Restructure sentences to vary length (raise burstiness)
- Replace high-probability word choices with less predictable alternatives
- Cut formulaic transitions ("Furthermore," "Moreover," "It is important to note")
- Add minor topic drift (a brief tangent that returns to the main point)
- Include idiomatic compressions ("kind of," "tbh," "no clue") where the register allows
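The transition-cutting step, for example, can be implemented as a simple pattern pass. The phrase list below is illustrative only, not WriteHumanly's actual pipeline:

```python
import re

# Illustrative list only; a real humanizer would use a much larger set.
FORMULAIC_TRANSITIONS = [
    r"Furthermore,\s*", r"Moreover,\s*",
    r"It is important to note that\s*", r"In conclusion,\s*",
]

def cut_transitions(text: str) -> str:
    """Strip formulaic sentence openers, then re-capitalize."""
    for pattern in FORMULAIC_TRANSITIONS:
        # Only match at the start of the text or of a sentence.
        text = re.sub(r"(^|(?<=\. ))" + pattern, r"\1", text)
    # Re-capitalize any sentence that now starts lowercase.
    return re.sub(r"(^|\. )([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), text)

sample = ("The data is clear. Furthermore, the trend holds. "
          "It is important to note that exceptions exist.")
print(cut_transitions(sample))
# The data is clear. The trend holds. Exceptions exist.
```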
Or run the output through a humanizer that handles all of this automatically. Across our test set, WriteHumanly's structural rewrite reliably brought Originality.ai scores below 15% on Claude output, with similar results on GPT-4o output.
## What About Other Detectors?
We also tested the same 20 essays against GPTZero, Turnitin, and Copyleaks. The pattern was consistent across the board:
| Detector | Avg score on GPT-4o | Avg score on Claude | Gap |
|---|---|---|---|
| Originality.ai | 99% | 78% | 21 points |
| GPTZero | 96% | 74% | 22 points |
| Turnitin | 91% | 68% | 23 points |
| Copyleaks | 97% | 71% | 26 points |
Every detector tested showed a 20+ point gap between GPT-4o and Claude output. Claude is consistently harder to detect, not just on Originality.ai. This is meaningful but not transformative: a 71% Copyleaks score is still a flag.
## The Bottom Line
Originality.ai detects Claude output reliably, but with lower confidence than GPT output. If you're using Claude for writing assistance and submitting work that will be checked by AI detectors, plan to humanize the output anyway. Claude's edge over GPT in detector evasion is real but modest, and shouldn't change your humanization workflow.
## Frequently Asked Questions
### Does Originality.ai detect Claude in 2026?
Yes. In our April 2026 testing, Originality.ai detected Claude Sonnet output with 78% average confidence across 10 essays. This is meaningful detection but consistently lower than the 99% confidence Originality.ai achieves on GPT-4o output. Claude output gets caught most of the time but with less certainty.
### Is Claude harder to detect than ChatGPT?
Yes, marginally. Across GPTZero, Turnitin, Originality.ai, and Copyleaks, Claude output scored 21 to 26 points lower than GPT-4o output on the same prompts in our testing. Claude tends to produce text with slightly higher perplexity and burstiness than GPT, which AI detectors interpret as more human-like.
### How do I make Claude output undetectable?
Apply the same humanization techniques that work for GPT output: vary sentence length, replace predictable word choices, cut formulaic transitions, add minor topic drift. A structural humanizer like WriteHumanly handles all of this automatically and brings Originality.ai scores below 15% on Claude output reliably.
### Why does Claude score differently than GPT on AI detectors?
Claude is trained on different data than GPT and uses different architecture and training methods (constitutional AI). The result is slightly different output patterns. AI detectors were primarily trained on GPT outputs, so their statistical fingerprints for "AI text" lean toward GPT patterns. Claude output partially evades these fingerprints.
### Should I switch from ChatGPT to Claude to avoid detection?
Switching from GPT to Claude gives you a 21 to 26 point detection score reduction on most detectors, which can move borderline cases below the flagging threshold. But Claude output is still detected most of the time. For reliable detector evasion, plan to humanize the output regardless of which LLM produced it.
Written by
WriteHumanly Team
The team behind WriteHumanly has spent thousands of hours studying how AI detectors actually score text, building tools used by students and professionals worldwide. We publish what we learn so other writers can make better decisions.