If a human writes text and then translates it using an AI, will it be flagged as "AI content" by AI detection tools? This is a question that matters a lot to me, since translateabook.com's goal is to produce high-quality AI book translations that authors can actually use to publish in another language with minimal extra work. Triggering an AI detector would be a signal that the AI translation process is smoothing over the author's voice, making the result less publishable.
To put it to the test, I analyzed 500 random samples from 100 real-world book translations with Pangram.com, a currently quite popular AI detection tool.
The result, out of 500 samples:
The short answer seems to be: No, AI translation doesn't trigger AI detectors - an encouraging sign for authors that AI translation can be a strong basis for publishing to new markets. Read on if you want more details on this experiment.
I used translations from translateabook.com's Standard Mode, which is the simplest translation mode and the closest to a simple (big) prompt asking the AI to translate the text, with less input from the agent harness. The models used depended on the language pair, but were mostly Gemini Flash 3.1, DeepSeek v3 and ChatGPT 5.2.
I used samples ranging from 500 to 1000 words (or up to 2000 characters for languages with no spaces between words, like Chinese or Japanese). They came from documents in a variety of genres, including fiction and non-fiction, and in a number of different languages.
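The sampling rule above could be sketched roughly as follows. This is a minimal illustration, not the script actually used: the function name `sample_passage`, the language codes in `NO_SPACE_LANGS`, and the choice of one contiguous random window are all assumptions.

```python
import random

# Assumed (hypothetical) set of language codes written without spaces between words.
NO_SPACE_LANGS = {"zh", "ja"}

def sample_passage(text, lang, rng, min_words=500, max_words=1000, max_chars=2000):
    """Return one random contiguous passage from `text`.

    Space-delimited languages: a window of min_words..max_words words.
    Languages without spaces: a window of at most max_chars characters.
    """
    if lang in NO_SPACE_LANGS:
        if len(text) <= max_chars:
            return text
        start = rng.randrange(len(text) - max_chars + 1)
        return text[start:start + max_chars]

    words = text.split()
    if len(words) <= min_words:
        # Document too short to choose a window: return it whole.
        return " ".join(words)
    length = rng.randint(min_words, min(max_words, len(words)))
    start = rng.randrange(len(words) - length + 1)
    return " ".join(words[start:start + length])

# A seeded generator makes the selection reproducible, which helps
# when checking that the sampling is really randomized before paying for API calls.
rng = random.Random(42)
```

Passing an explicit `random.Random` instance rather than relying on the global generator makes it easy to rerun the exact same selection.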
The Pangram version displayed was 3.3.2.
Because Pangram is somewhat expensive, I ran only a few tests, in which a number of variables (languages, genres, AI models...) are confounded with each other, so I would consider the results indicative only. However, the results are so strongly in favor of "human written" that they do suggest Pangram generally doesn't consider AI translations to be AI-written content.
Pangram gives a "Percent AI" score (like 4.8%) and a label (like "Human written"). I had a hard time finding an exact mapping between the two, since I saw a few "Human written" labels with a higher AI percent score than some "Mostly human, AI assisted" labels, but it seems that roughly up to 20% is "Human written", 20-30% is "Mostly human, AI assisted" or "AI assisted", and above 30% is "AI detected".
As a comparison, if I enter the output of a chat session with Claude or ChatGPT into Pangram, I get a score of 100% AI.
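To make the rough mapping concrete, here is a small sketch of the thresholds as I observed them. This is not Pangram's actual labeling logic, which evidently uses more than the percentage; the function name `approximate_label` and the exact cut-off handling at 20% and 30% are assumptions.

```python
def approximate_label(percent_ai):
    """Map a 'Percent AI' score to the label band observed in this experiment.

    The thresholds are approximate observations, not Pangram's published rules.
    """
    if percent_ai <= 20:
        return "Human written"
    if percent_ai <= 30:
        # The two intermediate labels could not be separated by score alone.
        return "Mostly human, AI assisted / AI assisted"
    return "AI detected"

print(approximate_label(4.8))
```

Since some "Human written" samples scored higher than some "Mostly human, AI assisted" ones, a score near a boundary should be read together with the label Pangram itself returns.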
Here are the results per language:
| Target language | Passages | Scored | Mean AI | Median AI |
|---|---|---|---|---|
| Dutch | 115 | 115 | 1.4% | 1.1% |
| German | 73 | 73 | 0.8% | 0.5% |
| Polish | 67 | 67 | 1.5% | 1.3% |
| English (US) | 66 | 66 | 10.3% | 9.2% |
| French (France) | 25 | 25 | 1.3% | 0.5% |
| Ukrainian | 25 | 25 | 2.7% | 0.9% |
| Bulgarian | 21 | 21 | 0.6% | 0.5% |
| Spanish (Latin America) | 15 | 15 | 0.5% | 0.4% |
| Czech | 15 | 15 | 1.2% | 0.7% |
| English (UK) | 10 | 10 | 0.8% | 0.6% |
| Swedish | 10 | 10 | 2.5% | 1.1% |
| Russian | 10 | 10 | 9.0% | 5.2% |
| Finnish | 6 | 6 | 4.4% | 4.2% |
| Portuguese (Brazil) | 5 | 5 | 0.9% | 0.3% |
| Vietnamese | 5 | 5 | 2.4% | 1.3% |
| Turkish | 5 | 5 | 0.6% | 0.3% |
| Greek | 5 | 5 | 7.1% | 0.5% |
| Uzbek | 5 | 5 | 0.4% | 0.4% |
| Hebrew | 5 | 5 | 0.6% | 0.5% |
| Italian | 5 | 5 | 0.4% | 0.4% |
| Spanish (Spain) | 3 | 3 | 0.3% | 0.3% |
| Romanian | 3 | 3 | 0.9% | 1.0% |
| Chinese (Simplified) | 1 | 1 | 2.7% | 2.7% |
All the "Percent AI" scores are really low - US English is higher, maybe Pangram is more sensitive there (though not in UK English?).
Pangram says they currently support the following languages: English, Arabic, Chinese, Czech, Dutch, French, German, Greek, Hindi, Hungarian, Italian, Japanese, Korean, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish, Ukrainian, Urdu, and Vietnamese.
So Bulgarian, Finnish, Uzbek and Hebrew, which were in the sample, are not officially supported by Pangram. They represent 37 samples and, checking my results, all of them were scored as "Human written".
In case you're curious, the "AI detected" samples were one Greek and one Russian, the "Mostly human, AI assisted" samples were one Russian and one Ukrainian, and the "AI assisted" sample was English (US).
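The count of samples in unsupported languages can be re-derived from the per-language table and Pangram's supported list. A minimal sketch, with the numbers copied from above; the helper `base_language`, which strips the regional variant, is my own assumption about how to match the names.

```python
# Passage counts per target language, copied from the table above.
passages = {
    "Dutch": 115, "German": 73, "Polish": 67, "English (US)": 66,
    "French (France)": 25, "Ukrainian": 25, "Bulgarian": 21,
    "Spanish (Latin America)": 15, "Czech": 15, "English (UK)": 10,
    "Swedish": 10, "Russian": 10, "Finnish": 6, "Portuguese (Brazil)": 5,
    "Vietnamese": 5, "Turkish": 5, "Greek": 5, "Uzbek": 5, "Hebrew": 5,
    "Italian": 5, "Spanish (Spain)": 3, "Romanian": 3,
    "Chinese (Simplified)": 1,
}

# Languages Pangram lists as supported, copied from the text above.
supported = {
    "English", "Arabic", "Chinese", "Czech", "Dutch", "French", "German",
    "Greek", "Hindi", "Hungarian", "Italian", "Japanese", "Korean",
    "Persian", "Polish", "Portuguese", "Romanian", "Russian", "Spanish",
    "Swedish", "Turkish", "Ukrainian", "Urdu", "Vietnamese",
}

def base_language(name):
    """Strip a regional variant, e.g. 'English (US)' -> 'English'."""
    return name.split(" (")[0]

unsupported = {
    lang: n for lang, n in passages.items()
    if base_language(lang) not in supported
}

print(sorted(unsupported))
print(sum(unsupported.values()), "of", sum(passages.values()), "passages")
```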
Here are the results per genre:
| Book type | Passages | Scored | Mean AI | Median AI |
|---|---|---|---|---|
| Novel | 342 | 342 | 2.3% | 0.9% |
| Novella | 28 | 28 | 2.6% | 0.7% |
| Academic | 25 | 25 | 3.0% | 1.5% |
| Memoir | 23 | 23 | 2.1% | 0.8% |
| Spiritual | 20 | 20 | 2.2% | 1.5% |
| Short Stories | 15 | 15 | 12.7% | 15.8% |
| Science | 11 | 11 | 7.2% | 7.2% |
| Biography/Autobiography | 10 | 10 | 0.7% | 0.3% |
| Self-Help | 10 | 10 | 2.7% | 1.1% |
| Textbook | 5 | 5 | 0.9% | 0.3% |
| Children | 5 | 5 | 0.7% | 0.6% |
| Manual | 5 | 5 | 0.4% | 0.4% |
| Health | 1 | 1 | 3.6% | 3.6% |
All still very low. "Short Stories" was the highest, and when I checked, these samples were mostly English (US). The language and genre variables are confounded, so we can't draw too strong a conclusion, but it seems like the AI score stays quite low across genres. That's interesting, given that academic or scientific writing is easier to translate than novels and novellas.
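Because the genre groups differ so much in size (342 novel passages versus 1 health passage), the per-genre means can't simply be averaged to get an overall figure. A minimal sketch of pooling them, weighting each genre's mean by its passage count. The values are copied from the table above, and since those means are rounded to one decimal, the pooled result is only approximate.

```python
# (passages, mean "Percent AI") per book type, copied from the table above.
genres = {
    "Novel": (342, 2.3),
    "Novella": (28, 2.6),
    "Academic": (25, 3.0),
    "Memoir": (23, 2.1),
    "Spiritual": (20, 2.2),
    "Short Stories": (15, 12.7),
    "Science": (11, 7.2),
    "Biography/Autobiography": (10, 0.7),
    "Self-Help": (10, 2.7),
    "Textbook": (5, 0.9),
    "Children": (5, 0.7),
    "Manual": (5, 0.4),
    "Health": (1, 3.6),
}

total = sum(n for n, _ in genres.values())

# Weight each genre's mean by how many passages it contributes.
weighted_mean = sum(n * mean for n, mean in genres.values()) / total

print(f"{total} passages, pooled mean AI score of about {weighted_mean:.1f}%")
```

The pooled figure is dominated by the novel passages, which is one more reason to treat the small groups as indicative only.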
This experiment points in an encouraging direction, but it is more of a quick test than an in-depth study. Here are some limitations.
Regarding AI models: I only tested a few AI models; a more thorough study would include more models and more data per model. I did try manually translating human text through Google Translate to see if the same result would be reproduced, and Pangram flagged it as 100% human written - slightly indicative that this might be a pretty general result.
Regarding AI detectors: I only tested with Pangram, because they had strong privacy protections and I often saw them quoted online by tech people. To go further, we should test the results with other AI detection tools. This can be costly (I did blow through $20 in a single command calling Pangram, before I realized the sampling wasn't randomized correctly...). I also manually ran GPTZero on something I translated, and it returned a "100% human" score.
Most importantly, AI detectors are one signal among others for detecting AI-written content. It's still possible that the AI made mistakes that a human would not have made - misunderstanding context somewhere, or translating a joke in a way that doesn't make sense - that Pangram couldn't detect.
For this reason, my official recommendation is to have a human check the translation result when using an AI translation tool to translate for publication. translateabook.com mitigates these errors in higher translation modes (the AI does more self-review, has more context, etc.) and provides AI proofreading tools that give a sense of the translation's strengths and weaknesses, allowing the proofreading to be done very quickly by a human native speaker.
Given these limitations, what truly gives me confidence in the potential of AI translation is the feedback of the thousands of authors who translated and published books using Translate a Book.
This experiment is one data point suggesting that AI translation doesn't have an easy-to-detect AI signature. It seems that human-written text translated with recent AI models passes AI detector tests.
Doing a quick proofread of the result should be enough to remove stray AI artifacts and get a publishable result. This matches the experience authors report having on translateabook.com.
If you have any further thoughts or comments about this experiment, feel free to reach out through our contact page - I'd love to hear them.