Translate a Book with AI

Translating With AI Does Not Trigger AI Detectors

I ran 500 samples from 100 real book translations through Pangram. 495 scored as human written.

If a human writes a text and then translates it with an AI, will it be flagged as "AI content" by AI detection tools? This is a question that matters a lot to me, since translateabook.com's goal is to produce high-quality AI book translations that authors can actually use to publish in another language with minimal extra work. Triggering an AI detector would be a signal that the AI translation process is smoothing over the author's voice, making the result less publishable.

To put it to the test, I analyzed 500 random samples from 100 real-world book translations with Pangram.com, a currently quite popular AI detection tool.

The result, out of 500 samples:

  • 2 were flagged as "mostly human, AI assisted"
  • 1 was flagged as "AI assisted"
  • 2 were flagged as "AI detected" (with a low AI score)
  • 495 were marked as "Human written"

The short answer seems to be: no, AI translation doesn't trigger AI detectors - an encouraging sign for authors that AI translation can be a strong basis for publishing to new markets. Read on if you want more details on this experiment.

What Was Tested

I used translations produced by translateabook.com's Standard Mode, which is the simplest translation mode and the closest to a single (big) prompt asking the AI to translate the text, with less input from the agent harness. The models used depended on the language pair, but were mostly Gemini Flash 3.1, DeepSeek v3 and ChatGPT 5.2.

The samples ranged from 500 to 1000 words (or up to 2000 characters for languages with no spaces between words, like Chinese or Japanese). They came from documents in a variety of genres, including fiction and non-fiction, and in a number of different languages.
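To make the sampling rule concrete, here is a minimal sketch of how one passage could be drawn from a translated book. It is not the script used for this experiment: the function name `sample_passage`, the `NO_SPACE_LANGS` set and the language codes are assumptions made for illustration, and only the 500-1000 word and 2000 character limits come from the description above.

```python
import random

# Hypothetical language codes treated as "no spaces between words".
NO_SPACE_LANGS = {"zh", "ja"}


def sample_passage(text, lang, rng, min_words=500, max_words=1000, max_chars=2000):
    """Return one random contiguous passage from `text`."""
    if lang in NO_SPACE_LANGS:
        # Character-based window for languages without word spacing.
        if len(text) <= max_chars:
            return text
        start = rng.randrange(len(text) - max_chars + 1)
        return text[start:start + max_chars]

    words = text.split()
    if len(words) <= min_words:
        # Document too short to sample from: return it whole.
        return " ".join(words)

    # Pick a passage length, then a random start that keeps it in bounds.
    length = rng.randint(min_words, min(max_words, len(words)))
    start = rng.randrange(len(words) - length + 1)
    return " ".join(words[start:start + length])
```

Passing a seeded generator such as `random.Random(42)` makes a run reproducible, so the list of selected passages can be inspected before any paid detector calls are made.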

The Pangram version displayed was 3.3.2.

Because Pangram is somewhat expensive, I ran only a few tests, in which a number of variables (languages, genres, AI models...) confound each other, so I would consider the results indicative only. However, the results are so strongly in favor of "human written" that Pangram generally doesn't seem to consider AI translations to be AI-written content.

The Results

Pangram gives a "Percent AI" score (like 4.8%) and a label (like "Human written"). I had a hard time finding an exact mapping between the two since I saw a few "Human written" with a higher AI percent score than some "Mostly human, AI assisted" labels, but it seems like roughly up to 20% is "Human written", 20-30% is "Mostly human, AI assisted" or "AI assisted" and above 30% is "AI detected".
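The rough mapping I observed can be written down as a small function. This is only a sketch of the approximate cut-offs described above, not Pangram's actual rule (which I could not pin down, given the overlaps I saw); the function name and the exact handling of the 20% and 30% boundaries are assumptions.

```python
def approximate_label(percent_ai: float) -> str:
    """Guess a label from a "Percent AI" score using the observed cut-offs."""
    if percent_ai <= 20:
        return "Human written"
    if percent_ai <= 30:
        # The two intermediate labels could not be told apart by score alone.
        return "Mostly human, AI assisted / AI assisted"
    return "AI detected"
```

For example, `approximate_label(4.8)` returns "Human written", while a score of 100 falls in the "AI detected" band.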

As a comparison, if I enter the output of a chat session with Claude or ChatGPT into Pangram, I get a score of 100% AI.

Here are the results per language:

| Target language | Passages | Scored | Mean AI | Median AI |
|---|---|---|---|---|
| Dutch | 115 | 115 | 1.4% | 1.1% |
| German | 73 | 73 | 0.8% | 0.5% |
| Polish | 67 | 67 | 1.5% | 1.3% |
| English (US) | 66 | 66 | 10.3% | 9.2% |
| French (France) | 25 | 25 | 1.3% | 0.5% |
| Ukrainian | 25 | 25 | 2.7% | 0.9% |
| Bulgarian | 21 | 21 | 0.6% | 0.5% |
| Spanish (Latin America) | 15 | 15 | 0.5% | 0.4% |
| Czech | 15 | 15 | 1.2% | 0.7% |
| English (UK) | 10 | 10 | 0.8% | 0.6% |
| Swedish | 10 | 10 | 2.5% | 1.1% |
| Russian | 10 | 10 | 9.0% | 5.2% |
| Finnish | 6 | 6 | 4.4% | 4.2% |
| Portuguese (Brazil) | 5 | 5 | 0.9% | 0.3% |
| Vietnamese | 5 | 5 | 2.4% | 1.3% |
| Turkish | 5 | 5 | 0.6% | 0.3% |
| Greek | 5 | 5 | 7.1% | 0.5% |
| Uzbek | 5 | 5 | 0.4% | 0.4% |
| Hebrew | 5 | 5 | 0.6% | 0.5% |
| Italian | 5 | 5 | 0.4% | 0.4% |
| Spanish (Spain) | 3 | 3 | 0.3% | 0.3% |
| Romanian | 3 | 3 | 0.9% | 1.0% |
| Chinese (Simplified) | 1 | 1 | 2.7% | 2.7% |

All the "Percent AI" scores are really low. English (US) is higher; maybe Pangram is more sensitive there (though apparently not for English (UK)?).

Pangram says they currently support the following languages: English, Arabic, Chinese, Czech, Dutch, French, German, Greek, Hindi, Hungarian, Italian, Japanese, Korean, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish, Ukrainian, Urdu, and Vietnamese.

So Bulgarian, Finnish, Uzbek and Hebrew, which were in the sample, are actually not officially supported by Pangram. This represents 37 samples - checking my results, all of them were scored as "Human written".

In case you're curious, the "AI detected" samples were one Greek and one Russian, the "Mostly human, AI assisted" samples were one Russian and one Ukrainian, and the "AI assisted" sample was English (US).
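The count of 37 can be re-derived from the per-language table and Pangram's supported-language list. The sketch below copies the passage counts from the table above; the helper `base_language`, which strips the regional variant in parentheses, is an assumption about how variants map onto Pangram's list.

```python
# Passage counts per target language, copied from the table above.
PASSAGES = {
    "Dutch": 115, "German": 73, "Polish": 67, "English (US)": 66,
    "French (France)": 25, "Ukrainian": 25, "Bulgarian": 21,
    "Spanish (Latin America)": 15, "Czech": 15, "English (UK)": 10,
    "Swedish": 10, "Russian": 10, "Finnish": 6, "Portuguese (Brazil)": 5,
    "Vietnamese": 5, "Turkish": 5, "Greek": 5, "Uzbek": 5, "Hebrew": 5,
    "Italian": 5, "Spanish (Spain)": 3, "Romanian": 3,
    "Chinese (Simplified)": 1,
}

# Languages Pangram lists as supported.
SUPPORTED = {
    "English", "Arabic", "Chinese", "Czech", "Dutch", "French", "German",
    "Greek", "Hindi", "Hungarian", "Italian", "Japanese", "Korean",
    "Persian", "Polish", "Portuguese", "Romanian", "Russian", "Spanish",
    "Swedish", "Turkish", "Ukrainian", "Urdu", "Vietnamese",
}


def base_language(name):
    """Drop a regional variant, e.g. "English (US)" -> "English"."""
    return name.split(" (")[0]


unsupported = {
    lang: n for lang, n in PASSAGES.items()
    if base_language(lang) not in SUPPORTED
}
unsupported_total = sum(unsupported.values())
print(sorted(unsupported), unsupported_total)
# prints ['Bulgarian', 'Finnish', 'Hebrew', 'Uzbek'] 37
```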

Here are the results per genre:

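| Book type | Passages | Scored | Mean AI | Median AI |
|---|---|---|---|---|
| Novel | 342 | 342 | 2.3% | 0.9% |
| Novella | 28 | 28 | 2.6% | 0.7% |
| Academic | 25 | 25 | 3.0% | 1.5% |
| Memoir | 23 | 23 | 2.1% | 0.8% |
| Spiritual | 20 | 20 | 2.2% | 1.5% |
| Short Stories | 15 | 15 | 12.7% | 15.8% |
| Science | 11 | 11 | 7.2% | 7.2% |
| Biography/Autobiography | 10 | 10 | 0.7% | 0.3% |
| Self-Help | 10 | 10 | 2.7% | 1.1% |
| Textbook | 5 | 5 | 0.9% | 0.3% |
| Children | 5 | 5 | 0.7% | 0.6% |
| Manual | 5 | 5 | 0.4% | 0.4% |
| Health | 1 | 1 | 3.6% | 3.6% |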

All still very low. "Short Stories" was the highest, and when I checked, those samples were mostly English (US). The language and genre variables confound each other, so we can't draw too strong a conclusion, but it seems the AI score stays quite low across genres. That is interesting, given that academic or scientific writing is easier to translate than novels and novellas.
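As a sanity check on the genre table above, the per-genre means can be combined into an overall mean weighted by passage count. This is a sketch computed from the rounded table values, so the result is approximate; the variable names are illustrative.

```python
# (passages, mean "Percent AI") per book type, copied from the genre table.
GENRES = {
    "Novel": (342, 2.3),
    "Novella": (28, 2.6),
    "Academic": (25, 3.0),
    "Memoir": (23, 2.1),
    "Spiritual": (20, 2.2),
    "Short Stories": (15, 12.7),
    "Science": (11, 7.2),
    "Biography/Autobiography": (10, 0.7),
    "Self-Help": (10, 2.7),
    "Textbook": (5, 0.9),
    "Children": (5, 0.7),
    "Manual": (5, 0.4),
    "Health": (1, 3.6),
}

total = sum(n for n, _ in GENRES.values())

# Weight each genre's mean by its number of passages.
weighted_mean = sum(n * mean for n, mean in GENRES.values()) / total

print(total, round(weighted_mean, 1))
# prints 500 2.7
```

An overall mean of roughly 2.7% sits far below the approximate 20% level where labels other than "Human written" started to appear, even though the large Novel group dominates the average.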

Limitations

This experiment points in an encouraging direction but is more of a quick test than an in-depth study. Here are some limitations.

Regarding AI models: I only tested a few AI models; a more thorough study would include more models and more data per model. I did try manually translating human text through Google Translate to see if the same result would be reproduced, and Pangram flagged it as 100% human written - slightly indicative that this might be a pretty general result.

Regarding AI detectors: I only tested with Pangram, because they had strong privacy protections and I often saw them quoted online by tech people. To go further, we should test the results with other AI detection tools. This can be costly (I blew through $20 in a single command calling Pangram before I realized the sampling wasn't randomized correctly...). I also manually ran GPTZero on something I had translated, and it returned a "100% human" score.

Most importantly, AI detectors are one signal among others for detecting AI-written content. It's still possible that the AI made mistakes that a human would not have made - misunderstanding context somewhere, or translating a joke in a way that doesn't make sense - that Pangram couldn't detect.

For this reason, my official recommendation is to have a human check the result when using an AI translation tool to translate for publication. translateabook.com mitigates these errors in higher translation modes (the AI does more self-review, has more context, etc.) and provides AI-proofreading tools to get a sense of the translation's strengths and weaknesses, allowing the proofreading to be done very quickly by a human native speaker.

Given these limitations, what truly gives me confidence in the potential of AI translation is the feedback of the thousands of authors who translated and published books using Translate a Book.

Conclusion

This experiment is one data point suggesting that AI translation does not have an easy-to-detect AI signature. It seems that human-written text translated with recent AI models passes AI detector tests.

A quick proofread of the result should be enough to remove potential stray AI artifacts and produce a publishable result. This matches the experience authors report having on translateabook.com.

If you have any further thoughts or comments about this experiment, feel free to reach out through our contact page - I'd love to hear them.