AUTOMATED TRANSLATION SERVICES
Submitted by:
Rutgers, The State University of New Jersey
Alan M. Voorhees Transportation Center
Edward J. Bloustein School of Planning & Public Policy
33 Livingston Avenue
New Brunswick, NJ 08901
Principal Investigator: Sean Meehan
smeehan@ejb.rutgers.edu
March 2021
TABLE OF CONTENTS
Introduction
Translation Technology
Information Gathering
References
INTRODUCTION
Translation is necessary for the spread and sharing of information, knowledge, and ideas. Translation is
more than just words and grammatical structure. As noted by language teacher and researcher Valerie
Spreeman, though the individual words may be roughly equivalent, if one were to translate “my heart will go on” as “my cardiovascular muscle will continue,” the original message of steadfastness would be lost (Spreeman 2017). Above all else, true translation aims at communicating meaning from one language to another to
avoid situations like the one just described. Traditionally, human linguists have taken a large role in the
translation process to ensure that the translated text makes sense in context and is culturally appropriate.
With the introduction of consumer‐level automatic machine translation tools, it has become easy to
simply type a word or phrase into a text box and choose the language for translation. Machine translation
has become ubiquitous, instantly available, and accessible to anyone with an internet connection. But is
it time to remove humans from the translation process? Have automatic translation tools reached a high
enough level of accuracy to be relied upon to communicate meaning?
TRANSLATION TECHNOLOGY
Regardless of the language, in order to build a sentence effectively, a writer must employ semiotics, the
study of signs and their meanings. Semiotics comprises syntax, semantics, and pragmatics. Syntax is the structure of a sentence and the arrangement of its words. Semantics refers to the meanings of words and phrases and how they are understood when a word has multiple meanings or when a word's dictionary definition and its connotations differ. Finally, pragmatics refers to the goal of the sentence and whether the writer's ideas can be clearly understood by the intended audience (Spreeman 2017). Since semiotics is essential for effective communication, it also
forms a foundation for developing translation technologies as developers seek to produce tools to enable
the effective and understandable translation of text.
Machine translation technology can generally be categorized into three main systems: Rule‐based,
Statistical, and Neural. Rule‐based machine translation software is built upon algorithms that analyze the
source language and use rules developed by human experts to translate structures from the source
language into the target language (Allué 2016). Using a series of mathematical equations, rule‐based systems review the structure of a sentence in one language and use a matching structure in another language to rebuild the sentence in the target language. With rule‐based systems, idioms, slang, and
other, more abstract aspects of language cannot usually be correctly translated. (Spreeman 2017)
Statistical machine translation systems also employ equations for translation; however, they apply the
math differently. Statistical machine translation systems create several possible translation hypotheses
and then statistically evaluate the correctness of each hypothesis until it can choose the one with the
highest probability of accuracy (Spreeman 2017). An advantage of statistical machine translation systems is that they tend to be more accurate; however, building a model for them is more challenging, and they require more data than a rule‐based system. Neural machine translation uses networks of many interconnected nodes and builds relationships based on the bilingual texts used to train the system. These systems, modeled after how the human brain processes language, can to some degree "learn" the languages they translate and make connections and translations they were not specifically taught (Spreeman 2017). Neural machine translation technology excels at deciphering contexts and
choosing the appropriate translation based upon the context; however, these technologies often have
issues with more complex sentences. Considering the advantages and drawbacks of technology based
upon the three main systems, "the best results with machine translation are found when two or more of
these systems are used together." (Spreeman 2017)
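To make the contrast among these approaches more concrete, the sketch below illustrates the rule‐based idea at its simplest: a hand‐built bilingual dictionary plus a single expert‐written reordering rule. The vocabulary and the rule are invented for illustration only and are not drawn from any commercial system discussed in this assessment.

```python
# A deliberately tiny rule-based "translator": a hand-built bilingual
# lexicon plus one expert-written reordering rule (English adjective-noun
# order becomes noun-adjective order in Spanish). Illustration only.

LEXICON = {"the": "el", "red": "rojo", "car": "coche", "runs": "corre"}
ADJECTIVES = {"red"}
NOUNS = {"car"}


def translate_rule_based(sentence: str) -> str:
    words = sentence.lower().split()
    # Rule 1: swap adjective-noun pairs to match Spanish word order.
    reordered = []
    i = 0
    while i < len(words):
        if i + 1 < len(words) and words[i] in ADJECTIVES and words[i + 1] in NOUNS:
            reordered.extend([words[i + 1], words[i]])
            i += 2
        else:
            reordered.append(words[i])
            i += 1
    # Rule 2: substitute each word using the bilingual lexicon.
    return " ".join(LEXICON.get(word, word) for word in reordered)


print(translate_rule_based("the red car runs"))  # -> "el coche rojo corre"
```

Production rule‐based engines encode thousands of such lexical and syntactic rules, which is also why idioms, slang, and other abstract language, which break those rules, tend to translate poorly.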
INFORMATION GATHERING
To start the information gathering process, the study team attempted to interview language and
translation experts, including a faculty member from the Department of Spanish and Portuguese at
Rutgers University, a representative from the Language Center at Rutgers University, and a representative
from a translation services company. Each expert consistently underscored the nuances and ambiguity
associated with language, emphasizing that every automated translation tool is far from perfect and
cannot replace expert human translation. Even after the study team acknowledged this point, the experts in all three conversations declined to engage in a follow‐up discussion about the merits of individual automated translation products. As a result, the research team decided instead to pursue a broader range
of scholarly and non‐scholarly texts.
Most authors of scholarly articles on the reliability or accuracy of machine translation technology expressed sentiments similar to those of the experts contacted during this study; unlike those experts, however, they acknowledged the limitations of automated translation technology while still assessing these tools on their merits. While the systems evaluated sometimes change, nearly every article written in the past ten years
regarding the evaluation of machine translation systems includes Google Translate. Google Translate is
well recognized and the most used translation service throughout the world. According to Google, as of
2016, over 500 million people were translating 100 billion words each day in 103 languages using Google
Translate. (Turovsky 2016) As of October 2020, the number of languages supported by Google Translate
had increased to 109. The initial version of Google Translate was launched in April 2006 and was centered
around statistical machine translation. Through this process, the text identified for translation is
translated first into English as an intermediate language and then into the target language, cross‐
referencing the phrase in question with millions of documents taken from official United Nations and
European Parliament transcripts. (Sommerlad 2018) Building upon initial success with the statistical
model, in 2016 Google Translate was updated with a neural translation model. Using deep learning
methods, the service is able to translate and compare whole sentences at a time from a broader range of
linguistic sources. Deep learning is an aspect of artificial intelligence seeking to replicate human
learning, constructing an artificial neural network that can be trained through exposure to existing
examples. This update was aimed at achieving greater accuracy by giving the full context rather than just
sentence clauses in isolation. “By comparing Japanese‐to‐English translations with Korean‐to‐English,
the service is able to deduce and map out the relationship between Japanese and Korean and make
translations back and forth between those two languages accordingly, a great leap forward in computers'
understanding of semantics, a process still confounded by metaphorical expressions and quirky idioms.”
(Sommerlad 2018) By continually processing these calculations, Google Translate can spot recurring
patterns between words in different languages, continuously improving its chance for accuracy.
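For readers who wish to experiment with the service directly, the short sketch below shows one way to send text to Google's Cloud Translation API, the paid, programmatic counterpart of the free translate.google.com interface, from Python. It assumes the google-cloud-translate client library is installed and that valid Google Cloud credentials are configured; the sample sentence is illustrative only.

```python
# Minimal sketch of calling Google's Cloud Translation API (Basic edition).
# Assumes `pip install google-cloud-translate` and that the
# GOOGLE_APPLICATION_CREDENTIALS environment variable points to a valid
# service-account key file.
from google.cloud import translate_v2 as translate

client = translate.Client()

# The source language may be supplied explicitly or omitted, in which case
# the service attempts to detect it automatically.
result = client.translate(
    "Public comments may be submitted at the meeting.",
    target_language="es",
)

print(result["translatedText"])          # the machine-translated sentence
print(result["detectedSourceLanguage"])  # e.g., "en"
```

Unlike the public web interface, the API is a billed service, a distinction that matters when weighing machine translation against the cost of professional translation.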
While Google is one of the most popular translation systems and one of the most noted in academic
studies, is it the best? Several studies of online translation systems have been conducted using humans to
review the results. Some of the more relevant findings are summarized below:
In a 2009 study, common phrases translated from German and Spanish into English with four
online translation services were evaluated. The findings showed that Google Translate was the
most accurate, followed by Systran and X10 in a tie for second, and then Applied Language.
Further study of Google Translate, in which students chose among multiple‐choice options and wrote out their own understanding of translations, showed that even in cases where the grammar becomes garbled, the meaning could often be ascertained (Aiken et al. 2009).
A 2010 study evaluated and ranked online machine translators, which are available on the Internet
and free of charge to the general public, for the quality of their target text translations from
English to Spanish and then from Spanish to English. Five types of targeted sentences were
evaluated: idiom, formal, lexical ambiguity, phrasal verb, and grammar. Though it was not top‐
ranked in every category, Google Translate had the highest overall score, followed by Babylon,
Reverso, Bing, Babelfish, Systran, PROMT, WorldLingo, InterTran, and Webtrance. (Hampshire
and Salvia 2010)
In a 2017 study, Google Translate, Bing, and Babylon were compared for translating Urdu to
Arabic sentences by using three performance evaluation metrics, BiLingual Evaluation
Understudy (BLEU), Metric for Evaluation of Translation with Explicit Ordering (METEOR), and
National Institute of Standards and Technology (NIST). Each metric uses different calculations and algorithms to measure accuracy, but as a rule, a machine translation that is closer to a human reference translation is considered more accurate; this is the central idea behind these evaluation metrics (a brief illustration follows these summaries). The results show that Google Translate, on average, outperforms Bing and Babylon by 15.74% and 28.55% with the BLEU metric, 13.74% and 3.28% with the METEOR metric, and 20.83% and 3.91% with the NIST metric, respectively (Ayesha et al. 2017).
Another 2017 study analyzed Anusaaraka, Bing, Babelfish, Mantra, and Google Translate based on
the translation of English texts into Hindi. The study concluded that of the translation systems
studied, Google Translate has the highest accuracy percentage with an overall efficiency score of
82.66%. Bing and Babelfish have around the same accuracy (~77%), and Anusaaraka and Mantra
were found to have the least accuracy, making them less reliable for translation. (Kharb et al.
2017)
The most recent study on the subject was published in July 2020. This study provided a
comprehensive evaluation of Google Translate, Bing Translator, Systran, PROMT, Babylon,
WorldLingo, Yandex, and Reverso. The study evaluated each system using Chinese, English,
Hindi, Spanish, Arabic, Malay, and Russian in all combinations except Chinese as the target.
Results showed that Google Translate was more accurate overall as compared to the other seven
options in the study. The study noted that Google Translate is more accurate when the source
language and target language are similar languages or dialects. A translation from English to
Spanish, for example, will generally be of better quality than a translation from German to Hindi. It was also noted that Google Translate supports far more languages than its competitors do (Vanjani and
Aiken 2020)
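As a brief illustration of the reference‐based metrics cited in several of the studies above, the sketch below computes a sentence‐level BLEU score with the NLTK library; the reference and candidate sentences are invented examples, and real evaluations average such scores over large test sets. NLTK also provides METEOR and NIST scorers.

```python
# Illustrative sentence-level BLEU computation with NLTK.
# The reference and candidate sentences are invented examples; real machine
# translation evaluations average scores over thousands of test sentences.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "the meeting will be held next tuesday evening".split()
candidate = "the meeting is held next tuesday in the evening".split()

# BLEU measures n-gram overlap (here up to 4-grams) between the machine
# output and one or more human reference translations.
score = sentence_bleu(
    [reference],  # list of reference token lists
    candidate,    # machine-translated tokens
    smoothing_function=SmoothingFunction().method1,  # avoids zero scores on short text
)
print(f"BLEU: {score:.3f}")  # closer to 1.0 means closer to the reference
```

METEOR additionally credits synonym and stem matches, and NIST weights rarer n‐grams more heavily, which helps explain why the three metrics in the Ayesha et al. (2017) study report different margins between the same systems.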
In addition to academic studies, evaluations of automated translation systems by users were also sought.
In various blog articles and technology review sites, the user experience for a variety of systems was rated.
Many of these resources were produced by companies operating individual translation systems or systems that build off another company's technology, such as Google's. Those company‐affiliated resources have not been included due to the potential for bias. Following a review of the remaining resources, the most relevant findings are
summarized below:
While not regularly mentioned in scholarly texts, DeepL Translator received considerably more attention in user‐based
rankings and discussions of automated translation services. Launched in August 2017, DeepL is a free,
neural machine translation service founded by a former Google employee. DeepL stands for deep
learning, the same area of artificial intelligence aimed at replicating human learning, which was at the
core of the Google Translate update in 2016. The team behind DeepL feels that what distinguishes their system from Google is its very high‐quality training material and its practice of running regular blind tests to make sure the program keeps its high standards (Smolentceva 2018). DeepL claims to be "the world’s best
translation machine," backing this claim with the results of their own August 2017 blind test published on
their website comparing DeepL's results with those of their rivals: Google and Microsoft (Smolentceva
2018). In this test, the rival systems were given 100 sentences to translate from English into German,
French and Spanish and from German, French, and Spanish into English. Following machine translation,
professional translators reviewed the produced text and rated the quality of each translation. DeepL was
chosen three times more often than Google because its translation sounded more natural (Smolentceva
2018). Unfortunately, there is not enough information about how these evaluations have been conducted
that would allow an impartial party to substantiate them at this time.
While no peer‐reviewed journal articles referencing scientific studies could be found declaring DeepL
superior to Google Translate, some individuals running their own less‐scientific experiments have
declared DeepL to be superior in certain situations:
In a blog article on the website for the University of Strasbourg's Online Master's Degree in
Technical Communication and Localization, Alexandra Deparvu discusses the results of her own
informal experiment to test the capabilities of DeepL using a non‐fictional text from the European
Personnel Selection Office (EPSO). Comparing the original French source text and the two machine‐translated target texts, Deparvu reported better results from DeepL, noting that
Google Translate seemed to go for the more literal translation, while DeepL tried to include
synonyms in order to not lose certain nuances, and this difference eventually resulted in a more
natural translation. Deparvu admits that this was by no means a comprehensive experiment and
no definite claims can be made as to the relative proficiency of both systems; however, within the
scope of her experiment DeepL outperformed Google Translate (Deparvu 2018).
The Globalization and Localization Association (GALA) is a global, non‐profit trade association
for the language industry. In an undated white paper attributed to GALA, three authors claim to
confirm that DeepL is better than Google Translate. The study employs quantitative analysis
regarding the percentage of correct translations per defined linguistic category. Although the
percentages are nearly identical, DeepL performs better than Google in all categories except one.
The authors note that Google performs particularly well, namely with above 90% of correct
translations, on one category: nonverbal agreement. DeepL performs particularly well in three
categories: verb valency, non‐verbal agreement, and composition. Verb valency is the category
with the most significant gap between the two systems' performance as DeepL achieves 34.1
percentage points more than Google. Overall, they conclude from “a comparison of the Google Translate and DeepL systems for German‐English” that they “could confirm the observation that DeepL performs a little better than Google on our test set” (Macketanz et al.).
Text United, a cloud‐based translation platform based out of Vienna, Austria, has a ranking of free
machine translation engines on their website. While they tout the accuracy of DeepL, the lack of
scope and scale of the DeepL platform is also noted. Google Translate is ranked number one due
to its usability and the fact that it operates with more languages and language combinations than
any other alternative. TUFT, their own Machine Translation service, is based on Google Translate
and integrates with their other services and technology (Plotnik 2020).
While DeepL is an innovative service that has its admirers, there is not enough available scientifically
based data to prove that it is more accurate than Google Translate. Where Google Translate truly
outshines DeepL is its ability to translate over 100 languages compared to DeepL's 11 languages. Google
Translate is often preferred due to its simplicity, speed, and features, such as its ability to allow users to
import documents for translation directly. While DeepL is a powerful tool, Google seems to offer the
most comprehensive, flexible, and easy to use machine translation service.
Nevertheless, even though Google Translate is a powerful tool, a number of mistakes can still be found, especially for words that have multiple meanings and functions (Vidhayasai et al. 2015). It has also been
demonstrated in numerous studies that although Google Translate provides translations among a large
number of languages, the accuracies of these translations vary greatly. Translations between European
languages have generally been found to have the best results, while those involving Asian languages are
more likely to be poor. The best results occur when the source language and target language are similar
languages or dialects (Aiken & Balan 2011). In one study, sentences were translated into French, German,
Japanese, and Spanish. Native speakers of the languages volunteered to evaluate the translated output
using two sets of scales: intelligibility and accuracy. It was found that the results for accuracy and
intelligibility were similar, with the German output receiving the worst evaluations for both metrics. The
Japanese output for both metrics received the second‐worst evaluations. The Spanish output had the
highest evaluation for intelligibility and received the joint highest evaluation, along with French, for
accuracy. Overall it was found that the majority of the French and Spanish output was of reasonably high
quality but still required some post‐editing. The German and Japanese output was of lower quality and
needed more substantial correcting before publication (Tobin 2015).
While still noting the prevalence of errors, studies dating back as far as 2011 have also noted that the vast
majority of language combinations offered via Google Translate seem to provide sufficient accuracy for
reading comprehension (Aiken & Balan 2011). As noted in the Tobin study's conclusion in 2015, in the
majority of the sentences chosen, it would seem that the Spanish and French output is understandable
but not perfect (Tobin 2015). To determine whether the output is sufficient, the translated text's
potential function needs to be taken into account. Output of such quality may be useful for general
information purposes and can be used to generate new thoughts or other points of view. Machine
translation is fast, cost‐effective, and relatively accurate, and therefore excels when used in informal
settings where accuracy is not as important so long as one gets the gist of the information (Spreeman
2017). However, manual translation by human translators works better when deeper and more extensive
knowledge on the subject of translation is required, especially when translating text with specific
contents, terms, conditions, or legal information (Vidhayasai et al. 2015). In a 2013 study to determine
whether translation software could be used to help Legal Services Agencies deliver legal information, the
authors concluded that the technology was not yet accurate enough to reliably communicate legal
information, a very sensitive topic. While the tools did not yet meet the assessed need, the authors were confident that machine translation software would continue to improve and bring the quality of pure machine translation closer to that of human‐dependent tools, ultimately predicting that it would be at least a decade before automated translations require only a trivial human review (Hogue & Hineline 2013).

In a recent study involving undergraduate students using Google Translate, one of the
data collection methods involved a questionnaire to gather data about the accuracy in content,
acceptability, and readability. The target readers’ responses regarding the translated text were positive.
Ratings were measured along a scale of one to three, with one signaling inaccuracy, unacceptability, and unreadability, and three signaling accuracy, acceptability, and a high level of readability. Data analysis
results show that: the average score for content accuracy is 1.97, the average score for acceptability is 1.93,
and the average score for readability is 2.07 (Simanjuntak 2019). Based upon this analysis, the author
concluded that the accuracy, acceptability, and readability of the abstract text translated by Google Translate averaged a score of 1.97, or close to 2, meaning the translation approaches good quality even though parts of the text are rendered less accurately in the target language (Simanjuntak 2019). Like Hogue and Hineline, Simanjuntak is looking
to the future, noting that with continued improvements and program updates, it is expected that Google
Translate will become one of the most reliable translation machines in the future (Simanjuntak 2019).
While Google Translate is one of the most popular and accurate translation machines available, without significant human intervention and correction it still has weaknesses in handling inaccurate word choices that may cause misunderstanding by the user.
Accurate translation of critical documents is the cornerstone of effective outreach and meaningful
involvement. Traditionally, human linguists have taken a large role in the translation process to ensure
that the translated text makes sense in context. While an excellent choice for ensuring accuracy,
understanding, and cultural appropriateness, expert human translation comes with financial costs and
time constraints that can make continual use of these services unsustainable. With the introduction of
automatic translation tools, it has become possible to remove that costly human element from the
equation by simply typing a word or phrase into a text box, choosing the language for translation, and
instantly receiving results. While automatic translation tools have become inexpensive and easy to use,
there are two important questions: are they good enough, and can they be used effectively?
This assessment has sought to shed some light on both questions. Within the last decade,
advances in automated translation technology have been astounding. Following the review of both
scholarly and more user‐based tests and reviews of available technology, this assessment has concluded
that Google seems to offer the most comprehensive, flexible, and easy to use machine translation service
at this time. While Google has come a long way, the many nuances and ambiguities of language mean its translation tool remains less than perfect. Google Translate performs exceptionally well and
approaches its highest quality translating between English and Spanish. Unfortunately, Google Translate
is weakest in translation between English and Asian languages, some of the more difficult and thus costly
languages to translate. Even between English and Spanish, where Google does best, there is still too much room for error when accurate understanding is imperative. Based on these
findings, at this time, it would be inappropriate to recommend the use of Google Translate for the
translation of vital SJTPO documents used to communicate specific, targeted information.
Google Translate could be most useful to SJTPO in less formal situations when relied upon not for
specific translation of vital information but for facilitating conversation via the public participation and
engagement process. Google Translate could be used to translate brief general information documents,
meeting fliers or support materials, and to translate public comments or interact with the public at live
events. As SJTPO continues to grow and expand its partner network, Google Translate could also be used
more widely if verified by a native speaker on staff or affiliated with a partner organization serving
targeted outreach communities. While providing a complete translation would be a time‐consuming and
challenging task, a partner may be more willing to review and make minor corrections to a smaller
document that has already been translated via Google Translate to help ensure appropriateness. If SJTPO
chooses to use Google Translate, it should be made clear via the website or in the document itself that
machine translation was used and that the translated text is for information purposes only.
Professional translation services remain the best choice for accuracy, understanding, and cultural
appropriateness, especially when translating to languages other than Spanish. To limit the financial costs
and time constraints inherent in the professional translation process, it is recommended that SJTPO
develop a relationship with and make strategic use of a full‐service translation services agency. SJTPO
should seek to develop a list of frequently used information and phrases that can be professionally
translated. Information including SJTPO descriptive language and facts, program descriptions, and basic
outreach information can be used in multiple documents, formats, and settings moving forward. Relying
on accurate key information made available through professional translation and using Google Translate
to fill in the missing parts would allow SJTPO to develop new documents or update existing documents
with a higher level of confidence that the message will be received effectively. Similarly, minor updates
to professionally translated vital documents could also be made quickly using Google Translate with a
higher level of confidence that the document's overall meaning has not been changed. If used
strategically, a hybrid approach of relying on professionally translated source materials while making
adjustments and updates via Google Translate could help SJTPO expand communication throughout the
region while limiting costs.
REFERENCES
Aiken, M., Ghosh, K., Wee, J., & Vanjani, M. (2009). An evaluation of the accuracy of online translation
systems. Communications of the IIMA, 9(4), 6.
Aiken, M., & Balan, S. (2011). An analysis of Google Translate accuracy. Translation Journal, 16(2), 1‐3.
https://translationjournal.net/journal/56google.htm
Aiken, M., & Wong, Z. (2019). An updated evaluation of Google Translate accuracy. Studies in linguistics
and literature, 3(3), 253‐260.
Allué, B. R. (2016). The reliability and limitations of Google Translate: A bilingual, bidirectional and genre‐based evaluation. Entreculturas: Revista de traducción y comunicación intercultural, (9), 67‐80.
Ayesha, M. A., Noor, S., Ramzan, M., Khan, H. U., & Shoaib, M. (2017). Evaluating Urdu to Arabic
Machine Translation Tools. International Journal of Advanced Computer Science and Applications, 8(10),
90‐96.
Deparvu, A. (2018, February 19) Google Translate vs. DeepL: Worthy Competitors? Master TCLoc.
https://mastertcloc.unistra.fr/2018/02/19/google‐translate‐vs‐deepl/
ElShiekh, A. A. A. (2012). Google Translate service: Transfer of meaning, distortion, or simply a new creation? An investigation into the translation process & problems at Google. English Language and Literature Studies, 2(1), 56.
Farah Hana, A. (2017). Errors made by Google Translate and its rectification by human translators (Doctoral dissertation, University of Malaya).
Hampshire, S., & Salvia, C. P. (2010). Translation and the Internet: evaluating the quality of free online
machine translators. Quaderns: revista de traducció, 197‐209.
Henry, A. (2014, September 14) Five Best Language Translation tools. Lifehacker.
https://lifehacker.com/five‐best‐language‐translation‐tools‐1634228212
Hogue, J., & Hineline, A. (2013). Can translation software help legal services agencies deliver legal
information more effectively in foreign languages and plain English? An evaluation of existing translation
technology and recommendations for future use. Legal Assistance of Western New York, Inc.
https://sites.google.com/a/lawny.org/plain‐language‐library/home/translation‐technology‐report
Kharb, S., Kumar, H., Kumar, M., & Chaturvedi, A. K. (2017, April). Efficiency of a machine translation
system. In 2017 International conference of Electronics, Communication and Aerospace Technology
(ICECA) (Vol. 1, pp. 140‐148). IEEE.
Macketanz, V., Burchardt, A., & Uszkoreit, H. (n.d.). TQ‐AutoTest: Novel Analytical Quality Measure Confirms that DeepL is better than Google Translate. Globalization & Localization Association. https://www.dfki.de/fileadmin/user_upload/import/10174_TQ-AutoTest_Novel_analytical_quality_measure_confirms_that_DeepL_is_better_than_Google_Translate.pdf
Mundt, K., & Groves, M. (2016). A double‐edged sword: the merits and the policy implications of Google
Translate in higher education. European Journal of Higher Education, 6(4), 387‐401.
Plotnik, E. (2020, October) Free Machine Translation Engines, the best so far. Text United.
https://www.textunited.com/blog/best‐free‐machine‐translation‐engines/
Simanjuntak, F. (2019). A Study on Quality Assessment of the Translation of an Abstract Text "English Idioms Errors Made by Jordanian EFL Undergraduate Students" by Google Translate. Online Submission, 2(4), 38‐49.
Smolentceva, N. (2018, May 12). DeepL: Cologne‐based startup outperforms Google Translate. DW.
https://p.dw.com/p/39S5k
Sommerlad, J. (2018, June 19). Google Translate: How Does the Search Giant's Multilingual Interpreter
Actually Work? Independent. https://www.independent.co.uk/life‐style/gadgets‐and‐tech/news/google‐
translate‐how‐work‐foreign‐languages‐interpreter‐app‐search‐engine‐a8406131.html
Spreeman, V. (2017). Lost (and Found) in Translation: a Look at the Impact of Google Translate and
Other Translation Technologies.
Tobin, A. (2015). Is Google Translate Good Enough for Commercial Websites?; A Machine Translation
evaluation of text from English websites into four different languages. Reitaku Review, 21, 94‐116.
Turovsky, B. (2016, April 28). Ten years of Google Translate. The Key Word.
https://blog.google/products/translate/ten‐years‐of‐google‐translate/
Vanjani, M., & Aiken, M. (2020). A Comparison of Free Online Machine Language Translators. Journal of Management Science and Business Intelligence, 5(1), 26‐31.
Vidhayasai, T., Keyuravong, S., & Bunsom, T. (2015). Investigating the Use of Google Translate in" Terms
and Conditions" in an Airline's Official Website: Errors and Implications. PASAA: Journal of Language
Teaching and Learning in Thailand, 49, 137‐169.
Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., et al. (2016). Google's neural
machine translation system: Bridging the gap between human and machine translation. arXiv preprint
arXiv:1609.08144.