Translation and Technology PDF
Series Editors: Gunilla Anderman and Margaret Rogers, The Centre for Translation Studies, University of Surrey, UK
Palgrave Textbooks in Translating and Interpreting bring together the most
important strands of thinking in a fast-developing field. Volumes in the series
Titles include:
C.K. Quah
TRANSLATION AND TECHNOLOGY
Ann Corsellis
PUBLIC SERVICE INTERPRETING AND TRANSLATING
Acknowledgements
Introduction
1 Definition of Terms
Machine translation
Human-aided machine translation
Machine-aided human translation
Human translation
The localization industry
Conclusion
Appendices
References
Index
To Caroline, Elizabeth, Gillian and Lyndsay, thank you for helping out
with keying in corrections on the earlier drafts. Lastly, to my ‘sifu’ and
friend Peter Newmark, a big thank-you for all the translation discussions
we had during our coffee–biscuit sessions years ago.
If it had not been for the series editors, Gunilla Anderman and
Margaret Rogers, this book would not have been written. I am forever
information about such tools is harder to obtain. This chapter will also
show that computer-aided translation tools are becoming more advanced
and run on different operating systems, so ‘standards for data interchange’
have been created. Three different standards are described.
Currently available commercial translation tools are also discussed. In
addition, this chapter presents an overview of other commercially available
Distinctions between some of these terms are not always clear. For
example, computer-aided translation (CAT) is often the term used in
Translation Studies (TS) and the localization industry (see the second
part of this chapter), while the software community which develops
this type of tool prefers to call it ‘machine-aided translation’ (MAT). As
the more familiar term among professional translators and in the field
of Translation Studies, ‘computer-aided translation’ is used throughout
the book to represent both computer-aided translation and machine-aided
translation tools, and the term ‘aided’ is chosen instead of ‘assisted’, as
also in ‘human-aided machine translation’ and ‘machine-aided human
translation’.
Figure 1.1 shows the classification introduced by Hutchins and Somers
(1992: 148), which places four types of translation along a linear
continuum according to the degree of human and machine involvement. This classification,
now more than a decade old, will become harder to sustain as more
tools become multifunctional, as we shall see in Chapters 3, 4 and 6.
Nevertheless, the concept in Figure 1.1 remains useful as a point of
reference for classifying translation in relation to technology.
[Figure 1.1: translation types on a continuum from machine to human involvement: MT, HAMT, CAT, HT]
Machine translation
[Diagram: SL text → machine translation system → TL text, with human intervention]
three levels: basic, standard or advanced, each level having its own
detailed technical definition given by the IAMT based on the size of the
dictionaries and the syntactic analysis used.
A basic-level system typically has the following characteristics. It
[Diagram: human-aided machine translation: pre-editing [H], pre-edited SL text [H], machine and human interaction [H]]
systems are MaTra Pro and MaTra Lite, developed at the National Centre for
Software Technology in Mumbai, India, which translate from
English into Hindi. Human-aided machine translation systems have
been implemented at Schreiber Translations, Inc., Foreign Language
Services, Inc. and Ralph McElroy Translation Company, all companies
employed to translate patents for the United States (US) Patent
Human translation
words through the Google search engine that comes with experience.
Sometimes some ingenuity is required. Just typing in the Japanese
word and the word ‘English’ into the search field [where a query
precluding extensive use of the Web. For some clients who are Microsoft
Office savvy, I use tags [hidden comments or remarks which are
inserted in the translated text by the translator] when I am unsure of
Conclusion
Until the early 1990s, the time when the Internet began to be used
worldwide, the translation types given in Hutchins and Somers (1992)
were certainly applicable. More than a decade later, the boundaries of
these four translation types have become more blurred. Although many
writers in the field still make clear distinctions, these have become
harder to maintain as technology becomes increasingly multifunctional
and able to perform several tasks at once. The pace of change in the development of
translation technology is extremely rapid; what is current today may
Translation theory
The next few sections will discuss the often uneasy relationships
between four professional and academic groups: linguists, professional
translators, translation theorists and scientists, and how each group
approaches the phenomenon of translation.
reduces his ability to explain to his clients the reason why he translates
in a particular way (see also Ulrych 2002). Nogueira (2002), also a
professional translator, captures this sentiment when he states that
there are many brilliant translators who could contribute to translation
theory but do not. Since they are practising translators, they can ill
afford to spend the time or effort working on theory. Thus
For the scientist, the main issue is not whether linguistics is prescriptive
or descriptive; a more important criterion is that the particular
approach applied must be computationally tractable (Bennett 2003: 144).
This means that, to be useful for building a machine translation
system, the computer program implementing the linguistic approach must
run at a practical or acceptable speed on a standard computer. Linguists,
Until the late 1960s, the method used to generate translations in nearly
all machine translation systems was the ‘direct translation’ approach
(see Chapter 3). This approach is based on the assumption that one
target-language word can be generated from one source-language word.
It also involves only minimal syntactic analysis, for example the recognition of
word classes such as noun and verb (Hutchins 1979: 29; see also
Chapter 3). One of the original systems built was the Georgetown
University System. The poor quality of the translations produced by the
system highlighted the complexities of language and the need for a better
analysis and synthesis of texts (Hutchins 1979: 31; see Chapter 3). Hence,
in the subsequent system known as Systran (System Translation), linguistic
and computational components were divided into separate modules in
order to resolve the problems encountered in the Georgetown University
System. Even with Systran, the underlying linguistic component
(see also Bennett 2003: 146) where what can be called the ‘complete
sentence’ (CS) is stripped to show the syntactic structure that describes
its actions and properties; Figure 2.3 shows one way of displaying a
sentence representation (SR) of the sentence ‘The mechanic repaired the car’.
However, according to Hutchins, transformational grammar was
S → NP VP
NP → N
VP → V NP
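These three rules are enough to parse the example sentence ‘Jane kicks David’. The following is a minimal top-down parser sketch in Python. The tiny lexicon (‘Jane’ and ‘David’ as N, ‘kicks’ as V) is an assumption added for illustration. The parser also takes the first rule that succeeds rather than exploring every alternative, which is adequate only for a grammar this small.

```python
# Phrase-structure rules from the text; the lexicon is a hypothetical assumption.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N"]],
    "VP": [["V", "NP"]],
}
LEXICON = {"Jane": "N", "David": "N", "kicks": "V"}

def parse(symbol, words, pos):
    """Parse `symbol` starting at words[pos]; return (tree, next_pos) or None."""
    if symbol not in GRAMMAR:  # pre-terminal such as N or V: look the word up
        if pos < len(words) and LEXICON.get(words[pos]) == symbol:
            return (symbol, words[pos]), pos + 1
        return None
    for rhs in GRAMMAR[symbol]:
        children, p = [], pos
        for child in rhs:
            result = parse(child, words, p)
            if result is None:
                break  # this rule fails; try the next one
            tree, p = result
            children.append(tree)
        else:
            return (symbol, *children), p  # every constituent was found
    return None

def parse_sentence(sentence):
    """Return the tree for the whole sentence, or None if it cannot be parsed."""
    words = sentence.split()
    result = parse("S", words, 0)
    if result is None or result[1] != len(words):
        return None
    return result[0]

print(parse_sentence("Jane kicks David"))
# ('S', ('NP', ('N', 'Jane')), ('VP', ('V', 'kicks'), ('NP', ('N', 'David'))))
```

The nested tuples mirror the tree in the text: the first NP covers ‘Jane’, and the VP contains the V ‘kicks’ and a second NP for ‘David’.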
The NP (noun phrase) and the N (noun) below it represent ‘Jane’,
while the VP (verb phrase) represents ‘kicks David’. Below the VP, the
V (verb) represents ‘kicks’, and the second NP, with its N, represents
‘David’. The f-structure (functional structure),
on the other hand, is used to represent the internal structure of a
sentence; its properties can be encoded with sets of attributes and
values in a matrix-like diagram: [attribute value], where ‘attribute’ refers
to the grammatical category such as gender or a syntactic function, and
[Figure: f-structure attribute–value matrix: NUM sgl, PERS 3; present tense; NUM sgl, PERS 3]
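Attribute–value matrices of this kind are usually combined by unification. Two feature structures merge only if they do not assign conflicting values to the same attribute. The minimal Python sketch below represents each matrix as a nested dictionary. The attribute names SUBJ and PRED, and the feature values given to ‘Jane’, ‘they’ and ‘kicks’, are illustrative assumptions and are not taken from the figure.

```python
def unify(fs1, fs2):
    """Unify two feature structures (nested dicts); return None if any values clash."""
    result = dict(fs1)
    for attr, value in fs2.items():
        if attr not in result:
            result[attr] = value
        elif isinstance(result[attr], dict) and isinstance(value, dict):
            merged = unify(result[attr], value)
            if merged is None:
                return None  # clash inside an embedded structure
            result[attr] = merged
        elif result[attr] != value:
            return None  # e.g. NUM sgl against NUM pl
    return result

# Hypothetical entries: the verb 'kicks' requires a third-person singular subject.
kicks = {"TENSE": "present", "SUBJ": {"NUM": "sgl", "PERS": 3}}
jane = {"SUBJ": {"PRED": "Jane", "NUM": "sgl", "PERS": 3}}
they = {"SUBJ": {"PRED": "they", "NUM": "pl", "PERS": 3}}

print(unify(kicks, jane))  # {'TENSE': 'present', 'SUBJ': {'NUM': 'sgl', 'PERS': 3, 'PRED': 'Jane'}}
print(unify(kicks, they))  # None: number clash, so '*they kicks' is rejected
```

Subject–verb agreement therefore falls out of unification. No separate agreement rule is needed, because a clash in NUM or PERS simply makes the combination fail.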
Translation Studies
[Figure: map of Translation Studies divided into Pure and Applied branches; labels include principles, theories and models building; lexicographical and terminological aids; grammars; medium-restricted]
and can be restricted in more than one way. An example of this would
be the analysis of novels and short stories written by Gabriel García
Márquez, which is restricted to language and culture (Colombian
Spanish into English), genre (novels and short stories) and time (1960s
to the 1990s) (Munday 2001: 12, 192–5). While the ultimate goal of
Translation Studies is to build ‘a full inclusive theory accommodating
The third type is text-type-restricted theories. The study of text types
such as those discussed by Reiss (1977/1989) shows the functional
characteristics of three text types and how they can be linked to
translation methods. The informative type of text ideally uses plain language
to convey information, facts and so on in a logical way; examples of
informative texts include operating instructions and reports. The
error may still occur and provide a fruitful area of research which could
inform translation practice as well as translator training and the design
of checking tools.
Finally, the sixth type, time-restricted theories – which may be
focused, according to Holmes, on contemporary translations or on
translations from an earlier period – could also be developed using elec-
Figure 2.6 A model of the translation process including pre- and post-editing tasks
Pre-editing
Pre-editing may entail restricting vocabulary and grammar before
translation takes place. It can also simply mean checking the
source-language text for errors and ambiguities (Gross 1992: 98). In the
translation process shown in Figure 2.6, an English source-language
text undergoes pre-editing to produce a pre-edited source-language
text. This is still the intralingual stage, since both texts are in
English. As an example, Figure 2.7
shows an original English text and its pre-edited version, which is easier
to read and understand than the original.
Original text:
Let the water run hot at the sink and then pull the connector from the recess in the back of the dishwasher. Upon the completion of the above task, lift the connector to the faucet by pressing down the thumb release.

Pre-edited text:
1 Turn on the faucet at the sink until the water runs hot.
2 Pull the connector from the recess in the back of the dishwasher.
3 Press down on the thumb release and lift the connector onto the faucet.
SL = source language
Post-editing
Post-editing is sometimes said to apply not only to machine translation
output but also to translations produced by human translators.
The term ‘post-editing’ is, however, normally reserved for outputs
generated by machine translation systems (Allen 2001a: 26). Editing a
the future, pre- and post-editing tasks may be included in translation
training programmes for those translating technical texts and using
machine translation systems (Koby 2001: 11–12).
Controlled language
A controlled language can be defined as ‘a subset of a natural language
Original text:
Remove screws holding the blower and pull the blower from the cabinet. Before the screws are installed to the blower, a new blower is pushed back into the cabinet.

Controlled-language version:
1 Remove screws from the blower.
2 Pull the blower from the cabinet.
3 Push a new blower into the cabinet.
4 Secure the blower with screws.
Original text:
The Model ADI-999 Attitude Indicator (Photo 8) provides a visual display of pitch and roll attitude and both enroute Course Deviation Indicator (CDI)/Very

AECMA Simplified English version:
The model ADI-999 Attitude Indicator (see photo 8) has a display for pitch and roll. This display also includes these indicators:
Figure 2.10 Example of original English text and its AECMA simplified English version
Source: Smart Communications, Inc. (2005a).
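Some of the work of checking a text against a controlled language can be automated. The Python sketch below flags sentences that exceed a word limit or that use words outside an approved vocabulary. The tiny vocabulary is hypothetical. The 20-word limit is an assumption, loosely modelled on the sentence-length limits of AECMA Simplified English. Real checkers also apply grammar and style rules that this sketch ignores.

```python
import re

# Hypothetical approved vocabulary (a real controlled language defines far more).
APPROVED = {
    "remove", "screws", "from", "the", "blower", "pull", "cabinet",
    "push", "a", "new", "into", "secure", "with",
}
MAX_WORDS = 20  # assumed limit, loosely modelled on AECMA Simplified English

def check_sentence(sentence):
    """Return a list of controlled-language problems found in one sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    problems = []
    if len(words) > MAX_WORDS:
        problems.append(f"too long ({len(words)} words)")
    unknown = sorted({w for w in words if w not in APPROVED})
    if unknown:
        problems.append("unapproved words: " + ", ".join(unknown))
    return problems

for sentence in ["Remove screws from the blower.",
                 "Before the screws are installed to the blower, "
                 "a new blower is pushed back into the cabinet."]:
    print(sentence, "->", check_sentence(sentence) or "OK")
```

The rewritten step ‘Remove screws from the blower.’ passes. The original passive sentence is flagged for words such as ‘installed’ and ‘pushed’ that the controlled version avoids.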
Natural English:
(q) The lime must be mixed into a solution before being added to the digester because dry lime would settle to the bottom in lumps, which is not only ineffective but the lumps take up digester capacity and are difficult to remove when cleaning the digester. Use all the mixing energy available while liming and thereafter in digester mixing. The easiest application point is through the scum box, if one is available. Add small quantities of lime daily until the pH and volatile acid/alkalinity relationship of the tank are restored to desired levels, and gas production is normal.
(h) In any case, use lime only if recovery by natural methods cannot be accomplished within the time available.

Simplified English:
(q) Mix the lime with the water into a solution before you put the solution into the digester. Add the solution to the digester through the scum box. Add a small amount of solution each day until the amount of pH (the balance between the acid and the alkaline) and the production of gas are normal.
NEVER put lumps of dry lime into the digester. These lumps fall to the bottom of the digester and are difficult to remove.
(h) Use the lime only when there is not enough time to correct the liquid in the digester by other methods.
Figure 2.11 Example of natural English, simplified English and simplified Arabic texts
Source: Smart Communications, Inc. (2005c).
10.1057/9780230287105 - Translation and Technology, Chiew Kin Quah
CL = controlled language
Figure 2.12 Example of an English controlled language text and its translations
language for a highly specialized subject field can be costly and can
take several years of research (see, for example, the Scania Project at
http://stp.ling.uu.se/~corpora/scania/), because vocabulary, grammar,
punctuation and general writing conventions have to be redefined, and
professional translators as well as writers have to be trained to use the
syntactic rules and the terms.
Unedited SL text:
Let the water run hot at the sink and then pull the connector from the recess in the back of the dishwasher. Lift the connecter to the faucet by pressing down the thumb release.

Pre-editing [H]

Pre-edited SL text:
1 Turn on the faucet at the sink until the water runs hot.
2 Pull the connector from the recess in the back of the dishwasher.
3 Press down on the thumb release and lift the connector onto the faucet.

[Diagram labels: Input, Output, Post-editing [H]]
Conclusion
Suggested reading
Chesterman, A. and E. Wagner (2002) Can Theory Help Translators? Manchester:
St Jerome Publishing.
Melby, A. and C.T. Warner (1995) The Possibility of Language: A Discussion of the
Nature of Language with Implications for Human and Machine Translation.
Amsterdam: John Benjamins.
Munday, J. (2001) Introducing Translation Studies: Theories and Applications.
London: Routledge.
[Figure: machine translation approaches: ‘toy’ system, direct approach, rule-based approaches, corpus-based approaches]
Pioneer years
The pioneer years began in 1949 with the well-known memorandum
from Warren Weaver that effectively marked the beginning of machine
First-generation systems
The first public demonstration of a machine translation system was
the Russian–English Georgetown University System, a collaborative
effort between IBM and Georgetown University, carried out in 1954
(Hutchins 1995: 434). Early machine translation systems, such as the
Russian (Freigang 2001: 20). Since 1995, SUSY has been accessible online
for German–English and Russian–German translation. Research activities
in the Soviet Union concentrated mainly on languages spoken within
the Union itself (Goshawke, Kelly and Wigg 1987: 28). Some other
European countries such as Hungary and Czechoslovakia continued
their research but with limited technological expertise (Somers
Second-generation systems
In the late 1970s, the USA saw a revival of machine translation research.
The Pan American Health Organization (PAHO) developed SPANAM (Spanish
American), a Spanish–English machine translation system, and ENGSPAN
(English Spanish), an English–Spanish system. METAL (Mechanical
Translation and Analysis of Language), a German–English machine
translation system, was developed at the University of Texas at Austin
with funding from the US Air Force and support from Siemens (Arnold et al. 1994: 14; see also Slocum 1988).
In Europe, between the 1970s and 1992, machine translation research
re-emerged with the EUROTRA (European Translation) project based on
the work of the Groupe d’Étude pour la Traduction Automatique
(GETA) in France and the University of Saarbrücken in Germany. This
project covered all the languages spoken in the European Community
at that time. Although it was not successful in building a ‘working’
machine translation system, several EUROTRA-inspired machine trans-
lation systems were developed, for example PaTrans (Patent Translation)
in Denmark, a commercial machine translation system for translating
patent texts from English into Danish, and an experimental machine
translation system involving 13 languages called CAT2 (Constructors,
Atoms and Translators) in Germany. This era lasted until the end of
the 1980s, which saw the emergence of corpus-based approaches (the use
of bilingual or parallel corpora based on statistical- and example-based
approaches), and also the development of new rule-based approaches
using constraint-based grammars (see also Chapter 2).
Machine translation research continued throughout the 1980s in an
attempt to find better methods and techniques for translation. In the
1980s, the most active machine translation research took place in Japan,
Architectures
in the sentence ‘The instant hot air supplies the necessary heat to all
laboratories’ has the structural representations of a verb in the present
tense and in the declarative mood (see Figure 3.2). The grammatical
information is ‘attached’ to the words and phrases of the source-language
text by means of the parsing process. The closer a source language is to its
target language genealogically, for example Italian to Spanish, the less
‘The instant hot air supplies the necessary heat to all laboratories.’
(*a-supplies
  (tense present)
  (mood declarative)
  (punctuation period)
  (source (*o-hot air
            (reference definite)
            (number singular)
            (attribute (*p-instant))))
  (theme (*u-heat
           (reference definite)
           (number singular)
           (attribute (*p-necessary))))
  (goal_to (*o-all laboratories
             (reference indefinite)
             (number plural))))
Dictionaries
Rule-based approaches
Rule-based approaches involve the application of morphological,
syntactic and/or semantic rules to the analysis of a source-language text
[Figure: transfer stage mapping an SL representation (NP with N1 and N2; ‘beautiful’) to a TL representation (NP’ with N1’ and N2’; ‘hermosa’)]
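The transfer stage involves two steps. Lexical transfer replaces ‘beautiful’ with ‘hermosa’. Structural transfer reorders the phrase, because English places the adjective before the noun and Spanish usually places it after. Both steps can be sketched in Python. The dictionary entry ‘house’ → ‘casa’ and the single reordering rule are assumptions for illustration. A real transfer component works on full syntactic representations and must also handle agreement; ‘hermosa’ agrees with the feminine ‘casa’ here only because of the choice of example.

```python
# Hypothetical bilingual dictionary for lexical transfer (English -> Spanish).
LEXICON = {"beautiful": "hermosa", "house": "casa"}

def transfer_np(np):
    """Transfer an English noun phrase, given as [(word, tag), ...], into Spanish.

    Lexical transfer replaces each word; structural transfer reorders ADJ N as N ADJ.
    Gender and number agreement are NOT handled.
    """
    words = [(LEXICON.get(word, word), tag) for word, tag in np]
    result, i = [], 0
    while i < len(words):
        if i + 1 < len(words) and words[i][1] == "ADJ" and words[i + 1][1] == "N":
            result += [words[i + 1], words[i]]  # English ADJ N -> Spanish N ADJ
            i += 2
        else:
            result.append(words[i])
            i += 1
    return " ".join(word for word, _ in result)

print(transfer_np([("beautiful", "ADJ"), ("house", "N")]))  # casa hermosa
```

Words missing from the dictionary pass through untranslated. A real system would instead report them for the translator to resolve.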
[Figure: transfer modules for each language pair and direction: Polish – Hungarian, Polish – Romanian, Hungarian – Polish, Hungarian – Romanian, Romanian – Polish, Romanian – Hungarian]
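Each language pair listed above needs its own transfer module in each direction. This gives a quick worked count. Assume one module per ordered pair, as the six Polish, Hungarian and Romanian pairs suggest. A transfer system for n languages then needs n(n − 1) modules. For comparison, assume an interlingua-based design with one analysis and one generation module per language, which gives 2n. The comparison is an illustrative assumption, not a figure from the sources cited here.

```latex
\begin{aligned}
M_{\text{transfer}}(n) &= n(n-1), & M_{\text{interlingua}}(n) &= 2n \\
M_{\text{transfer}}(3) &= 3 \times 2 = 6, & M_{\text{interlingua}}(3) &= 2 \times 3 = 6 \\
M_{\text{transfer}}(9) &= 9 \times 8 = 72, & M_{\text{interlingua}}(9) &= 2 \times 9 = 18
\end{aligned}
```

With three languages the two designs need the same number of modules. The transfer count grows quadratically, however, which matters for systems covering many languages.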
Corpus-based approaches
The early 1990s saw corpus-based approaches gaining popularity in
machine translation research. Statistical- and example-based approaches
are two different methods that make use of linguistic information in a
corpus to create new translations. All corpus-based machine translation
systems use a set of so-called ‘reference translations’ containing source-language
texts and their translations. Source- and target-language texts
are aligned and the equivalent translation is extracted using a specific
statistical method or by matching a number of examples extracted from
the corpus (Carl 2000: 997).
This approach is not new to machine translation researchers. In the
early 1960s, experiments were carried out at IBM to investigate statistical
methods, but on the whole these were not successful. Another attempt
was made later with stochastic techniques based on Bayes’ theorem
(Tomás and Casacuberta 2001), which revived the use of statistical
methods in machine translation research. The example-based
approach was first proposed by Nagao in 1984, but it was not until the
late 1980s that researchers began to employ this method (Trujillo 1999:
204). Corpus-based approaches provide an alternative to the intractable
complexity of rule-based approaches at the analysis and generation
stages (Hutchins 1994).
These two approaches (statistical-based and example-based) were
[Figure: statistical machine translation: SL and TL words from an aligned bilingual corpus are scored with Bayes’ theorem; the most probable translation is proposed, with other probable words as alternatives]

$$P(T \mid S) = \frac{P(T) \times P(S \mid T)}{P(S)}$$
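In Bayes’ theorem, P(T|S) = P(T) × P(S|T) / P(S). The denominator P(S) is the same for every candidate translation of a given source text, so a system can pick the best candidate by maximising P(T) × P(S|T) alone. The toy Python sketch below illustrates this idea. All probabilities and the French–English example are invented for illustration and do not come from a real corpus.

```python
# Hypothetical toy models (invented numbers, for illustration only).
LANGUAGE_MODEL = {  # P(T): how fluent each target-language candidate is
    "the house": 0.6,
    "house the": 0.1,
}
TRANSLATION_MODEL = {  # P(S | T): how well each candidate accounts for the source
    ("la maison", "the house"): 0.4,
    ("la maison", "house the"): 0.4,
}

def score(source, target):
    """Numerator of Bayes' theorem: P(T) * P(S | T)."""
    return LANGUAGE_MODEL.get(target, 0.0) * TRANSLATION_MODEL.get((source, target), 0.0)

def best_translation(source, candidates):
    """P(S) is constant for a fixed source, so argmax P(T|S) = argmax P(T) * P(S|T)."""
    return max(candidates, key=lambda t: score(source, t))

def posterior(source, candidates):
    """P(T | S) per candidate, approximating P(S) by summing scores over the candidates."""
    total = sum(score(source, t) for t in candidates)
    if total == 0:
        return {t: 0.0 for t in candidates}
    return {t: score(source, t) / total for t in candidates}

candidates = ["the house", "house the"]
print(best_translation("la maison", candidates))  # the house
```

The translation model gives both word orders the same score here, so the language model alone decides in favour of the fluent order. This shows the division of labour between the two models.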
aligned bilingual or parallel corpus while the language model calculates the
probabilities of word sequences from the target language. Only the most
probable translation is usually suggested as the equivalent. Other probable
words can also be tried repeatedly to seek better equivalents if necessary.
These n-gram-based models lack contextual information such as
information on the words surrounding the target words, part-of-speech,
syntactic constituents and semantics. A statistical-based approach also
separates the monolingual and bilingual information. The monolingual
information is located in the language model while the bilingual
information comes from the translation model (Trujillo 1999: 210–11).
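The language model can be illustrated with a bigram model. This is the simplest useful n-gram model, and it is estimated by counting pairs of adjacent words. Below is a minimal Python sketch. The two training sentences are adapted from the English examples in this chapter purely for illustration. No smoothing is applied, so any word pair not seen in training receives probability zero.

```python
from collections import Counter

def train_bigram(sentences):
    """Return P(w2 | w1) = count(w1 w2) / count(w1), estimated from the sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens[:-1])  # every token that can start a bigram
        bigrams.update(zip(tokens[:-1], tokens[1:]))

    def prob(w1, w2):
        return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

    return prob

def sentence_probability(prob, sentence):
    """Multiply the bigram probabilities across the sentence, including its boundaries."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    p = 1.0
    for w1, w2 in zip(tokens[:-1], tokens[1:]):
        p *= prob(w1, w2)
    return p

corpus = ["The cooks are all males", "The villagers were all males"]
prob = train_bigram(corpus)
print(sentence_probability(prob, "The cooks are all males"))  # 0.5
print(sentence_probability(prob, "All males the cooks"))      # 0.0: unseen word pairs
```

The model sees only one word of context at a time. This is the limitation described above: it knows nothing about syntactic constituents or meaning.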
The probability calculations used to evaluate candidate target-language
texts are vital to this type of approach. The goal is to generate a list of
possible translation equivalents for a new source segment and rank them
by probability. In other words, the task of a statistical machine translation
system is to choose, for a new source-language segment, the target-language
segment that is most probable given that source segment.
This approach is, however, not without problems. If the bilingual
corpus is too small, the system may not be effective in generating
good translations. The Candide machine translation system, for instance,
has so far worked well in an experimental environment, but it is uncertain
whether it would perform as well in a commercial one, so the move from
an experimental to a commercial environment needs careful consideration.
Since 1994, attempts have also been made to include knowledge derived
from linguistics in the Candide machine translation system. This has been
shown to produce more successful results than the simple use of statistical
methods (Bel et al. 2002).
[Figure: aligned bilingual corpus and TL language model]
Example ES3.1: The lady in the farmers’ market is my cousin.
MS3.1: Wanita di pasar tani itu ialah sepupu saya.
Example ES3.2: She sells flowers every day.
MS3.2: Dia menjual bunga
Example ES3.4: Tigers born in the zoo for the past 10 years were all males.
MS3.4: Harimau yang lahir sejak 10 tahun yang lalu semuanya jantan.
Example ES3.5: The villagers who attended the meeting were all males.
MS3.5: Orang kampung yang menghadiri mesyuarat itu semuanya lelaki.
New ES3.6: Sarah’s puppy is a male.
MS3.6: Anak anjing Sarah jantan.
New ES3.7: The cooks are all males.
MS3.7: Tukang masak semuanya lelaki.
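Examples ES3.4 to ES3.7 show how an example-based system might choose between ‘jantan’ and ‘lelaki’. The system finds the stored example whose subject is most similar to the subject of the new sentence and reuses that example’s word for ‘male’. The toy Python sketch below reduces matching to the subject noun alone. The mini-thesaurus of semantic classes and the similarity scores are assumptions. Thesaurus-based similarity is one common choice in example-based systems, but it is not necessarily the method used by any system described here.

```python
# Hypothetical mini-thesaurus: semantic class of each subject noun.
THESAURUS = {
    "tiger": "animal", "puppy": "animal",
    "villager": "human", "cook": "human", "lady": "human",
}

# Stored examples: (subject noun, Malay word used for 'male'), from ES3.4/MS3.4 and ES3.5/MS3.5.
EXAMPLES = [
    ("tiger", "jantan"),
    ("villager", "lelaki"),
]

def similarity(noun_a, noun_b):
    """1.0 for the same noun, 0.5 for the same semantic class, otherwise 0.0."""
    if noun_a == noun_b:
        return 1.0
    class_a = THESAURUS.get(noun_a)
    if class_a is not None and class_a == THESAURUS.get(noun_b):
        return 0.5
    return 0.0

def translate_male(subject_noun):
    """Reuse the word for 'male' from the most similar stored example, if any."""
    best_noun, best_word = max(EXAMPLES, key=lambda ex: similarity(subject_noun, ex[0]))
    if similarity(subject_noun, best_noun) == 0.0:
        return None  # no usable example: the translator must decide
    return best_word

print(translate_male("puppy"))  # jantan (as in MS3.6)
print(translate_male("cook"))   # lelaki (as in MS3.7)
```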
(see also examples in Turcato and Popowich 2003). This can be used to
determine which target-language alternative is most suitable for the
translation of a noun such as ‘male’, which is potentially polysemous
when translated into a language such as Malay that uses different
words to indicate gender for humans and for animals. Unlike in
English, the specific terms to indicate a male animal (‘jantan’) and a male
English text: The famous skeleton from Indonesia nicknamed the ‘Hobbit’ does
not belong to a modern human pigmy with a brain disease. The study of the brain
supports the idea that it might be a new kind of dwarf, which is one of the human
species.
General 18 31 7 12 68
Home 8 36 9 16 69
Conclusion
Suggested reading
Arnold, D., L. Balkan, R. Humphreys, S. Meijer and L. Sadler (1994) Machine
Workbenches
of terms from specific subject fields, to which new terms can be added.
Clients are also known to supply their translators with terms often referred
to as ‘legacy data’, although the data may be presented in different
applications including word-processing software, spreadsheets or other
databases and structured in different ways.
Generally, a database of terms is known as a ‘termbase’; the tool which
Characteristics
A translation memory system has no linguistic component, and two
different approaches are employed to extract translation segments from
the previously stored texts. These are known as perfect matching and fuzzy
matching. Other characteristics such as filter, segmentation and alignment
will also be discussed.
Table 4.3 Higher and lower threshold percentages for fuzzy matching
A lower threshold means that stored segments that are less similar to the
new source-language segment are still offered as matches (see ES4.9 and
ES4.10), leaving more work for the translator. In some cases, more time
is needed to edit fuzzy matches than to translate the segments from scratch.
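The way a threshold filters fuzzy matches can be sketched in a few lines of Python. The similarity measure here is the ratio from the standard-library difflib module, chosen only for convenience. Commercial translation memory systems use their own, often proprietary, scoring, so the percentages will not correspond to any particular product. The stored pair is ES4.11/PS4.11 from Table 4.4.

```python
from difflib import SequenceMatcher

# Translation memory: (source segment, target segment) pairs; ES4.11/PS4.11 from Table 4.4.
MEMORY = [
    ("The big wave has damaged his bow and stern.", "A onda grande danificou a proa e a popa."),
]

def fuzzy_matches(new_segment, memory, threshold=0.75):
    """Return (score, source, target) for stored segments scoring at or above threshold, best first."""
    results = []
    for source, target in memory:
        score = SequenceMatcher(None, new_segment.lower(), source.lower()).ratio()
        if score >= threshold:
            results.append((round(score, 2), source, target))
    return sorted(results, reverse=True)

new = "The big wave has damaged the bow and stern."
print(fuzzy_matches(new, MEMORY, threshold=0.8))  # one fuzzy match for the translator to edit
print(fuzzy_matches(new, MEMORY, threshold=1.0))  # []: only perfect matches pass
```

Raising the threshold to 1.0 turns fuzzy matching into perfect matching. Lowering it returns more matches, but each one is less similar to the new segment and needs more editing.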
Segments that mean the same thing but differ in format such as dates
(30 October 1961/October 30, 1961/1961, October 30), measurements
(kg/kilogram), time (4.00pm/1600) and spellings (color/colour) all fall
in the fuzzy-match category although they are differently categorized
by Austermühl (2001) and Bowker (2002). Some systems also allow for
the automatic processing of such changes. Examples of English–German
fuzzy matches can be found in Esselink (2000) and Austermühl (2001), and
English–French in Bowker (2002).
Polysemous words and homographs always need careful handling and
present a challenge for all machine translation systems. In a
computer-aided translation system, however, a translator can decide to
accept or reject a match, either perfect or fuzzy, when it is suggested by
the system (Bowker 2002: 97). Table 4.4 illustrates just such a case.
Although three suggestions for ‘bow’ (‘proa’, ‘arco’ and ‘laço’) are given
in the fuzzy matches, only ‘laço’, in ‘They tie the rope around the tree in
a bow’ (ES4.14), would be selected.
Most translation memory systems have the perfect matching feature.
However, a translation memory system that has the fuzzy matching
feature will enable a translator to optimize the use of previously
translated material by adjusting the threshold accordingly.
Old ES4.11: The big wave has damaged his bow and stern.
PS4.11: A onda grande danificou a proa e a popa.
Old ES4.12: My music teacher
PS4.12: O meu professor de
Filter. Some translation memory systems are equipped with filters for
the more common formats. A filter is a feature that converts a source-language
text from one format into another, giving a translator the
flexibility to work with texts of different formats (Esselink 2000: 362).
A translation-friendly format contains only written text without any
accompanying graphics. In order to obtain such a format, an import
filter would separate a text from its formatting code. For example, a
web document can be formatted with HTML code which is normally
hidden from the end-user when browsing the web (to view the code,
select the ‘Source’ option from the ‘View’ menu). The code marks the
beginnings and ends of paragraphs, headings, text formats such as bold
and italics, the position of graphics and links, so that the document
assumes a certain appearance on screen. HTML is one of a number of
so-called ‘markup languages’ to which we return later in this chapter.
The HTML code for a web page is shown in Figure 4.1.
If the translator works on the document in the HTML format, there is a
danger that the code might accidentally get removed or translated as part
of the text, giving an incorrect translation. Furthermore, the translation
might not then allow conversion back to a web page owing to the
missing code. Therefore, when a web page requires translation, to make
the translation task easier, the page is usually stripped of the HTML code,
leaving only the text without any graphics or formatting information, as
shown in Figure 4.2.
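An import filter of this kind can be approximated with the html.parser module in Python’s standard library. The sketch below keeps only the visible text, drops comments such as those shown in Figure 4.1, and skips script and style content. It is a simplification. A real filter would also record where each piece of text came from, so that the translation can be merged back into the HTML code. It would also keep inline formatting such as bold inside a single segment rather than splitting the sentence around it.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Minimal import filter: collects translatable text and ignores markup and comments."""

    SKIP = {"script", "style"}  # elements whose content is code, not text

    def __init__(self):
        super().__init__()  # convert_charrefs=True: '&amp;' arrives as '&'
        self.segments = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.segments.append(text)

def extract_text(html):
    """Return the text segments of an HTML document, without tags or comments."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.segments

page = ('<div id="CreatorContent"><!-- Content Section begins here -->'
        '<h1>Title</h1><p>Some <b>bold</b> text.</p><script>var x = 1;</script></div>')
print(extract_text(page))  # ['Title', 'Some', 'bold', 'text.']
```

Comments are discarded because HTMLParser passes them to handle_comment, which does nothing by default. The code can therefore neither be deleted by accident nor translated as text.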
<!-- Side bar structure ends here --><!-- End of Math Side Bar -->
<!-- Allow menubars in non-printable versions -->
<!--Secondary Menubar handling--><!-- End of Secondary Menu Bar Handling -->
<div id = "CreatorContent"><!-- Content Section begins here --
texts to the memory, either by creating a new one, for example for a
new subject field or new client, or by adding to an existing one. Translation
units are usually numbered or tagged as shown in Table 4.6 (see also
Table 4.5). The collection of translation units is stored, in no particular
order, in the database for future translations. Most commercial alignment
tools allow alignment at the sentence level. In recent years, however,
researchers have also focused on alignment methods for
translation memory systems below the sentence level (see Piperidis,
Papageorgiou and Boutsis 2000).
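Sentence-level alignment is often based on length, since a sentence and its translation tend to be of similar length. The Python sketch below is a much-simplified version of this idea. It follows the spirit of Gale and Church’s length-based method but does not use their probabilistic cost. Dynamic programming pairs sentences one-to-one, one-to-two or two-to-one. The cost penalises length mismatches and adds a small extra penalty for anything other than one-to-one. The cost function and the 0.5 penalty are arbitrary assumptions. The output is a list of numbered translation units.

```python
def align(src, tgt, penalty=0.5):
    """Length-based sentence alignment allowing 1-1, 1-2 and 2-1 pairings.

    Returns numbered translation units: [(1, source, target), ...].
    """
    INF = float("inf")
    n, m = len(src), len(tgt)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            for di, dj in ((1, 1), (1, 2), (2, 1)):
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                ls = sum(len(s) for s in src[i:ni])
                lt = sum(len(t) for t in tgt[j:nj])
                # Relative length difference, plus a penalty for non-1-1 pairings.
                c = cost[i][j] + abs(ls - lt) / max(ls, lt, 1)
                if (di, dj) != (1, 1):
                    c += penalty
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    if cost[n][m] == INF:
        raise ValueError("no alignment possible with 1-1, 1-2 and 2-1 pairings")
    units, i, j = [], n, m
    while (i, j) != (0, 0):  # follow back-pointers from the end
        pi, pj = back[i][j]
        units.append((" ".join(src[pi:i]), " ".join(tgt[pj:j])))
        i, j = pi, pj
    units.reverse()
    return [(k, s, t) for k, (s, t) in enumerate(units, start=1)]
```

Real aligners also allow sentences with no counterpart (one-to-zero and zero-to-one pairings) and use better statistics. They still typically rely on a human to check the result before it is stored in the memory.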
Reviews of specific translation memory systems can be found in
Esselink (1998), Benis (2003), Környei (2000), Austermühl (2001),
Gerasimov (2002) and Wassmer (2004). Helpful sources for the latest
information on translation tools and resources can be found on the web
pages of the Translation Journal (see http://accurapid.com/journal/), and
Multilingual Computing, Inc. (see http://www.multilingual.com/).
Translation in Malaysia has never been an important part of the planning of the
modern Malay language. The Terminology Committee set up to deal with the
borrowing of foreign words into the Malay language only focused on scientific and
technological terms. However, one persistent problem since 1973 has been the
translation of English affixes into Malay. Until today, Malaysian translators are facing
problems translating English affixes.
The remaining English segments which were not found in the database
have to be translated manually by the translator. This is shown in Figure 4.3
as working file A. At this point if a search for terms is required, the termi-
nology database can be accessed. The English–French translation units
which have just been translated are then stored in the database to generate
a second pre-translation (see Figure 4.3). This translation, the first draft of
the target-language text in French, then requires revision by the translator.
This is indicated in Figure 4.3 as working file B. At this point, the termi-
nology database can be re-accessed if needed. After the completion of the
translation task by the translator in working file B, a target-language text
in French is produced, which may undergo further revision by the trans-
lator to produce a polished translation (see Puntikov 1999; Zerfass 2002).
The principal workflow seen in Figure 4.3 is reflected in almost all
translation memory systems, but strategies can follow two models: database
and reference (Zerfass 2002). The database model, shown in Figure 4.6, has a
component that stores all previously translated material in one database.
The segments are context-independent, which allows matching to occur in
different translation contexts. Segments from a new source-language
text are compared to segments in the database, and translations are offered
to the translator if identical and/or similar segments are found. Once the
translation is completed, a new target-language text is produced and the
new or revised segments are added to the database.
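The database model and the pre-translation step described above can be sketched as a single store of context-independent segment pairs. In this Python sketch, perfect matches are filled in automatically, and the remaining segments are returned for the translator, as in working file A. Newly translated units are then stored for the next pass. The class and method names and the French example translations are hypothetical.

```python
class TranslationMemory:
    """Database model: one store of context-independent (source, target) segment pairs."""

    def __init__(self):
        self.units = {}  # source segment -> target segment

    def add(self, source, target):
        """Store a new or revised translation unit."""
        self.units[source] = target

    def pretranslate(self, segments):
        """Fill in perfect matches; return the draft and the indices left for the translator."""
        draft, todo = [], []
        for index, segment in enumerate(segments):
            match = self.units.get(segment)
            draft.append(match)
            if match is None:
                todo.append(index)
        return draft, todo

tm = TranslationMemory()
tm.add("Turn on the faucet.", "Ouvrez le robinet.")
segments = ["Turn on the faucet.", "Pull the connector."]
draft, todo = tm.pretranslate(segments)
print(draft, todo)  # ['Ouvrez le robinet.', None] [1]

# The translator translates segment 1 manually; the new unit is stored for next time.
tm.add("Pull the connector.", "Tirez le connecteur.")
print(tm.pretranslate(segments)[1])  # []
```

Because segments are stored without context, the same stored unit is offered wherever the segment recurs. This is the strength of the database model, and also a risk when a segment needs a different translation in a different context.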
In the reference model, the translation database shown in Figure 4.7 is
empty until relevant source and target-language texts are loaded into it in
Parallel corpora
A corpus in the present context is a collection of written texts in a
machine-readable format. In Translation Studies and linguistics, two
terms are used to refer to corpora which consist of original texts and their
translations: ‘parallel corpus’ and ‘translation corpus’. In the field of
computational linguistics the term used is ‘parallel texts’ (see Véronis
2000). Other design possibilities include corpora which consist of texts
in two or more languages and are selected according to similar pre-
determined design criteria, for example size, domain, genre and topic.
This type of corpus has been called a ‘multilingual corpus’ or a ‘comparable
corpus’ in Translation Studies. Such corpora cannot be aligned, as there
is no source text–target text relationship, but they are rich in useful
information for translators (Bowker 2002: 46). The final type is the
‘comparable corpus’, which consists of texts in one language and allows
a comparison between original texts and translations into that language.
Comparable corpora are useful for
researching possible differences between original texts and translated
[Figure: Text A and Text A′ undergo pre-processing (segmentation, lexical analysis) and alignment, producing an aligned parallel corpus]
Lima NUM ekor NUC gajah NNN liar ADJ telah AUX memusnahkan VVV dua puluh
NUM ekar NNN ladang NNN getah NNN di PRE Johor NPR. Pengurus NNN ladang
NNN itu DEM menghubungi VVV Jabatan Haiwan dan Hidupan Liar NPR untuk
PRE menangkap VVV gajah NNN liar NNN ini DET.
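Aligning a parallel corpus means pairing each source-text sentence with its translation. The text does not specify an alignment algorithm, so the following is only a simplified sketch in the spirit of the well-known length-based approach of Gale and Church, which relies on the tendency of longer sentences to have longer translations. The bead types, their penalties and the cost function are illustrative assumptions, and the example sentences are hypothetical.

```python
import math
from typing import List, Tuple

# Alignment "beads": (source sentences, target sentences) -> illustrative penalty.
# These values are assumptions chosen for the sketch, not taken from a published model.
BEADS = {(1, 1): 0.0, (2, 1): 2.0, (1, 2): 2.0, (1, 0): 4.0, (0, 1): 4.0}


def length_cost(src_len: int, tgt_len: int) -> float:
    """Crude length-mismatch score standing in for Gale and Church's probabilistic model."""
    mean = (src_len + tgt_len) / 2
    return abs(src_len - tgt_len) / math.sqrt(mean) if mean else 0.0


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[str], List[str]]]:
    """Align two sentence lists by dynamic programming over character lengths."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                if i + di > n or j + dj > m:
                    continue
                s_len = sum(len(s) for s in src[i:i + di])
                t_len = sum(len(t) for t in tgt[j:j + dj])
                c = cost[i][j] + penalty + length_cost(s_len, t_len)
                if c < cost[i + di][j + dj]:
                    cost[i + di][j + dj] = c
                    back[i + di][j + dj] = (di, dj)
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:  # walk back from the end to recover the chosen beads
        di, dj = back[i][j]
        pairs.append((src[i - di:i], tgt[j - dj:j]))
        i, j = i - di, j - dj
    return pairs[::-1]


src = ["A short one.", "Another short one."]
tgt = ["A short one. Another short one."]
for s, t in align(src, tgt):
    print(s, "<->", t)
```

Here the two short source sentences are paired with the single longer target sentence (a 2:1 bead), because their combined length matches it far better than any 1:1 pairing.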
Concordancers
A concordancer is an electronic tool which has been used in language
learning, literary analysis, corpus linguistics, terminography and
lexicography. It allows the user to select a particular word or phrase
and displays the uses of that word or phrase in the selected corpus in
order to show where and how often it occurs, and in what linguistic
contexts it appears. The output is called a concordance. The concorded
word is shown in the centre of each line displayed in the concordance,
so that the user can quickly scan the results. The example in Figure 4.11
shows two words appearing on the left and right of the concorded
word ‘round’. Concordances for other words can be obtained
from the Collins WordbanksOnline English website (see
http://www.collins.co.uk/Corpus/CorpusSearch.aspx), which has a corpus of
56 million words of contemporary written and spoken English.
Concordancers usually allow the user to define the number of words
which they want to appear to the left and to the right of the concorded
word or phrase and to sort the results in various ways, for example
according to frequency or alphabetically according to the word immedi-
ately to the left or to the right of the concorded word. The tool has been
applied to areas of study such as translation, language engineering and
natural-language software development (see Wu et al. 2003). Concordances
were originally compiled by hand to show the use of all the words in the
Bible (see Tribble and Jones 1997).
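A concordancer of the kind described above can be sketched in a few lines. This keyword-in-context (KWIC) sketch lets the user set how many words appear on each side of the concorded word (`span`) and sort the lines alphabetically by the word immediately to the left or right; frequency-based sorting is not shown. The regular-expression tokenizer, the function name and the example sentence are assumptions for illustration.

```python
import re
from typing import List


def concordance(text: str, keyword: str, span: int = 2, sort_by: str = "none") -> List[str]:
    """Keyword-in-context lines showing `span` words either side of each hit."""
    words = re.findall(r"\w+(?:'\w+)?", text)  # naive tokenizer: drops punctuation
    hits = [i for i, w in enumerate(words) if w.lower() == keyword.lower()]
    rows = [(words[max(0, i - span):i], words[i], words[i + 1:i + 1 + span]) for i in hits]
    if sort_by == "left":     # alphabetically by the word immediately to the left
        rows.sort(key=lambda r: r[0][-1].lower() if r[0] else "")
    elif sort_by == "right":  # alphabetically by the word immediately to the right
        rows.sort(key=lambda r: r[2][0].lower() if r[2] else "")
    width = max((len(" ".join(left)) for left, _, _ in rows), default=0)
    # Right-align the left context so every keyword lines up in a central column.
    return [f"{' '.join(left):>{width}}  {kw}  {' '.join(right)}".rstrip()
            for left, kw, right in rows]


text = "The earth goes round the sun. We sat round a table and passed the cup round."
for line in concordance(text, "round"):
    print(line)
```

Right-aligning the left context is what places the concorded word in the centre of each line, so that the user can scan its neighbours quickly.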
While concordancers are, strictly speaking, used to produce concordances,
such tools often have other functions, typically including the
production of indexes (referenced lists of words from the selected
corpus showing where they occur and their frequency distributions)
and wordlists, which are like indexes without any indication of text
Localization tools
Figure 4.12 shows an example of how different types of tool fit into
the workflow of a localization process, divided here into two parts. The
first part involves the planning and management aspects of the process
while the second involves the translating aspects. The management and
translation tools are displayed individually to show their use. In order
to manage several languages at any one given time for the same source-
[Figure 4.12: operational/management tools (document/graphic design, testing/validation) and translation tools (translation, terminology management, translation memory, alignment, localization)]
related to the subject field of the material may also need to be collected;
translators may need training if they are unfamiliar with
the subject or with hardware or software applications. To prepare it for
the translation proper, the source-language material undergoes a
process called ‘localization-enablement’ or ‘internationalization’. This
process entails, for instance, stripping all graphics from the text which
with speech dubbed into Chinese. The Xbox documentation for the
Singapore market has been translated into the Simplified Chinese script,
while the Traditional Chinese script has been used for the Taiwan market.
The second issue is cultural. Icons, product names, colours, speech
and possibly sound effects have to be adapted to suit local users. At
times product names have to be changed for cultural reasons. For
The increased demand for technical translation has been the catalyst for
the widespread use of translation tools. Unlike machine translation
systems, computer-aided translation tools are language-independent,
allowing professional translators to use them regardless of the languages
they work with. Table 4.8 is a classification of tools currently available
on the market.
The usefulness of these tools varies greatly depending on the
translator’s needs. For example, if localization is only a small part of a
translator’s work, a translation memory system may be sufficient. On
the other hand, if localization is the only translation work a translator
does, a specialized localization tool would be needed in addition to
Electronic dictionary 1 15 55 36 57
Translation memory 39 39
Alignment 1 1
Localization 39 39
Terminology management system 22 22
Total 1 16 55 37 100 159
In this chapter we have seen that translation and terminology data can
be stored using a large number of tools in a variety of formats using
different operating systems. A similar situation is also found in the
localization industry where numerous types of tools are used. In order
for the translator to have access to translation and terminological resources
that are stored in different translation, terminology and localization
tools, several attempts have been made to support the transfer of data
between these tools – in other words, to introduce interoperability.
These attempts have led to the creation of a number of standards.
A standard is a universal format that has been agreed and approved
by either an international standards organization such as ISO or the
relevant industry such as the localization industry. In the case of data
exchange, the aim of a standard is to facilitate exchange using a common
markup language to structure the data in each document using a set of
agreed tags as annotations (see, for instance, the HTML code in Figure 4.1
above). XML (eXtensible Markup Language) is one such
standard – developed by the World Wide Web Consortium (W3C) – which describes
the structure of different types of electronic documents and hence facilitates
the sharing of data between different software applications in a
consistent way. Unlike HTML, it does not have a fixed set of elements.
Essentially, XML defines ‘what the information is’, while HTML defines
‘what the information looks like’. XML is a simplified subset of
SGML (Standard Generalized Markup Language), an international
standard for describing the structure of electronic documents.
While XML is one standard in widespread use, there are different
standards or sets of standards for data interchange which are of
particular interest to any professional translator: the standard for the
interchange of translation memory data, the standards for the interchange
of lexical and terminological data, and the standard for the interchange
of localization information. These are, in turn, Translation Memory
eXchange (TMX), TermBase eXchange (TBX), and XML Localisation
Interchange File Format (XLIFF), and are all described below.
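To make the idea of a TMX file more concrete, the following sketch writes segment pairs as a minimal TMX-style document with Python's standard xml.etree.ElementTree module. It assumes the TMX 1.4 element names tmx, header, body, tu, tuv and seg and the header attributes listed in the code. The header values, the function name build_tmx and the English–Malay pair are hypothetical, and inline formatting codes such as the bpt and ept elements in the fragment below are not handled.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# ElementTree writes this namespaced key out as the reserved xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs: List[Tuple[str, str]], srclang: str, tgtlang: str) -> str:
    """Serialise (source, target) segment pairs as a minimal TMX 1.4-style document."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "tmx-sketch",   # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",                # original TM format: placeholder value
        "adminlang": "en",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for n, (source, target) in enumerate(pairs, start=1):
        tu = ET.SubElement(body, "tu", tuid=f"{n:04d}")  # translation unit: 0001, 0002, ...
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})  # one variant per language
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


print(build_tmx([("The fat cat.", "Kucing gemuk itu.")], "en", "ms"))
```

Any tool that reads this structure can recover the aligned pair without knowing which system created it, which is the point of an interchange standard.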
<body>
<tu tuid="0001">
<tuv xml:lang="en-BR">
<seg>The <bpt i="1" x="1">{\b </bpt>fat<ept
Termbase exchange
Each day new terms are created around the world in various languages
to provide inventions, discoveries and new conceptualizations with
linguistic labels. For many years the terminology community has been
developing term banks or termbases to store and manage these terms.
However, termbases are created in different formats, that is using
different subsets of possible information categories such as linguistic
data, examples, definitions, sources, administrative data, and so on, and
using different data structures, that is the information is differently
organized or distributed between the different fields in the database.
Nevertheless, they may contain similar although not necessarily iden-
tical information. When users access these different termbases, a
problem of incompatibility may arise. Thus, a standard is needed to
provide a format that can facilitate access to all termbases regardless of
how they are stored.
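The problem of incompatible termbases can be illustrated by mapping two differently structured records onto one shared format. In this sketch the two termbases, their field names and the English–French entries are hypothetical. The output element names (termEntry, descrip, langSet, tig, term) are modelled on TBX, but only a body is built and the full TBX wrapper and header are omitted, so the result should not be taken as a validated TBX file.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Two hypothetical termbases holding similar information in different structures.
termbase_a = [{"id": "A1", "domain": "civil engineering", "en": "pile", "fr": "pieu"}]
termbase_b = [{"ref": "B7", "subject": "civil engineering",
               "terms": {"en": "beam", "fr": "poutre"}}]


def from_a(rec: Dict) -> Dict:
    """Map termbase A's flat record onto a shared internal structure."""
    return {"id": rec["id"], "subjectField": rec["domain"],
            "terms": {lang: rec[lang] for lang in ("en", "fr")}}


def from_b(rec: Dict) -> Dict:
    """Map termbase B's nested record onto the same structure."""
    return {"id": rec["ref"], "subjectField": rec["subject"], "terms": dict(rec["terms"])}


def to_tbx_body(entries: List[Dict]) -> ET.Element:
    """Build a TBX-style <body>: one termEntry per concept, one langSet per language."""
    body = ET.Element("body")
    for e in entries:
        entry = ET.SubElement(body, "termEntry", id=e["id"])
        ET.SubElement(entry, "descrip", type="subjectField").text = e["subjectField"]
        for lang, term in e["terms"].items():
            lang_set = ET.SubElement(entry, "langSet", {XML_LANG: lang})
            tig = ET.SubElement(lang_set, "tig")
            ET.SubElement(tig, "term").text = term
    return body


entries = [from_a(r) for r in termbase_a] + [from_b(r) for r in termbase_b]
print(ET.tostring(to_tbx_body(entries), encoding="unicode"))
```

Once both sources are expressed in the same structure, a user can consult them together without knowing how each termbase stores its data internally.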
<body>
<termEntry id="BWH1">
<descrip type="subjectField">civil engineering</descrip>
<descrip type="definition">A wide layer of load-bearing material laid at the
bottom of a wall or column so as to distribute its pressure more widely over the
Localization exchange
Not infrequently, many technical and logistic challenges arise during
localization, and one such challenge relates to the problems of transferring
texts between different translation and localization tools. Texts are often
stored in different file formats, some of which are proprietary, such as
the formats of a company’s internal reports, while others, such as HTML,
are commonly shared. Files in these formats are not always easily
transferable from one tool to another. To overcome such challenges, a
standard called XLIFF (XML Localisation Interchange File Format) has been
developed by OASIS (Organization for the Advancement of Structured
Information Standards). OASIS is the largest independent, not-for-profit
consortium in the world that is dedicated to overseeing the standardiza-
tion of XML applications and web services. It has about 150 companies
and individuals in the localization industry as members including
Alchemy Software, Hewlett-Packard, IBM, Oracle, Microsoft, Novell, Sun
Microsystems and Tektonik.
[Figure: documents pass through conversion tools and filters into XLIFF, are processed with linguistic tools (translation, localization), and are converted back into documents]
Using filters, the text and layout in documents in HTML format are
separated. The section with translatable text is converted into an XLIFF
format while the non-translatable data (the layout) is kept in a separate
file. An XLIFF file consists of one or more file elements and each of
these has a head and a body section similar to TMX and TBX standards.
A head section contains information about the text such as project
number, contact information and so on, as shown in Figure 4.20.
A body section contains the main elements in which localizable or
translatable texts are kept. Each such element is called a ‘translation
unit’, or <trans-unit> element, in the XLIFF file. It has an identifier,
the ‘id’ attribute, which maps the segment to its location in the
source-language text. Figure 4.21 illustrates this. The trans-unit element has a source
<head>
<project-title>
<body>
<trans-unit id="n3">
<source>This is a good exercise.</source>
<target xml:lang="ms">Translation of “Ini satu latihan jasmani yang
bagus.”</target>
</trans-unit>
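The filter-based workflow described above, in which translatable text is separated from layout and later reinserted, can be sketched as a round trip. This is a simplified illustration under several assumptions: tags are found with a regular expression rather than a real HTML parser, each run of text between tags becomes one trans-unit, the skeleton marks each unit's position with a hypothetical {{id}} placeholder, and the head section is omitted. The element and attribute names follow XLIFF 1.2 (xliff, file, body, trans-unit, source, target); the file name page.html and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Tuple

TAG = re.compile(r"(<[^>]*>)")  # crude tag matcher: enough for a sketch, not a real HTML parser


def extract(html: str, src_lang: str, tgt_lang: str) -> Tuple[str, ET.Element]:
    """Split HTML into a skeleton (layout) and an XLIFF-style tree of translatable text."""
    xliff = ET.Element("xliff", version="1.2")
    file_el = ET.SubElement(xliff, "file", {
        "original": "page.html",  # hypothetical file name
        "datatype": "html",
        "source-language": src_lang,
        "target-language": tgt_lang,
    })
    body = ET.SubElement(file_el, "body")
    skeleton = []
    for i, part in enumerate(TAG.split(html)):
        if i % 2 == 1 or not part.strip():
            skeleton.append(part)  # markup and whitespace stay in the skeleton
            continue
        uid = f"n{len(body) + 1}"
        unit = ET.SubElement(body, "trans-unit", id=uid)
        ET.SubElement(unit, "source").text = part.strip()
        lead = part[:len(part) - len(part.lstrip())]
        trail = part[len(part.rstrip()):]
        skeleton.append(lead + "{{" + uid + "}}" + trail)  # placeholder marks the text's position
    return "".join(skeleton), xliff


def merge(skeleton: str, xliff: ET.Element) -> str:
    """Put target text (or the source text, if untranslated) back into the skeleton."""
    for unit in xliff.iter("trans-unit"):
        target = unit.find("target")
        text = target.text if target is not None and target.text else unit.find("source").text
        skeleton = skeleton.replace("{{" + unit.get("id") + "}}", text)
    return skeleton


html = "<html><body><h1>Welcome</h1>\n<p>This is a good exercise.</p></body></html>"
skeleton, xliff = extract(html, "en", "ms")
print(skeleton)
print(ET.tostring(xliff, encoding="unicode"))
```

Only the XLIFF tree needs to pass through translation tools; the skeleton keeps the layout, so merging the translated units back into it restores the original file format.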
the changes were, who did them, which tools were used to make them,
and so on (OASIS 2003: 13). Once the translation work has been
completed, the XLIFF file is converted back into the original file
format (HTML) and the non-translatable portion (layout) of the
source-language text is reincorporated. An XLIFF file contains only one
source language and one target language. It is different from the TMX
Conclusion
Suggested reading
Austermühl, F. (2001) Electronic Tools for Translators. Manchester: St Jerome
Publishing.
Bowker, L. (2002) Computer-aided Translation Technology: A Practical Introduction.
Ottawa: University of Ottawa Press.
Wright, S.E. and G. Budin (eds) (1997) Handbook of Terminology Management. Vol. 1:
Basic Aspects of Terminology Management. Amsterdam: John Benjamins.
——(2001) Handbook of Terminology Management. Vol. 2: Application-oriented
Terminology Management. Amsterdam: John Benjamins.
Esselink, B. (2000) A Practical Guide to Localization. Amsterdam/Philadelphia:
John Benjamins.
Véronis, J. (ed.) (2000) Parallel Text Processing: Alignment and Use of Translation
Corpora. Dordrecht: Kluwer Academic.
There are other ways in which a translator can find out about evaluations
of translation memory tools. One is by joining professional translators’
Stakeholders
Researchers
There has always been a close relationship between researchers and
system developers; the former build prototype systems based on certain
approaches or models (see Chapter 3). A prototype is an experimental
design of a partial or a whole system that is used for testing purposes
before a complete system is built. Experiments based on a variety of
criteria are used to investigate the performance of prototypes at various
stages of development. Usually evaluations performed by researchers
are reported in the form of papers written in academic or research
journals, in books and/or presentations at conferences and workshops.
Amongst researchers, evaluations of this kind serve as a platform for
testing, benchmarking and discussion.
Developers
A developer (an individual or organization) normally decides on which
prototype is to be turned into a complete system. The quality of the
system built must comply with the relevant ISO (International Organi-
zation for Standardization) software standards, which are described
later in this chapter. When a prototype is selected, a detailed study
of its capabilities and limitations is carried out to assess its economic
viability. Using a range of evaluation methods, the developer also
compares the performance of the prototype with that of systems built by
competitors.
Research sponsors
The funds to build prototypes and complete systems are provided by
research sponsors. Usually research organizations, government agencies
and large corporations are the main sponsors. One issue that concerns
a research sponsor is deciding which research projects or prototype
machine translation systems to fund. Ongoing progress
reports play an important role in helping to show that the hypotheses
End-users
End-users are made up of several groups, the major ones consisting of
translators and translation managers. The evaluation criteria that interest
these groups include the ‘hows’ and the ‘whats’ (Trujillo 1999: 254).
The ‘how’ questions include:
Evaluation methods
There is no doubt that evaluation is the driving force behind the devel-
opment of natural-language processing technology (see Hovy, King and
Popescu-Belis 2002b). Previous work shows that evaluations have been
carried out for a variety of reasons by different groups of people and
evaluations from scratch. Over the years, a great deal of literature has
focused on the purposes, criteria and measurements in machine transla-
tion evaluation. Each time an evaluation was required, the evaluation
literature had to be extensively searched to find suitable evaluation criteria,
measurement and methods. Hence the non-existence of a standard
evaluation method available for use and/or adaptation to all evaluators
during the evaluation and their solutions. At the stage where the
proposal for a standard evaluation methodology and framework was
being elaborated, the end-users in the working environment were invited
to become involved by providing feedback on the proposal. The EWG
worked on the principle that when end-users feel that they are part of
the making of a standard, it is much easier for them to accept it. For the
validation phase, the proposal was subjected to testing in a controlled
actual working environment. The results, in turn, were used to improve
the proposal for the benefit of both end-users and developers. In the
maintenance phase, the refinement of the proposal was carried out by
the EWG with input from end-users and developers. Although not
‘final’, the resulting proposal is at least a ‘consolidated’ one among all
parties involved and will be valid for a period of time until further
modifications or additions are needed. The proposal will continue to be
maintained even after it has been turned into a fully-fledged standard.
Details of the EWG’s activities were published in what is known as the
dissemination phase. Collaboration with related projects run by other
bodies was also encouraged (Calzolari et al. 2003: 10–1).
on which system to buy. The answer to the second question may be for
a researcher to evaluate a component in a system or for an end-user to
evaluate the performance of a system (Hovy, King and Popescu-Belis
2002b). FEMTI is made up of two sets of criteria, evaluation requirements
and system characteristics, which are related to these two questions.
The first set defines the intended context of use (types of task, user and
1 Evaluation requirements
2 System characteristics
Conclusion
For some time efforts have been made to create a framework for the
evaluation of natural-language processing tools like machine transla-
tion systems and computer-aided translation tools. The main difficulty
is that evaluation can be performed for a wide range of reasons by
Suggested reading
EAGLES (Expert Advisory Group on Language Engineering Standards) (1996)
Evaluation of Natural Language Processing Systems: Final Report. At
http://issco-www.unige.ch/projects/ewg96/index.html.
Lehrberger, J. and L. Bourbeau (1988) Machine Translation: Linguistic Characteristics
of MT Systems and General Methodology of Evaluation. Amsterdam/Philadelphia:
John Benjamins.
Sparck Jones, K. and J.R. Galliers (1996) Evaluating Natural Language Processing:
An Analysis and Review. Berlin: Springer-Verlag.
Van Slype, G. (1979) Critical Study of Methods for Evaluating the Quality of Machine
Translation: Final Report. At http://www.issco.unige.ch/projects/isle/van-slype.pdf.
Recall that attempts have been made to replace the older rule-based
systems with a new generation of corpus-based machine translation
systems, but that these have met with limited success. It is important
to note that corpus-based approaches should be viewed not as an
alternative to rule-based approaches but as a complement to them (see Chapter 4).
Although the advantage of corpus-based approaches is that they are
In the past few years, speech technology has attracted the attention of
natural-language processing researchers especially in Canada, Europe,
Japan and the USA. Their general aim is to provide a technology that is
not only able to convert speech into text, and text into speech, but
also speech into speech within the same language or between different
The translation industry, like any other, is not spared the effect of global
changes. The world is increasingly technologically driven, opening up
new possibilities, opportunities, needs and demands. Information is
becoming more flexible and fluid via the electronic medium. On the
web, a multilingual global community has access to information which
in turn requires translation. A multilingual environment on the web
promotes many things, from products and services to understanding of
and communication between different ethnic communities. As the speed
of telecommunication increases, fast translation is a service offered by
many translation companies and professional translators over the
web. Hence, the electronic nature of much translation work and of
communication between translators and their clients has also resulted
in information which was at one time preserved only on paper now
being stored digitally.
A multilingual marketplace such as the web caters for the language
needs of different groups of end-users. Demand for information access
and retrieval has made online machine translation systems almost
indispensable, allowing many end-users to obtain almost instantaneous
translations, although they are often of poor quality. For individual
users, free online machine translation services are a window onto
another cultural and linguistic world. For corporate end-users, gathering
<sentence>
<person href="http://quah.com/">I</person> have two <animal>cats</animal>.
</sentence>
The computer now knows that ‘I’ refers to a ‘person’ and ‘cats’ refers
to a type of ‘animal’. XML enables information in RDF (see below) to
be exchanged, for instance, between computers that use different
operating systems.
• Resource Description Framework, a framework for describing and
representing information on the web so that a computer can read
and understand it (Vertan 2004). It is used to describe features on a
web page, for example properties such as price and other information
such as its author. Information on the web written in RDF, which is
also known as an ‘RDF statement’, is annotated in ‘triples’, in the
structure of subject–predicate–object. The subject is the resource that
is being described, the predicate is the property of the thing that is
being described, and the object is the value of that property. This can be
written, for example, as: ‘The author of http://www.pkmstats.com/ is
Paul Marriott’ where the subject is ‘http://www.pkmstats.com/’, the
predicate is ‘the author’ and the object is ‘Paul Marriott’.
• Web Ontology Language (OWL), an extension of RDF, has a larger
vocabulary and stronger syntax than RDF. Since early 2004, OWL
has been a web standard recommended by the W3C (see
http://www.w3schools.com/rdf/rdf_owl.asp). According to Vertan (2004)
this annotation in the Semantic Web helps example-based machine
translation systems in three ways: creating additional sources of
sure that the individual doing the localization work is skilled and
trained.
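The XML annotation and the RDF triple described earlier in this section can both be made concrete with a short sketch. It parses the annotated sentence with Python's standard xml.etree.ElementTree module and stores the RDF statement as a plain (subject, predicate, object) tuple. This is not RDF syntax: a real application would use an RDF serialization and library, and would identify the predicate with a URI rather than the bare string 'author'. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# The annotated sentence from the text, written with straight quotes so it is well-formed XML.
SENTENCE = ('<sentence><person href="http://quah.com/">I</person> have two '
            '<animal>cats</animal>.</sentence>')


def labelled_words(xml_text: str) -> Dict[str, str]:
    """Map each marked-up word to the element that labels it, e.g. 'cats' -> 'animal'."""
    return {child.text: child.tag for child in ET.fromstring(xml_text)}


# The RDF statement from the text reduced to a plain (subject, predicate, object) tuple.
# Real RDF would identify the predicate with a URI; the bare string 'author' is a simplification.
TRIPLES: List[Tuple[str, str, str]] = [
    ("http://www.pkmstats.com/", "author", "Paul Marriott"),
]


def objects(subject: str, predicate: str, store: List[Tuple[str, str, str]]) -> List[str]:
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in store if s == subject and p == predicate]


print(labelled_words(SENTENCE))                                # {'I': 'person', 'cats': 'animal'}
print(objects("http://www.pkmstats.com/", "author", TRIPLES))  # ['Paul Marriott']
```

The element names supply the labels a program can act on, and the triple structure lets it answer a question such as "who is the author of this page?" without interpreting free text.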
The world we live in today is increasingly dependent on the web, in
particular through the ways in which people communicate, informa-
tion is stored and business is conducted. The high demand for rapid
translation and the localization of services and products constitute a
Conclusion
Suggested reading
Branchadell, A. and L.M. West (2004) Less Translated Languages. Amsterdam:
John Benjamins.
Carl, M. and A. Way (eds) (2003a) Recent Advances in Example-based Machine
Translation: Text, Speech and Language Technology. Vol. 21. Dordrecht: Kluwer
Academic.
Cronin, M. (2003) Translation and Globalization. London: Routledge.
O’Hagan, M. and D. Ashworth (2002) Translation-Mediated Communication in a
Digital World. Clevedon: Multilingual Matters.
Schäffner, C. (ed.) (2000) Translation in the Global Village. Clevedon: Multilingual
Matters.
Automation
Automation can be defined as the operation of machines that are self-acting,
that is, that work without human supervision. The different perspectives which
we will be considering under this heading – each in turn – are the
degree of automation, the level of human intervention and possible
system combinations. We start in Table 7.1 by looking at how the
different types of translation are related to the degree of automation,
ranging from fully automated to non-automated. According to our
definition, only machine translation is fully automated, while human-
aided machine translation and computer-aided translation are partially
automated to different degrees, depending on the level of human
involvement. As for human translation, translators may or may not
be using tools to assist their translation work, again blurring the
                     MT-specific  MT-general  HAMT  CAT  HT
Fully automated      Y            Y           N     N    N
Partially automated  N            N           Y     Y    Y
Non-automated        N            N           N     N    Y
Y = yes; N = no
System, and not all commercial systems include the same combination
of tools.
At present, most localization tools have yet to be integrated with
machine translation and human-aided machine translation systems. In
some localization processes, machine translation systems have been
used as separate tools to produce the first drafts of target-language texts.
Theory
In earlier chapters, the application of theory – both translation and
linguistic – to the development of machine translation and other
systems was discussed in some detail. This section looks at theory from
two perspectives, firstly in relation to all four basic translation types
(Table 7.4), and secondly specifically in relation to machine translation
and its various sub-types (Table 7.5).
Based on the literature consulted for this book, there was no evidence
of translation theory being used in the development of machine trans-
lation, human-aided machine translation or computer-aided translation.
The relevance of translation theory to professional translators in their
daily work is a long-standing controversy (see Chesterman and
                     MT-specific  MT-general  HAMT  CAT  HT
Translation theory   N            N           N     N    P
Linguistic theory    Y            Y           Y     Y    P
Translation theory        N  N  N  N  N
Formal linguistic theory  M  Y  Y  N  Y
Texts
In this section we look at the four types of translation from the point of
view of the texts involved in the translation process, that is the source
texts and their translations. We first consider the relevance of various
types of editing to source and target texts (Tables 7.6 and 7.7), before
                     MT-specific  MT-general  HAMT  CAT  HT
Post-editing:
  Rapid              R            R           R     n/a  n/a
  Polished           R            MR          R     n/a  n/a
Revision             n/a          n/a         n/a   R    R
                     MT-specific  MT-general  HAMT  CAT  HT
Pre-editing          I            LI          I     LI   LI
Interactive          n/a          n/a         I     I    I
Post-editing         I            LI          I     n/a  n/a
                     MT-specific  MT-general  HAMT  CAT  HT
Highly creative      NS           NS          NS    NS   S
Semi-creative        NS           NS          NS    NS   S
General-purpose      NS           S           P     S    S
Semi-technical       S            P           S     S    S
Highly technical     S            NS          S     S    S
Language dependency
Some tools are designed for specific languages. Spell-checkers are an
obvious example, as are electronic dictionaries and glossaries.
Others, such as translation memory systems and concordancers, can be
used with any language, assuming that the relevant character sets are
digitally available. Such tools do not have any content except the input
provided by the user. Any ‘knowledge’ that a translation memory
system, for example, contains, such as ‘this unit in language X is the
equivalent of that unit in language Y’, is based on the translator’s own
input, either through the alignment of previously translated texts and
their sources using an alignment tool, or through the new units added
to the translation memory database as the translator is working. So the
translation memory system itself has no knowledge of any particular
language. For example, a translation memory system such as the
Heartsome Translation Suite developed by Heartsome Holdings Pte. is
capable of handling an unlimited number of languages.
Table 7.10 reviews each translation type with respect to their degree
of independence from particular languages.
Machine translation systems, human-aided machine translation systems
and human translation are clearly highly language-dependent. The
systems are usually developed for the translation of specific language
pairs. The fact that some machine translation and human-aided
machine translation systems offer translation between many language
pairs does not alter the fact that they are still language-dependent.
TranSphere™, for example, is a multilingual machine translation
                     MT-specific  MT-general  HAMT  CAT  HT
Language-dependency  H            H           H     H/L  H
                     MT-specific  MT-general  HAMT  CAT  HT
Controlled language  Y            P           Y     Y    P
Natural language     N            Y           P     Y    Y
created to help the operational flow when several tools are used
concurrently in a translation process. Tables 7.12 and 7.13 focus on
this aspect but from different perspectives.
Table 7.12 shows how important standards are for each translation
type, referring to three different types of standard: TMX, TBX and
XLIFF, as described in Chapter 4. Within machine translation, the need
MT HAMT CAT HT
Specific General
Professional translators VI VI P
Translation companies VI VI VI
Localization industry VI VI VI
Researchers I I I
Developers I I I
Two groups that are concerned with standards but not as end-users
are researchers, who are involved in the development of standards
before publication, and developers, for whom it is important to ensure
compliance with these accepted standards in order for tools to be
marketable. For example, some newer versions of translation memory
systems must obtain certification for TMX before they are released onto
Evaluation
Our last topic in this section is the evaluation of translation tools,
another area of importance to different groups involved in the transla-
tion industry, including translation companies, professional translators
and developers. Evaluation procedures are designed to ensure that the
tool developed performs as expected by developers and by users, that is
translators. Tables 7.14 and 7.15 focus respectively on the level of system
evaluation required as dictated by translation type, and the methods of
evaluation for each level as previously discussed in Chapter 5.
We know from Chapter 5 that the evaluation of translation tools can
be performed in two principal ways, namely by looking at particular
components in a system or by evaluating the whole system. We can also
recall that component evaluation means that either a single component
or several components in a system are evaluated at any one time while a
tool is in the developmental stage. A whole system evaluation, on the
other hand, encompasses the performance of the whole tool, once
development has been completed. Both types of evaluation are important
to all translation types involving computer tools.
‘Evaluation’ is a term applied to the assessment of translation output
from automated systems; so evaluation in this sense is not applicable to
the work of human translators. There are other ways of trying to ensure
that human translation meets certain quality standards and these are
largely institutional, for example through translation courses and accred-
itation. In many countries, translation courses are offered at universities.
Accreditation, on the other hand, is normally granted by a professional body
on the basis of qualifications, experience and/or direct assessment to confirm
competence in translation and/or interpreting. Typically, accreditation
Human Y Y
Automation Y Y
Test suite Y N
Test corpus Y N
Glass-box Y N
Black-box N Y
Y = yes; N = no
Algorithms                N  N  N  Y  Y
Examples                  N  N  N  N  Y
Dictionaries              Y  Y  Y  N  N
SL analysis               M  Y  Y  N  N
TL synthesis              M  Y  Y  N  N
Abstract representations  N  Y  Y  N  N
Transfer module           N  N  Y  N  N
Language model            N  N  N  Y  Y
Translation model         N  N  N  Y  N
Modularity                N  Y  Y  Y  Y
Corpora                   N  N  N  Y  Y
Y = yes; N = no;
Highly creative U NU VU
Semi-creative U NU VU
General U NU VU
Language-dependency L L H/L
H = high; L = low
long as the character sets for that language are supported digitally. For some
translation and localization tools, the question of language-dependency
only arises when part or all of a language’s character set is not
digitally supported, as with Javanese, spoken mainly in Indonesia. Linguistic
tools such as electronic dictionaries and thesauri, on the other hand, are
highly language-dependent since they are content-based, whereas linguistic
Conclusion
This chapter has been written with the intention of providing a basic
summary of each and every topic covered in the book. In some cases,
what is presented is cutting-edge information and it is not unlikely that
in the near future changes will occur as technology becomes increas-
ingly sophisticated and new technologies are introduced. As we have
seen in Chapter 6, it is also not an easy task to illustrate translation
technology that, with each passing day, is becoming increasingly complex.
For this reason, some of the tables presented above deal with overlapping
topics. They contain similar criteria and touch on similar issues but each
Monolingual
Bilingual
Al Misbar http://195.217.167.3/dict_page.html E, A
http://www.almisbar.com/dict_page_a.html
Albi http://www.argjiro.net/fjalor/ Al, E
Bhanot http://dictionary.bhanot.net/ E, M
Capeller http://www.uni-koeln.de/phil-fak/indologie/tamil/cap_search.html San, E
Ceti http://www.ceti.com.pl/~hajduk/ Bel, E
Danish-Jpn Dic http://www.fys.ku.dk/~osada/djdict/djdict.html J, Da
Darkstar http://darkstar.sal.lv/vocab/index.php Oj, E
E-Est Dic http://www.ibs.ee/dict/ E, Est
E-Fi http://foto.hut.fi/sanasto.html E, Fi
E-H http://consulting.medios.fi/dictionary/ E, H
En-Romanian http://www.castingsnet.com/dictionaries/ E, Ro
Francenet http://www.francenet.fr/~perrot/breizh/dicoen.html Bre, E
Galaxy http://galaxy.uci.agh.edu.pl/~polak/slownik/ Po, E
Gr-E http://users.otenet.gr/~vamvakos/alphabet.htm Gr, E
Hebrew Dic http://www.dictionary.co.il/ He, E
Islandes http://www.fut.es/~mrr/islandes/islandes1.html Ic, Ca
Kamus Jot http://www.jot.de/kamus/ G, In
Kihon http://kihon.aikido.org.hu/dict.html H, J
Learning Media http://www.learningmedia.co.nz/ngata/index.html Ma, E
Lexiconer http://www.lexiconer.com/ecresult.php E, C
Lexitron http://lexitron.nectec.or.th/ Th, E
Lingresua http://lingresua.tripod.com/cgi-bin/onlinedic.pl E, U
Persian Online Dic http://www.wdgco.com/dic/ Pe, E
Spanishdict http://spanishdict.com/ S, E
TechDico http://membres.lycos.fr/baobab/techdico.html F, E
Potawatomi http://www.ukans.edu/~kansite/pbp/books/dicto/d_frame.html Pot, E
Trilingual
Cambridge http://dictionary.cambridge.org/ E, F, S
Cari.com http://search.cari.com.my/dictionary/ E, M, C
Multilingual
Monolingual
Multilingual
Monolingual
Biochemistry http://www.fhsu.edu/chemistry/twiese/glossary/biochemglossary.htm E
BioTech http://filebox.vt.edu/cals/cses/chagedor/glossary.html E
Classical mythology http://www.classicalmythology.org/glossaries/ E
Genome http://www.ornl.gov/sci/techresources/Human_Genome/glossary/ E
Medwebplus http://www.medwebplus.com/obj/25888 E
Bilingual
Trilingual
Reiterin http://www.reiterin.ch/l/lexikon.htm E, F, G
Multilingual
Medline http://www.nlm.nih.gov/medlineplus/encyclopedia.html E
Natureserve http://www.natureserve.org/explorer/ E
A7 Language key
/ = bidirectional
- = unidirectional
Holmes, J.S. (1988/2000) ‘The Name and Nature of Translation Studies’, in The
Translation Studies Reader, L. Venuti (ed.) (2000). London: Routledge: 172–85.
Hovy, E., M. King and A. Popescu-Belis (2002a) ‘An Introduction to Machine
Translation Evaluation’, in Workbook of the LREC2002 Workshop on Machine
Translation Evaluation: Human Evaluators meet Automated Metrics, M. King (ed.).
Spain: 1–7.
——(2002b) ‘Principles of Context-based Machine Translation Evaluation’,
Jurafsky, D. and J.H. Martin (2000) Speech and Language Processing: An Introduction to
Natural Language Processing, Computational Linguistics and Speech Recognition.
New Jersey: Prentice-Hall.
Kaji, H. (1999) ‘Controlled Languages for Machine Translation: State of the Art’,
Proceedings of MT Summit VII, Singapore: 37–9.
Kaplan, R.M. and J. Bresnan (1982) ‘Lexical-functional Grammar: A Formal
System for Grammatical Representation’, in The Mental Representation of Gram-
of LREC 2000 Satellite Workshop: Using Evaluation with HLT Programs: Result and
Trends, Greece, http://www.sfu.ca/~anoop/papers/pdf/hlt-eval.pdf. November
2003.
Pugh, J. (1992) ‘The Story so Far: An Evaluation of Machine Translation in the
World Today’, in Computers in Translation: A Practical Appraisal, J. Newton (ed.)
(1992a). London: Routledge: 14–31.
Puntikov, N. (1999) ‘MT and TM Technologies in Localization Industry: The
general-purpose machine translation systems, 66, 91, 173, 179–83, 186
general-purpose texts, 173, 183–4, 194
header, 121, 124
high-quality translations, 7, 45, 60,
language pairs, 14, 34, 40, 66, 69–70, 72, 74, 86, 108–10, 132, 150, 154, 166, 173, 184–5, 190–3
lexicography, 57, 105, 111
linguistic component, 31, 35, 60, 95
linguistic phenomena, 29, 40, 136–7, 190
terminology, 104–6, 122–4, 139, 183
terminology databases, 13, 38, 94, 103, 106, 115, 117, 123, 132, 149, 179, 188
terminology management systems, 42, 93–4, 106, 113, 117, 123, 128, 132, 170, 193
81, 83, 85, 94, 103, 115, 117, 149, 153, 173, 178–9, 181–2, 187
Translation Studies, 3, 6, 22, 25, 35–7, 39, 41, 43, 55, 107–8, 113
translation technology, 1–6, 8, 18–22, 25, 35, 40, 42–3, 47, 55, 152, 155, 157, 171, 177, 195–6