ToolBox11
A Computer Primer
for Translators
by Jost Zetzsche
This document, or any part thereof, may not be reproduced or transmitted electronically or by any other means without
the prior written permission of International Writers’ Group, LLC.
ABBYY FineReader, ABBYY Screenshot Reader and PDF Transformer are copyrighted by ABBYY Software House. Acrobat,
Acrobat Reader, Dreamweaver, FrameMaker, HomeSite, InDesign, Illustrator, PageMaker, Photoshop, and RoboHelp are
registered trademarks of Adobe Systems Inc. Acrocheck is copyrighted by acrolinx GmbH. Acronis True Image is a
trademark of Acronis, Inc. Across is a trademark of Nero AG. AllChars is copyrighted by Jeroen Laarhoven. ApSIC Xbench
and Comparator are copyrighted by ApSIC S.L. Araxis Merge is copyrighted by Araxis Ltd. ASAP Utilities is copyrighted by
eGate Internet Solutions. Authoring Memory Tool is copyrighted by Sajan. Belarc Advisor is a trademark of Belarc, Inc.
Catalyst and Publisher are trademarks of Alchemy Software Development Ltd. Classic Shell is copyrighted by Ivo Beltchev.
ClipMate is a trademark of Thornsoft Development. ColourProof, ColourTagger, and QA Solution are copyrighted by
Yamagata Europe. Complete Word Count is copyrighted by Shauna Kelly. CopyFlow is a trademark of North Atlantic
Publishing Systems, Inc. CrossCheck is copyrighted by idioma. Déjà Vu is a trademark of ATRIL Solutions. Docucom PDF
Driver is copyrighted by Zeon Corporation. dtSearch is a trademark of dtSearch Corp. ExamDiff Pro is a trademark of
Prestosoft. EmEditor is copyrighted by Emura Software inc. Error Spy is copyrighted by D.O.G. GmbH. FileHippo is
copyrighted by FileHippo.com. FileSplit is a trademark of Partridge Software. Flare and Lingo are copyrighted by MadCap
Software Inc. Fluency is copyrighted by Western Standard, Inc. LibreOffice is copyrighted by The Document Foundation.
Logoport and Translation Workspace are trademarks of Lionbridge Technologies, Inc. Fusion is a trademark of Orca
Development Corporation. Google, Google Translator Toolkit, and Google Translate are trademarks of Google Inc.
Heartsome and TMX Editor are trademarks of Heartsome Holdings Pte. Ltd. Insert Togglekey is copyrighted by Mike Lin.
IntelliWebSearch is copyrighted by Michael Farrell. KeyTweak is copyrighted by Travis Krumsick. Language Studio is a
trademark of ATIA Ltd. LF Aligner is copyrighted by András Farkas. Lingobit Localizer is a trademark of Lingobit
Technologies. Lingotek is copyrighted by Lingotek Inc. LINUX is a trademark of Linus Torvalds. LogiTerm, AlignFactory, and
SynchroTerm are trademarks of Terminotix Inc. Worx is a trademark of Language Technology Centre Ltd. Mac and
Macintosh are trademarks of Apple Computer, Inc. MetaTexis is a trademark of MetaTexis Software and Services. memoQ is
copyrighted by Kilgray. Microsoft is a registered trademark and Office, Word, PowerPoint, Access, Excel, Outlook, Publisher,
Visio, Project, FrontPage, Internet Explorer, Windows Plus!, TweakUI, MSN Messenger, Windows Messenger, Windows,
Windows Update and Microsoft Update are trademarks of Microsoft Corporation. Mozilla and Firefox are trademarks of The
Mozilla Organization. multiQA is copyrighted by multiQA.com. MultiTrans is a trademark of MultiCorpora R&D INC. Multi-Edit
is a trademark of Multi Edit Software, Inc. Multilizer is a trademark of Multilizer Inc. Norton SystemWorks Pro, Norton
AntiVirus, and Norton Utilities are trademarks of Symantec Corporation. Notepad++ is copyrighted by Don Ho. Nvu and
BlueGriffon are copyrighted by Linspire, Inc. OmniPage and PDF Converter are trademarks by Nuance Software. OpenOffice
is a trademark of The Apache Software Foundation. Opera is a trademark of Opera Software AS. Original Virtual CD is a
trademark of ZTekware Computing Inc. PDF995 is copyrighted by Software995. PDFCreator is copyrighted by Open Source
Technology Group. Plunet is copyrighted by Plunet GmbH. PKZIP is a trademark of PKWare, Inc. PractiCount & Invoice is a
trademark of Practiline Software. PrimoPDF is copyrighted by activePDF Inc. Quadsucker is copyrighted by S-B Software.
QuarkXPress is a trademark of Quark, Inc. Quicken is copyrighted by Intuit Inc. RC-WinTrans is a trademark of
schaudin.com. SDL Insight, SDLX, SDLPhraseFinder, Synergy, Trados, MultiTerm, ExtraTerm, Passolo, AuthorAssistant and
StoryCollector are trademarks of SDL International. Search and Replace for Windows is copyrighted by Funduc Software Inc.
SendTo is copyrighted by Trogladite Software Group. Similis is copyrighted by Lingua & Machina. Skype is a trademark by
Skype and/or Microsoft. SnagIt is a trademark of TechSmith Corporation. SpywareBlaster is copyrighted by Javacool
Software LLC. Star Transit, TermStar, Star James, and FormatCheckers are trademarks of Star AG.
Sysfilter is a trademark of ECM engineering. Teleport Pro Tennyson is a
trademark of Maxwell Information Systems, Inc. Text United is copyrighted by Text United GmbH. Time Stamp is
copyrighted by William Rouck. T.O.M. Translator’s Office Manager is copyrighted by Joachim Voigt. Total Commander is
copyrighted by Christian Geisler. Translation Office 3000, ExactSpent, Projetex, and AnyCount are copyrighted by Advanced
International Translation. Transmissions is a trademark by Transmissions, LLC. Twins File Merger is copyrighted by Twins
Software, Inc. UltraEdit is a trademark of IDM Computer Solutions, Inc. Unicode and the Unicode logo are trademarks of
Unicode Inc. Ventura, WordPerfect Office, WinZip, and Paint Shop Pro are trademarks of Corel Corporation. Verifika is a
registered trademark of Palex Group Inc. VirtualCD is a trademark of H+H Zentrum für Rechnerkommunikation GmbH.
Virtual Drive is a trademark of FarStone Technology Inc. Visual Localize is a trademark of AIT AG. WebBudget and
FreeBudget are trademarks of Aquino Developments S.L. WebWordSystem is a trademark of Web Word System ApS.
Wordbee is copyrighted by Wordbee S.A. Wordfast Classic, Wordfast Anywhere, and Wordfast Pro are trademarks of
Wordfast Ltd. WS_FTP is a trademark of Ipswitch, Inc. Xerox XTS is a trademark of the Temis-Group. XTM Cloud is
copyrighted by XTM-INTL. XTRF is copyrighted by XTRF Management Systems.
All other product names are trademarks or registered trademarks of their respective companies.
Table of Contents
Introduction 1
The Purpose of This Book 1
How to Read This Book 2
How to Read The Updated Version of This Book 3
Who the Robot with St. Jerome’s Face on the Cover of the Book Is 3
Operating Systems 5
The Benefits of Windows 2000 and Higher 5
Switching to the Windows 8/8.1 Interface 9
Windows/File Explorer 11
Previewing Files 12
Folder Paths 12
Selecting Multiple Files 13
The Ribbon Menu 13
Libraries in Windows 14
Helpful Shortcuts 16
Sending Files to Other Drives or Programs 16
To Search with Wildcards 18
To Copy Files or Folders 21
WinKey Shortcuts 25
Folder and File Structure 26
Controlling Which Programs Are Automatically Started 28
Avoiding the Animated Environment 31
Keeping the Computer Clean 32
The Registry 34
Disk Cleanup 35
Finding the Forgotten Space Hogs 40
Error Checking and Defragmenting Drives 42
Starting the Computer in Safe Mode 43
Restoring Your Computer 44
Backing Up Files 45
File History in Windows 8/8.1 49
Backing Up and Restoring the Complete System 51
Asking for Help 55
Taking Inventory of Your Computer 56
Keyboard Languages 56
Installing Additional Keyboards on Windows 9x-7 59
Installing Additional Keyboards in Windows 8/8.1 64
Mapping Existing Keyboards 65
Web Browsers 69
Browsing Tips 70
Using URLs to Find Translation Data 72
Using Wikipedia for Language Data 73
File Transfer 75
Utilities 117
Graphic Management Utilities 117
Renaming Utilities 120
Search Utilities 122
CD Emulators 128
Compression Utilities 129
Password Cracking Utilities 133
Measurement Conversion Utilities 135
Word Count Utilities 136
Time Tracking 142
Clipboard Management Utilities 144
Merging Files 146
Introduction
As a technical translator and localization consultant, I’ve been continually
surprised at the lack of technical expertise and knowledge about software
tools among many translators and project managers. I’ve seen countless
hours wasted on tasks that could have been done automatically or in a
fraction of the time. And as an editor, I’ve often struggled to improve texts
that were translated with an adequate level of linguistic or subject-matter
expertise, but whose quality was sub-par because the translator didn't know
how to use the necessary tools or formats.
At some point after it became common for translators to use computers for
their work, it seems that many of us became convinced that we were really
not smart (read: technical) enough to become proficient computer users. The
irony is that many of us translate highly technical and complex subject matter
every day. There is no lack of intelligence among us—merely a prevailing not-
smart-enough-for-computers fallacy that we have bought into.
It is time to adopt a new paradigm for our profession: Not only is it acceptable
to use computers well—it is critical to our success.
The specific product names that I feature in the tutorials are not necessarily a
reflection of any favorable judgment on these in comparison with other
competing products. Instead, they represent either the most commonly used
products or the ones that I am most familiar with.
The comprehensive index at the end of the book will help you to quickly find
the information you need. To help you find some of the "tips and tricks" that I
list throughout the book, I have preceded the alphabetical index with a "How
to" section. Because you may not know exactly what you are looking for, I
encourage you to actually read or at least scan through the book.
Finally, read with courage and creativity! Computers and the plethora of
specialized software programs are powerful tools for translation, tools that are
more accessible and affordable than ever before. And with this tool box at
your disposal, the only limits to your craftsmanship as a translator are the
boundaries you set for yourself.
Who the Robot with St. Jerome’s Face on the Cover of the Book Is
Glad—and yet surprised—you asked. As most of you know, of course, this is
Jeromobot, my little friend and patron saint. I adopted his image to unite our
passion for language and the art of translation—St. Jerome, though
apparently unfit for most human relationships, was very passionate about
languages and felt very strongly about his grand translation of the Christian
scripture into common Latin—with the new era that we live in. This era
requires us to harness Jerome’s intensity and excellence to the power of
modern technology. May Jeromobot be with you and with us all!
Operating Systems
The most important program that runs on a computer is the "operating
system." Operating systems provide a software platform on top of which
other programs, called "application programs," can run. Because the
application programs must be written to run on top of a particular
operating system, your choice of operating system largely determines
the applications you can run.
There are even multi-language versions of these operating systems that allow
you to switch the user interface between 24 (2000), 33 (XP), 30 (Vista), 95
(7) and 109 (8) languages. While the multi-lingual versions for Windows 2000
and XP were difficult to obtain for anyone outside the Microsoft Developer
Network (MSDN; see http://msdn.microsoft.com/en-us/goglobal/bb688178.aspx),
they are now offered as an integral part of Windows Vista Ultimate, 7 Ultimate,
and 8/8.1 (all versions).
You can find a list of all the supported languages for the different
versions of Windows at
http://windows.microsoft.com/en-US/windows/language-packs.
To change the display language in Windows 8/8.1, open the Control Panel,
select Language and Add a Language.
It’s going to take a little while for the configuration to work when you install
an additional language for the first time, but when it’s done you can easily
switch the user interface language by opening the Language dialog (see
above), double-clicking on the language you want the user interface to be
displayed in, and selecting Make this the primary language. You’ll be
prompted to log on again (no restart necessary!) and everything will be in the
language of your choice.
You can change to the less task-oriented but more precise classic view of
the Control Panel under Start> Control Panel> Classic View; or in
Windows 8/8.1, View by and select Small Icons. (The instructions in
this primer are all related to the Classic/Small Icons view of the Control
Panel.)
In the Control Panel in Windows 8/8.1 (just type "control panel" in the Metro
interface to open it) you’ll find the option Taskbar and Navigation. This
opens the following dialog, which allows you to simply skip the Metro interface:
Figure 3: The Navigation tab in the Taskbar and Navigation properties dialog
You still won’t have the familiar Start menu in Windows, and navigation
becomes a little more difficult when you have only a desktop with no apparent
way to start or control programs. Here I would recommend installing the
third-party tool Classic Shell (see www.classicshell.net). This will give you a
number of options to rebuild the Start menu in your preferred way, including
one like this:
There is one other power option available in Windows 8/8.1 that gives you
access to all kinds of things (including the ability to turn the computer off).
For this you don’t have to install anything extra. You just have to press WINKEY+X
and you will see the Power User menu with access to all kinds of important
places:
Windows/File Explorer
The most helpful and often-used Windows component on my computer is the
Windows Explorer or File Explorer (Windows 8/8.1).
Starting with Windows Vista, the Explorer has taken on some advanced power
options that you won’t want to miss.
Previewing Files
One is the preview feature which allows you to automatically preview HTML,
MS Office, PDF, graphics, and text files within the Explorer (you need to
enable this by selecting Organize> Layout> Preview Pane, by pressing the
Preview button on the toolbar of the Windows 7 Explorer, or by selecting
Preview pane on the View ribbon in Windows 8/8.1’s File Explorer).
Folder Paths
An element that might be confusing at first but may become one of your
favorite features is the address bar. The path to the selected file is no longer
displayed in the usual manner with backslashes (such as C:\Windows\Fonts);
instead, it is provided through a "breadcrumb trail." This is an interactive
address which, in case of very long addresses, is shortened and provided with
little right-arrows between the different locations on the path. Clicking on any
of those arrows displays all other possible branches that go off from that point
so that you can quickly navigate there.
If you prefer the old-fashioned method of displaying addresses, you will only
need to right-click the address bar and select Edit Address.
Selecting Multiple Files
Whereas you previously had to hold the CTRL (or SHIFT) key while selecting multiple
files within Windows Explorer, Windows 8/8.1 has made it easier to select
several files or folders in the File Explorer at the same time and perform one
action on all of them (such as delete or copy). You can now select little check
boxes to the left of the file name that appear as you select the file.
The Ribbon Menu
Windows 8/8.1 has also added to the File Explorer the ribbon menu that most
are familiar with from the last couple of versions of Microsoft Office. While it
might seem unwieldy at first, it adds to your productivity by often
automatically displaying interactive ribbons that match your current selection
(see the automatically selected Picture Tools ribbon because of my selection of
graphic files in the image above).
Libraries in Windows
Libraries are like virtual folders. Of course this doesn’t mean that normal
computer folders are physical, but in the traditional computer world, they and
only they "contain" the files that are stored in them. Libraries, on the other
hand, are virtualizations of that. They can display the contents of other folders
from all over your computer, other computers on the network, or even a USB
flash drive. A library is essentially an organizational principle that monitors
other folders and provides a single "location" to work with all their contents.
Out of the box, Windows 7 and 8 (and, if enabled, 8.1) come with four
libraries: Documents, Music, Pictures, and Videos, each with its obvious
content. And again, while the references to those files are stored in the
respective libraries, the actual files stay wherever you stored them on your
computer.
There are plenty of ways you can use libraries to manage, manipulate, or
organize files, but one is particularly helpful: backup.
Note that in Windows 8/8.1, the File History backup system uses the
pre-configured libraries as its center point of backups (see page 50).
To create a new library, simply right-click on the Libraries folder on the left-
hand side of Windows/File Explorer (the Navigation bar) and select New>
Library. Once you give your library a name and open it, you'll be prompted to
add folders from any location you can access from your computer (except
read-only media such as DVDs or CDs). Since you don't want your complete
system to be backed up every night, you can pick and choose the necessary
folders and then schedule the library for your nightly backup (or you could
simply right-click the library before you turn off your lights for the night and
select Send To> <your external hard drive> or whatever you prefer as a
backup device).
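If you are comfortable running a small script, the pick-and-choose backup described above can also be sketched in Python. This is only an illustration under my own assumptions: the function name back_up_folders and the example paths are hypothetical, and the script copies ordinary folders rather than working with Windows libraries themselves.

```python
import shutil
from pathlib import Path


def back_up_folders(folders, backup_root):
    """Copy each folder in `folders` into `backup_root`, keeping the folder's own name."""
    backup_root = Path(backup_root)
    copied = []
    for folder in map(Path, folders):
        target = backup_root / folder.name
        # dirs_exist_ok (Python 3.8 and later) lets a later run refresh an existing backup.
        shutil.copytree(folder, target, dirs_exist_ok=True)
        copied.append(target)
    return copied


# Example call (hypothetical paths; replace them with your own folders and drive):
# back_up_folders([r"C:\Clients\Acme", r"C:\Glossaries"], r"E:\NightlyBackup")
```

Note that this simple copy refreshes and adds files but never deletes anything from the backup, and that two source folders with the same name would end up merged in one backup folder.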
Helpful Shortcuts
Sending Files to Other Drives or Programs
To send any file or folder to any drive (including your floppy drive or CD
writer) or to any program, right-click on the file or folder and open the list
under Send to.
These shortcuts are stored in the SendTo folder (under Windows in Windows
9x and ME, Documents and Settings/<user> in Windows 2000 and XP, and
Users/<user>/AppData/Roaming/Microsoft/Windows in Windows Vista, 7,
and 8/8.1). You can delete any of the existing shortcuts or add links to any
program you would like to have listed there. In the above example, I added
shortcuts to Word and Excel in the SendTo folder so that I can open all
possible files in these programs through a right-click.
Another helpful way to open files quickly in many programs (especially Office
and desktop publishing applications) is simply to drag the file into the open
program while no other file is displayed. When the cursor with the file is
located over the dark grey background, a plus symbol will be displayed.
Releasing the mouse button will then open that file in the appropriate
program.
If you want to open more than one application at once when you start work
on a specific project (such as a browser, a voice recognition program,
and a translation tool), all you need to do is open Notepad or another text
editor and type something like this:
Once you have entered the text, save the file as a *.bat file. The BAT extension tells
Windows that this is a batch file that contains a stack of commands that it
needs to execute when the file is opened. To open it you'll only need to
double-click on it and (in this case) Firefox, Dragon NaturallySpeaking, and
Xbench are automatically started. You are probably interested in other
programs, so all you need to do is replace the path to the program within the
quotation marks and enter the file name of the program afterward. Save it as
a *.bat file and you're ready to start the day.
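If you happen to have Python installed, the same start-everything-at-once idea can be sketched as a short script. This is an alternative to the batch file, not a reproduction of it; the function name launch_all and the program paths in the example call are hypothetical and will differ on your computer.

```python
import subprocess


def launch_all(commands):
    """Start each command in turn without waiting for the previous one to close."""
    return [subprocess.Popen(command) for command in commands]


# Example call (the paths are hypothetical; point them at your own programs):
# launch_all([
#     [r"C:\Program Files\Mozilla Firefox\firefox.exe"],
#     [r"C:\Program Files\ApSIC\Xbench\xbench.exe"],
# ])
```

Each entry is a list so that you could add arguments after the program path, for instance a file that the program should open.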
To Search with Wildcards
A wildcard is a special symbol that stands for exactly one (?) or any number
of (*) characters. This means that a*b could be any combination of characters
starting with the letter a and ending with the letter b, whereas a?b can
only be a three-character combination starting with a and ending with b.
Wildcards in file searches are very powerful. Right-click on any folder, select
Search, and (for instance) type a*.exe to find any program file (EXE) that
starts with an a.
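The wildcard rules above can be tried out with a few lines of Python, whose standard fnmatch module uses the same ? and * symbols. This is a minimal sketch for experimenting, not a description of how the Windows search works internally; the helper name matches and the sample file names are my own. In fnmatch, * also matches no characters at all, so a*b fits "ab" as well.

```python
from fnmatch import fnmatch


def matches(name, pattern):
    """Return True if `name` fits the wildcard `pattern`, ignoring upper and lower case."""
    return fnmatch(name.lower(), pattern.lower())


print(matches("acb", "a?b"))            # True: exactly one character between a and b
print(matches("ab", "a?b"))             # False: ? needs exactly one character
print(matches("a123b", "a*b"))          # True: * stands for any run of characters
print(matches("Acrobat.exe", "a*.exe")) # True: a program file starting with an a

# Filter a list of (made-up) file names the way a search for a*.exe would.
files = ["Acrobat.exe", "word.exe", "alpha.txt", "AnyCount.exe"]
print([f for f in files if matches(f, "a*.exe")])
```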
The combination of this search feature and the method to open files (see page
16) also makes it possible to open many different files at the same time, even
if they are located in different folders. Just use the search method described
above, highlight all of the files in the Search Results dialog (press CTRL+A),
and drag them into the application.
There are also many third-party search tools available. One of the more
outstanding ones has to be Everything (see www.voidtools.com). This tool is
an extremely small utility (both in terms of its download size and the files it
creates) that is able to index all names of files and folders on a computer—I
had more than 200,000, and it needed less than three seconds. Once indexed,
the files and folders are all listed in Everything's main application window; you
can look for any part of the name by typing it into a search box (using
wildcards or not) and the results are displayed instantaneously. You can
search your complete hard drive or any folder or folder group in Windows
Explorer—just right-click and select Search Everything.
We all know that the search feature in Windows XP is painfully slow, so there
is no point in even comparing Everything to XP, but even
against newer versions of Windows it wins hands-down.
Figure 13: Everything’s main window with a filter for PDF files
To Copy Files or Folders
Holding the CTRL key while you drag a file or folder to another place within the
Windows/File Explorer makes a copy of the file or folder rather than moving it.
Dragging the file or folder this way within the same folder will make a copy of
that file or folder and rename it to Copy of <OldName>.
If you have several applications open, you can switch between those by
selecting them on the Windows taskbar. In situations where the taskbar is not
visible—for instance, when displaying a PowerPoint presentation—it is easier
to do this by pressing the ALT+TAB key combination:
If you continue to press the ALT+TAB combination, you can rotate through the
open applications. Releasing the keys will open the appropriate program.
Vista (and Windows 7 but not Windows 8/8.1) also offers Flip 3D,
activated by pressing WINKEY+TAB instead of ALT+TAB.
Another feature that was introduced in Windows 7 is the Jump List feature—
the function that allows you to right-click on any icon in the taskbar and select
from a number of options, including, and most helpfully, the most recently
opened instances of documents with that particular program.
This also works with Windows/File Explorer. However, since you typically use
Windows/File Explorer so frequently throughout the day and there is space
only for the last seven visited locations, chances are you won't see the place
you need to go to once a day to, say, make a backup of your current project at
day's end.
The good thing is there is a special feature for Windows/File Explorer: You can
manually bookmark some favorites to the top of the Recent list by pinning
folder locations. Just click on any folder and drag that folder icon to the
Explorer shortcut on the taskbar. You'll see the message Pin to Windows/
File Explorer before you release the mouse button. The folder will now
appear under a Pinned section of the Jump List, and you can remove it by
clicking the Unpin from this list icon on the right side of the panel.
And because it's so much fun to do this in Windows/File Explorer, you can also
do it in a couple of browsers, including Firefox and Internet Explorer, so that
you have web pages pinned down in their Jump Lists, allowing you to open the
pages without opening the browser first.
Most non-Asians who study East Asian languages find it much easier to
remember characters of Chinese origin with the help of (real or imagined)
pictographic aids. The same aid can be used with some well-chosen, fairly
universal keyboard shortcuts.
The easiest to "see" this with is X (as in CTRL+X) for Cut (see the picture of
scissors?). But how about CTRL+V for Paste? Can you see the proofreader’s
classic insert mark in the V? The same concept accounts for the Y in CTRL+Y
for Redo, and CTRL+Z for Undo is a pictographic representation of a scribble-
out.
Most other keyboard shortcuts are rather English-centric (because they are
associated with the English word for the respective action: CTRL+O for Open,
CTRL+N for New . . .); nevertheless, it is extremely helpful to learn this basic
set of shortcuts because they are used across the majority of programs and
languages.
If you have too much time on your hands and would like to refresh your
memory on all kinds of keyboard shortcuts for Windows products, here is
a super-comprehensive list: www.microsoft.com/enable/products/keyboard.aspx.
WinKey Shortcuts
One often overlooked set of shortcuts is those associated with the WINKEY,
the key that is typically located on the lower left of the Windows
keyboard and displays the Windows logo. Gamers in particular don’t like this key
because it tends to interfere with their activities, but I really like it because it
gives access to a number of features that otherwise require the mouse.
• WINKEY: Open the Start menu (in Windows 8/8.1 it switches to the previous
mode)
• WINKEY+T: Focus on the first and then succeeding taskbar entries (WINKEY+SHIFT+T
cycles backward)
• WINKEY+SPACE: Peek at the desktop (in Windows 8/8.1: allows you to switch
between the different keyboards you might have installed)
• WINKEY+NUMBER KEY: Launch a new instance of the application in the Nth slot
on the taskbar
The best way of doing this is by labeling each subfolder within a client’s folder
with year-month-day since this gives you the easiest way to sort. Now, it’s
possible to do this manually, but it’s easier to add the date to the folder name
automatically.
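One way to add the date automatically is a short Python script. The following is a minimal sketch under my own assumptions: the function names and the pattern of date, underscore, and project name are hypothetical choices, not a prescribed convention. What matters is that the name starts with year-month-day, so that sorting by name also sorts by date.

```python
from datetime import date
from pathlib import Path


def dated_folder_name(project, day=None):
    """Build a folder name that starts with year-month-day so it sorts chronologically."""
    day = day or date.today()
    return f"{day:%Y-%m-%d}_{project}"


def create_project_folder(client_folder, project, day=None):
    """Create the dated subfolder inside the client's folder and return its path."""
    path = Path(client_folder) / dated_folder_name(project, day)
    path.mkdir(parents=True, exist_ok=True)
    return path


print(dated_folder_name("manual", date(2014, 3, 7)))  # 2014-03-07_manual
```

Calling create_project_folder with a client folder and a project name (and no date) creates a subfolder stamped with today's date.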
Naming conventions for files—if not prescribed by the client—should also have
a certain logic, and it is generally helpful to have an indication in the file name
of whether a file is an original, translated, or edited file (filename_o.doc vs.
filename_t.doc vs. filename_e.doc). If you would like to batch rename a
great number of files, you can find more information on page 121.
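The _o, _t, and _e convention is easy to apply consistently with a small helper. This is a sketch only: the function name stage_name and the STAGES table are my own inventions, and the helper works on bare file names rather than full paths.

```python
from pathlib import PurePath

# Markers for original, translated, and edited files.
STAGES = {"original": "o", "translated": "t", "edited": "e"}


def stage_name(filename, stage):
    """Return the file name with a _o, _t, or _e marker before the extension."""
    p = PurePath(filename)
    return f"{p.stem}_{STAGES[stage]}{p.suffix}"


print(stage_name("filename.doc", "original"))    # filename_o.doc
print(stage_name("filename.doc", "translated"))  # filename_t.doc
print(stage_name("filename.doc", "edited"))      # filename_e.doc
```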
If you have more icons in the taskbar than you would like to have
displayed, there is a helpful way to control their behavior. Select
(Start>) Control Panel> Taskbar (and Start Menu)> Notifications
Area. To the right of that check box, select Customize to set the
behavior of each of the icons that are presently being displayed or have
been displayed in the past.
There are two ways to control which programs are started up.
Any program that is listed under Start> (All) Programs> Startup will be
launched automatically when you start Windows. To delete any association
from that list, you can simply right-click it and select Delete.
On the other hand, if you want to have your email program (or any other
program) started every time you start Windows, you can also add a link
to your Startup folder. To do this, right-click on the EXE file in its
installation directory and select Create Shortcut. Once the shortcut file
has been created, you can drag or copy it into the Startup folder.
In Windows 8/8.1 you can access the Startup folder by pressing WINKEY+R and entering
shell:startup.
However, simply selecting Delete will not stop all automatic startup programs
from running. To accomplish this, press WINKEY+R and type msconfig.
All utilities and programs on the Startup tab are started automatically. You
will need some of these programs to start up, but many can be unchecked
(this depends on your computer configuration) to promote a faster startup
and better performance.
You can find several lists on the Internet that will help you make an informed
decision on which of these items should be started up and which not. Two of
these lists are www.pacs-portal.co.uk/startup_search.php and
www.answersthatwork.com/Tasklist_pages/tasklist.htm.
In Windows 8/8.1 you will find a link on the Startup tab to the Task
Manager from which you can administer the programs that are
automatically launched. Also, Windows 8/8.1’s automatically running
Action Center will remind you every once in a while that it might be a
good idea to disable some of those "startup apps."
In the Windows NT line (NT, 2000, XP, Vista, 7, 8/8.1), many programs are
run as so-called services. These are listed on the Services tab of the System
Configuration Utility; this is also where you can stop or start services. For a
better description of each of the services and the ability to decide whether
services should be started manually, automatically, or should be disabled, you
can open the Services dialog under (Start>) Control Panel>
Administrative Tools> Services.
Double-clicking on each of the services will open a dialog in which you can
adjust the settings.
Especially notorious in this respect are "themes" that can include animated
mouse cursors, different sound schemes, and complex desktop or screensaver
graphics. These can be disabled under
Your computer will also need extra resources for the Aero interface in
Windows Vista and 7, but the main resource hog in that interface is the
transparency feature. You can disable this under Start> Control Panel>
Personalization> Window Color and Appearance> Enable
Transparency. Windows 8/8.1 has neither the Aero interface nor the
Transparency feature.
Most Windows users know that software cannot be uninstalled by deleting the
corresponding folder under Program Files. Instead, it must be done through
Start> (Settings>) Control Panel> Add/Remove Programs (pre-Vista)
or (Start>) Control Panel> Programs and Features (Vista and above).
What many do not know is that many uninstallation programs are either not
smart enough to find all the required files and references, or they are not
even supposed to. Whenever you change anything in any of the files that were
originally installed with that software, the file will not be uninstalled (this
includes spell-checking dictionaries, for instance). The only way to uninstall
those files is to actually go to their installation path (usually under
C:\Program Files or C:\ProgramData) and delete them manually in
Windows/File Explorer.
The Registry
Another sore spot in any Windows installation is the registry, a database used
by Windows to store configuration information. The registry consists of
information about your programs, operating system, all associated hardware
and their drivers (little programs that make your hardware perform in the
desired manner), and your personal settings for these programs. You can
access the registry by pressing WINKEY+R and entering regedit, which will open a
view of the registry that allows you to search for certain keys, values, or
attributes and then edit them.
But be forewarned: this is a very risky undertaking that could literally cripple
your computer, so only do this if you have very clear instructions on what to
look for and edit.
Before you use any of the registry cleaning utilities, do a web search
for that application and see what other people have to say about it.
Although this is not exactly a guarantee of success, it should give you a
better idea of what (or what not) to expect.
And just to make sure, it’s also a good idea to perform a backup of your registry
under File> Export in the Registry Editor.
Disk Cleanup
Once the computer determines which files could be deleted, it will display
them divided by category and let you select which files you would like to
delete.
In the graphic above you can see that one of the items in the list is the
"Recycle Bin." This is a Windows security mechanism by which it assures
that files you delete from the hard drive will only be "truly deleted" once
you empty the Recycle Bin.
With the release of Windows 8.1 (and through a Windows Update patch for
Windows 7 and Windows 8) you can now also delete copies of old Windows
Update files from your hard disk. To do that you will have to select the Clean
up system files button in the Disk Cleanup utility.
Deleting your temporary Internet files will only delete temporary files that
have been collected with Internet Explorer. It will not delete "cookies."
Cookies are small text files that are stored on your computer by a web server
so that you can be recognized when re-visiting its website.
Figure 26: Setting the amount of disk space for temporary Internet files to 50 MB in IE
Figure 27: Setting the amount of disk space for temporary Internet files to 20 MB in Opera
In Firefox this option is available under Tools (press ALT to see the menu)>
Options> Advanced> Network> Cache.
In Safari you have access to this feature under Edit (press ALT to see the
menu)> Preferences> Advanced.
Google Chrome does not allow you to manage the amount of temporary files.
There are a lot of programs out there to help you find those large
perpetrators, but the one that I find very helpful is ancient by today’s
standards. WinDirStat (see www.windirstat.info) was originally developed for
Linux computers, but it comes in a Windows and Mac version as well, plus it's
open source—ergo free—and you don't have to take a training class to use it.
It has a no-nonsense approach, runs even on the latest operating system, and
once it's done analyzing your computer, which takes just a few minutes, it has
all kinds of ways to show you where those bad space-invaders can be found,
including a very psychedelically colored map.
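The basic idea behind such a tool, walking through a folder tree and sorting files by size, can be sketched in a few lines. This is a rough illustration of the principle and not how WinDirStat itself is implemented; the function name and the starting folder are made up for this example.

```python
import os


def largest_files(root, count=10):
    """Return the `count` largest files below `root` as (size in bytes, path)."""
    sizes = []
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(folder, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # skip files that disappear or cannot be read
    sizes.sort(reverse=True)  # biggest files first
    return sizes[:count]


# Hypothetical usage: list the five biggest files in your user folder
# for size, path in largest_files(os.path.expanduser("~"), 5):
#     print(f"{size / 1_000_000:8.1f} MB  {path}")
```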
However, before you defragment, it is usually a good idea to check your hard
drive for any errors with the ScanDisk program. This can be done by right-
clicking on the drive in question (usually the C: drive) in Windows/File
Explorer or My Computer and selecting Properties> Tools> Check (now).
Once ScanDisk has successfully finished checking the drive for errors, you can
start the defragmentation. An easy way to access Windows’ defragmentation
program is by right-clicking on the drive in question (usually the C: drive) in
Windows Explorer and selecting Properties> Tools> Defragment now
(Windows 8/8.1: Optimize). While Windows Vista and above start the
defragmentation process right away, Windows 2000 and XP give you the
option to analyze whether a defragment is necessary. Depending on the state
of your drive, defragmentation can take several hours and is thus a process
that should be done overnight (unless you share a bedroom with your
computer—the chattering disk will keep you up all night).
Windows Vista and above also offer you the opportunity to schedule regular
defragmentations.
In Safe Mode, the only programs that are loaded are the operating system
and drivers for the mouse, keyboard, and standard display modes, greatly
increasing your chances for successfully loading your computer. Once you are
in Safe Mode you can undo what you messed up before and then reboot into
Normal Mode. And sometimes problems even disappear once you have booted
into Safe Mode.
To enter Safe Mode, continually press the F8 key as your computer starts up
until you see a screen where you can select Safe Mode as your startup option.
Once booted into Safe Mode, you can adjust your settings and simply restart.
Your computer will then automatically boot into Normal Mode.
Select whether you want to restore your computer or create a restore point
and click Next.
If you chose to restore your computer, you can now select the date and the
system change you would like to restore it to. Selecting Next will restart the
computer to that point.
Any programs that have been uninstalled or installed during that time
period will also be reversed. However, documents that you may have
worked on will not be affected by this.
Backing Up Files
Though we had to say farewell to floppy disks as a backup method, we are
once again in the golden age of inexpensive backups. With prices of CD-Rs
and even DVD-Rs only slightly higher than floppy disks, and CD/DVD-RW
drives as standard equipment on most computers, it is very easy and
convenient to make regular backups of the projects you are working on.
Because it often takes a little more time to write very large files to a CD/DVD,
I have found it most convenient to make a daily backup of my current projects
on an external drive, and a backup on a CD/DVD once the project is finished.
I usually store the CDs/DVDs at a location separate from my computer.
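A daily copy like the one described above is easy to automate. The following is a minimal sketch, assuming Python 3.8 or later; the folder names are hypothetical, and the dated folder name is just one possible convention.

```python
import shutil
from datetime import date
from pathlib import Path


def back_up_project(project_folder, backup_root, day=None):
    """Copy a project folder into a dated folder on the backup drive."""
    day = day or date.today()
    project_folder = Path(project_folder)
    # e.g. E:\Backups\ClientJob-2014-03-01
    target = Path(backup_root) / f"{project_folder.name}-{day.isoformat()}"
    # dirs_exist_ok lets a second run on the same day refresh the copy
    shutil.copytree(project_folder, target, dirs_exist_ok=True)
    return target


# Hypothetical usage:
# back_up_project(r"C:\Projects\ClientJob", r"E:\Backups")
```

Because each day gets its own folder, an accidental change in today's files does not overwrite yesterday's copy.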
The only drawback I have encountered is slow upload times for very
large files, such as translation memories or email folders.
One intuitive way to retrieve files from the Windows 7 backup is a feature
called "shadowing." This tool restores files to a previous version by "looking"
into the backup location as well as the System Restore archives to see
whether there is an older version of the file and allowing you to restore it to
that version. All you need to do to access the feature is right-click on the file
in question and select Restore previous versions.
Once you select the command, several earlier versions of the file might be
displayed:
Now you can select the version of the file you need to restore.
Once you do this, there is no undo: the current version of the file in
question is gone and has been replaced. In almost all cases this will be
fine. In the few cases where this makes things even worse (there is
typically only one previous version per day, so you might not get the
version you want), there is also the option to highlight one of the files
and select Open or Copy to check whether it's the correct version. Once you know it
is, go ahead and save it over your existing file.
The backup system in Windows 7 was easy to set up and highly customizable,
but oddly enough it was used by so few users (Microsoft says about 5%) that
it was deprecated for Windows 8.
To enable File History, open the File History item on the Control Panel, select
the external backup device (which could be a USB stick or an external hard
drive), and select Turn on. Under Advanced settings you can set up how
often you want to run the backup, how much space it’s going to occupy, and
how long you want to keep the backed-up versions.
The only files that will be backed up are the files in your pre-configured
libraries (Documents, Music, Videos, and Pictures) plus content on your
desktop, favorites, and contacts. This means that any files you want to have
backed up that are not contained in the libraries need to be copied into an
existing or new library (see page 14 on how to view and create libraries).
To access a previous version of a file or a folder, you will now have to select it
in File Explorer and select the History button on the Home ribbon.
You can then compare the different versions and keep any or all of them (they
will be distinguished by numbers added to their names).
A well-known and highly rated program for that purpose is Acronis True
Image (see www.acronis.com). This program is still used widely despite the
fact that the latest versions of Windows offer some comparable backup
options (albeit much slower and harder to use).
There are two options offered in Windows 8/8.1: to completely reset your PC
and delete all data and installed programs in the process or to refresh your
computer without affecting your files. This second choice keeps your personal
data, system settings, and Metro style applications. Desktop applications will
be kept as well if you have previously created a custom image (see below).
To access these possibilities, press the Windows key+C to open the Charms sidebar, select
Settings, and then select PC Settings. You will find the following options
under Recovery:
To create a custom image so that you can reset your complete PC, including
all the desktop applications, you’ll first need to start a Command Prompt as an
administrator. There are a number of ways of doing this, but the easiest is by
opening the dreaded Metro Start screen and typing cmd. Once Command
Prompt is displayed, right-click on it and select Run as Administrator.
The creation of the image might take several hours, and once it’s done you
can copy it on an external drive or disk and will have no problem refreshing
your PC to the pristine state that you had it in once before.
Unless you take your computer into a computer shop when you encounter
problems, it’s sometimes very hard to explain what went wrong to someone
on a phone help line. Of course there are ways to share your computer with
someone else—the easiest might be join.me—but another way is to record
your problems and send the recording to someone else.
There are a great number of third-party products that do this, along with an
in-house tool in Windows 7 and 8, the Problem Steps Recorder (in Windows 8/8.1:
Steps Recorder). This tool allows you to record everything on your screen
(with the exception of text that you enter). Once the recording is done, it is
not saved as a movie file but as an MHT HTML archive file and zipped up. Once
unzipped, the MHT file can be opened with either Internet Explorer, Chrome,
or Opera (or with Firefox or Safari with a special plug-in). It gives you a
screen-by-screen description of what just happened on your computer as well
as a narration of the process and operating-system-specific information.
To start the recorder in Windows 7, click on the Windows button, type psr,
and hit Enter.
In Windows 8, type steps in the Start screen and select the Steps Recorder.
Keyboard Languages
It may sound strange in this age of unlimited choice, but there are times when
it would be helpful if computers gave fewer choices for how to accomplish a
certain task. (Needless to say, there are other times when just the opposite
would be true!) One area where there are far too many choices is entering
non-English characters in a Windows environment or within a tool like
Microsoft Word.
Here are some of the choices for entering non-English characters with the
facilities that Windows and/or Word offer:
• The archaic way: The Character Map. You can either start this under
Start> (All) Programs> Accessories> System Tools> Character
Map, or through a slightly modified version within Word under Insert>
Symbol (> More Symbols). Here you can find all the supported symbols
and characters for each individual font to select and paste into your text.
This is a great choice for the casual non-English user, but certainly not for
the professional translator.
• The Word-centric way, part II: Customized shortcuts within Word. You can
select a character in the Word Character Map (see above), click Shortcut
Key, press the key combination you want to use (i.e., an ALT+ combination
or a function key), and then click Assign. Not good either. Though you can
get by with just one keystroke combination, you’re still lost outside of
Word or on any computer other than your own.
• The work-out way, aka the ASCII code: This poor but unbelievably popular
way among translators consists of four (4!) keystrokes for one character.
To use this, make sure that NUM LOCK is enabled for the numeric keypad (the
small keypad on the right of your keyboard), and type the number of that
character on the small keypad while you hold down the ALT key. The
above-mentioned "å" has the key combination 0229. Phew! Like I said, a great
way to train your memory to remember all kinds of code and exercise your
finger muscles, but this certainly is not conducive to a productive work
environment!
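If you do need one of these four-digit codes, you don't have to memorize it. The numbers with a leading zero correspond to the character's position in the Windows code page, so they can be looked up as in the sketch below. It assumes the Western European code page 1252; other system code pages give different numbers, and the function name is made up for this example.

```python
def alt_code(character):
    """Return the ALT+0nnn keypad sequence for one character (code page 1252)."""
    if len(character) != 1:
        raise ValueError("expected exactly one character")
    # Raises UnicodeEncodeError if the character is not in code page 1252
    position = character.encode("cp1252")[0]
    return f"ALT+{position:04d}"


for letter in "åäñ":
    print(letter, alt_code(letter))
```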
Clearly, things can’t be as bad as these methods suggest, and most of you
know that the best way by far for dealing with special international characters
is by installing a language-specific or the US-International keyboard.
First things first, though. For the uninitiated, there is a distinction between a
virtual and a physical keyboard. The physical keyboard is the hardware
keyboard that you use to type and on which every key is labeled with a certain
letter, number, or symbol. If you bought your computer in the U.S., chances
are that you have a US-English QWERTY keyboard (named for the first six
keys in the top row of letters). If you bought your computer and/or keyboard in—let’s say—
Germany, you will probably have a German QWERTZ keyboard. The funny
thing is that the labels are only meaningful if that physical keyboard matches
the "virtual keyboard"—i.e., the way that your computer assigns the physical
keys to the actual output on your screen. If they don’t match, the virtual
keyboard decides the output.
You are free to select as many virtual keyboards as your heart desires (if they
are among the more than 100 different keyboards plus various other input
systems supported by Windows), and in fact for many languages there is a
good selection to choose from. For instance, one of the keyboards for U.S.
English is the US-International keyboard, which is particularly interesting in
our context because it provides ready access to a number of important
international characters if you press the right ALT key.
Figure 46: Characters on the US-International keyboard when pressing the right ALT key
You can find the On-Screen Keyboard of the image above under Start>
Programs> Accessories> Accessibility> On-Screen Keyboard
(Windows Vista and 7: Start> Programs> Accessories> Ease of
Access> On-Screen Keyboard; Windows 8/8.1: Type on-screen in
the Start screen and select the On-Screen Keyboard.)
Starting from Windows XP SP2, there is also a British equivalent, the "United
Kingdom Extended" keyboard. This keyboard particularly supports languages like
Welsh, replaces the apostrophe key as a dead key with the grave accent key, and
introduces some other changes to the US-International keyboard.
Aside from the keys that can be accessed like this, you can also "create"
international characters with a combination of a "diacritical mark" (a so-called
"dead key") followed by a letter:
• "+a=ä
• '+a=á
• '+c=ç
• `+a=à
• ^+a=â
• ~+n=ñ
All this is great, but it also causes what many users consider to be the
drawback of the US-International keyboard: the characters ", ', `, ^, and ~
are "dead keys," which means that they don't "type" if you use them in a
normal text. Only when you type the next character will the system "know"
whether you meant the character as a diacritical mark or a real character and
output either one or two characters. If you are not used to this so-called
"sequence checking" process, it can feel quite disconcerting, and, worse,
some Windows installations tend to behave irregularly with printing or not
printing the "dead keys."
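The "sequence checking" described above can be illustrated with a small simulation. This is a simplified sketch and not the actual Windows implementation: the combination table contains only the six examples listed above, and the rule that a dead key followed by a space produces the plain character is an assumption based on how the US-International keyboard commonly behaves.

```python
# The five dead keys of the US-International keyboard
DEAD_KEYS = set("\"'`^~")

# Only the combinations listed in the text; a real layout knows many more
COMBINATIONS = {
    ('"', "a"): "ä",
    ("'", "a"): "á",
    ("'", "c"): "ç",
    ("`", "a"): "à",
    ("^", "a"): "â",
    ("~", "n"): "ñ",
}


def type_keys(keys):
    """Return the text that appears on screen for a sequence of keystrokes."""
    output = []
    pending = None  # a dead key that is waiting for the next keystroke
    for key in keys:
        if pending is None:
            if key in DEAD_KEYS:
                pending = key  # nothing is displayed yet
            else:
                output.append(key)
            continue
        dead, pending = pending, None
        if (dead, key) in COMBINATIONS:
            output.append(COMBINATIONS[(dead, key)])  # one accented character
        elif key == " ":
            output.append(dead)  # assumption: space releases the plain mark
        else:
            output.append(dead + key)  # no match: two separate characters
    return "".join(output)


print(type_keys('M"adchen'))
print(type_keys("don't"))
```

The second call shows the case where the apostrophe is meant as a real character: nothing appears until the "t" is typed, and then both characters are output at once.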
Select Add and define which additional languages and/or keyboards you
would like to have installed on your system.
When you select OK, the new keyboard will show up in the list of installed
keyboards.
After you leave this dialog, you will have a little language icon displayed on
your task bar.
This icon displays your currently selected languages and allows you to switch
between the different keyboards. Should you have more than one keyboard
for one language installed (for instance, both the US and the US-International
keyboards for English), a little keyboard is displayed to the right of the
language icon. Clicking on that keyboard will allow you to select the specific
keyboard you need.
If you cannot see the keyboard, right-click on the language icon and
select Additional icons in task bar. The same right-click command
also gives you access to the Restore (or Show) the Language bar
command that places a full language bar on the top of your screen, or
the Settings command which displays the configuration dialog for the
installation of a new keyboard without having to go through the ridiculous paths
described above.
This is all very easy. It becomes a little more hairy if you have to select
languages that either don’t deal (exclusively) with alphabets (such as
Japanese, Chinese, or Korean) or use a completely different writing system,
such as bi-directional languages (Hebrew, Arabic).
If you are still using Windows 9x or NT, you will either need a localized version
of Windows in those languages or an additional program on top of Windows
that will allow you to write. From Windows ME/2000 on, these languages are
supplied with the operating system if the appropriate "locales" are enabled. In
Once your locales are enabled, you can go back to the keyboard dialog and
add keyboards (or: "Input Method Editors") for those languages—just as with
any other keyboard under Regional Options.
Because many of the more complex writing systems offer a variety of options
for their input systems, it is important to remember to activate Additional
icons in task bar as described above. If you do not do that, you will not be
able to use the keyboards properly.
Figure 51: Example of Japanese, Korean and Simplified Chinese keyboards with access to
various features
In Windows 8/8.1, keyboards are part of the language concept that also gives
you access to the multiple language user interface. To select additional
keyboards, you’ll need to select Control Panel> Language> Add a
language. This will add the standard keyboard for that language. If you need
to add several keyboards for one language or you prefer another than the
standard one, you can click on the Options link to the right of the language
and select Add an input method in the ensuing dialog.
Figure 52: Selecting Options for additional language-specific keyboards in Windows 8/8.1
Much like in previous versions you will see a keyboard icon in the system tray
once you have more than one keyboard installed. (Left-) clicking on it will
show the installed keyboards and allow you to choose a keyboard.
You can also switch between keyboards with the key combination Windows key+Space
or Alt+Shift.
Aside from the options that Windows offers you in the standard installation,
there are many things that can be said about ways to change the mapping of
your keyboard so that it works in one specific language or several, or
performs certain processes ("macros") when pressing certain keys. A very
powerful program which allows you to reassign keys is the Microsoft Keyboard
Layout Creator or MSKLC (see www.microsoft.com/globaldev/tools/
msklc.mspx). This allows you to take an existing language-specific keyboard,
change some settings, and save that new keyboard as a customized keyboard
for your language. Or you can create a new keyboard from scratch.
There are some drawbacks. MSKLC works only on Windows 2000 and above
and it’s not particularly easy to use. Once you load an existing keyboard, you
need to first make modifications and save the resulting file (those commands
are all available in the File menu); only then can you build a project that will
result in an installation program for the new keyboard (you can access those
commands from the Project menu).
The great news is that it is free and the documentation is really pretty good. I
used this program to swap the Y and Z keys on my German keyboard so
they’re in the same order as the English keyboard and I can avoid all those
sillz tzpos.
One last thing about keyboards: There are very few things that I hate as
much as when I hit the Caps Lock or the Insert key without knowing it and
the following text is either in all caps or overwritten.
Fortunately, Windows allows you to have a little beep sound go off every time
you hit the Caps Lock, Num Lock, or Scroll Lock keys. To activate this
feature, select Start> (Settings>) Control Panel> Accessibility Options
(Vista and above: (Start>) Control Panel> Ease of Access Center> Make
the keyboard easier to use) and check Use Toggle Keys.
To make the Insert key beep every time you hit it, you can download the free
and tiny Insert ToggleKey utility at www.mlin.net/misc.shtml.
Web Browsers
Web browsers—those programs that help you locate and display web pages—
are another of the rather emotional topics where everyone feels very strongly
about the browser that he or she uses (especially if it is not Internet
Explorer). I used to use Internet Explorer because it was one of the first to
auto-detect different character sets and download the necessary fonts for it.
However, since Firefox (see mozilla.com) now also does a good job with this,
in addition to offering a huge number of add-ons to make it do virtually
anything, I use Firefox.
Figure 56: Mozilla Firefox with the Quick Locale Switcher add-on
The Firefox browser is also the only browser that allows me to have
Jeromobot watch over everything I do. You can download the Jeromobot
persona (skin) from the screenshot above under www.getpersonas.com/
en-US/persona/242788.
I also have Opera, Safari, and Google Chrome on my computer for testing my
website and others.
Browsing Tips
When I started translating professionally, the Internet was already a
formidable resource that held a lot of translation-related information. But I
know the feeling of rummaging through books and other "hardware" to find
answers that I just couldn’t find anywhere else.
While this will always remain so to a certain degree, here are some tricks that
should make your Internet searches on Google and Bing just a little more
focused.
Most everyone knows the use of quotation marks to find "just that specific
expression," the + sign to force the search engine to include the following
word in the search, or the - sign to specifically exclude sites that contain the
succeeding word.
• If you want to search the pages that Google has in its cache (previous
storage), type cache: followed by the address, so that
cache:internationalwriters.com tool kit finds pages that have since been
changed or deleted (Google only).
If these tricks have not really impressed you, the next ones will:
• If you want to look for something in only a certain kind of document (such
as a PDF file) and not in any other, type filetype:pdf "translation
memory". The result will be all PDFs that are registered with the search
engines and contain the phrase "translation memory." If you would like to
specifically exclude PDFs, you can type -filetype:pdf "translation
memory" (both Google and Bing).
• If you want to return webpages for a specific language, you will just need
to specify the language code directly after the keyword language:. For
example, if you are searching for your name on Chinese-language
websites, you will need to enter John Doe language:zh (Bing only).
• To return webpages from a specific country or region, you can specify the
country or region code directly after the keyword loc: (or location:).
You can even combine this with an OR search. For example, to see
webpages about machine translation from the U.S. or Great Britain, enter
"machine translation" (loc:US OR loc:GB) (Bing only).
• Or how about this one: Unless you have one favorite online dictionary you
always go to when you need a definition, you can also type
define:translation (Google only).
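If you use the same kinds of searches again and again, the operators above can also be assembled into a ready-made search address. The following is a minimal sketch: it assumes that both search engines accept the search text in a parameter named q, and the function name is invented for this example.

```python
from urllib.parse import urlencode

SEARCH_PAGES = {
    "google": "https://www.google.com/search",
    "bing": "https://www.bing.com/search",
}


def search_url(engine, *terms):
    """Join search terms and operators into one search address."""
    query = " ".join(terms)
    # urlencode escapes quotation marks, colons, and spaces for use in a URL
    return SEARCH_PAGES[engine] + "?" + urlencode({"q": query})


print(search_url("google", "filetype:pdf", '"translation memory"'))
print(search_url("bing", "John Doe", "language:zh"))
```

The resulting address can be pasted into any browser or saved as a bookmark.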
For instance, look at this URL from the Microsoft help site:
http://windows.microsoft.com/en-us/windows/create-user-account#create-
user-account=windows-8
This URL has two language identifiers. One is very obvious—en-US (in this
case a mixture of the standards ISO 639-1 and ISO 3166). The other identifier
may not be as obvious: the last four digits at the end are the widely used
Microsoft Locale ID (see msdn.microsoft.com/en-us/goglobal/
bb964664.aspx).
To change that page into, say, Japanese, you could just manually replace the
URL in those two places with the appropriate codes:
http://windows.microsoft.com/ja-jp/windows/create-user-account#create-
user-account=windows-8
and come out with the Japanese counterpart with all the Japanese
terminology at your fingertips.
Or here is another one. There is a great English and German parallel SAP
glossary at http://help.sap.com/saphelp_glossary/en/index.htm. To change
the language in that case all you need to do is replace the "en" with "de".
But since this is a glossary within an HTML frame, it's not quite as easy to get
to specific entries. If you click on any of the actual English entries in the above
page, the URL does not seem to change. However, Firefox offers an easy way
to let you open the page within the frame as a standalone page. Once you
click on an entry and you have the English term and description displayed,
right-click on that page and select This Frame> Open Frame in New Tab.
This might open:
http://help.sap.com/saphelp_glossary/en/3b/
57a67b78608045852d629395c6844b/content.htm
And sure enough, just by changing it to:
http://help.sap.com/saphelp_glossary/de/3b/
57a67b78608045852d629395c6844b/content.htm
we get to the translated page.
While this particular example is only good for those who work in that language
combination, there are many other cases where this can be adjusted easily to
other websites and language combinations.
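Swapping the language code by hand works fine for a page or two; for a whole list of addresses the same replacement can be sketched in code. This is an illustrative sketch that assumes the language code forms a complete segment of the address path, as in the two examples above; the function name is made up.

```python
from urllib.parse import urlsplit, urlunsplit


def swap_language(url, old, new):
    """Replace a language code that forms a whole segment of the URL path."""
    parts = urlsplit(url)
    segments = [
        new if segment.lower() == old.lower() else segment
        for segment in parts.path.split("/")
    ]
    # Everything except the path (host, query, fragment) is left untouched
    return urlunsplit(parts._replace(path="/".join(segments)))


print(swap_language(
    "http://help.sap.com/saphelp_glossary/en/3b/"
    "57a67b78608045852d629395c6844b/content.htm",
    "en", "de",
))
```

Comparing whole segments rather than doing a simple search-and-replace avoids accidentally changing an "en" that is part of another word in the address.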
There are also tools that support a more in-depth comparison of different
language versions so that you can quickly not only spot the top-level term but
some of the terminology that surrounds it. Manypedia (see
www.manypedia.com) is a tool that searches Wikipedia for a specific term and
then looks up the corresponding Wikipedia pages in other languages. It will
then tell you the percentage of the similarity of the concepts and display the
pages you request side-by-side.
File Transfer
If you send files by email, it’s almost always a good idea to zip the files. Aside
from reducing the upload and download time because of smaller file size,
zipping adds an extra layer of protection to your files, does not write-protect
your files, sends one file instead of many, and bypasses many virus protection
applications that would otherwise block access to files with certain extensions.
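Zipping several files into one archive can also be scripted, which is handy if you deliver the same kind of package to a client on a regular basis. Here is a minimal sketch using Python's standard zipfile module; the file names are hypothetical.

```python
import zipfile
from pathlib import Path


def zip_files(paths, archive_path):
    """Pack the given files into one compressed zip archive."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            path = Path(path)
            # Store only the file name, not the full folder path
            archive.write(path, arcname=path.name)
    return Path(archive_path)


# Hypothetical usage:
# zip_files(["manual_de.docx", "project.tmx"], "delivery.zip")
```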
For large files, you should not send attachments by email, but via the File
Transfer Protocol (FTP), the same protocol that is used to upload files to
websites. Although many browsers—especially Internet Explorer—have a
decent or good FTP integration, I would always advise having an FTP program
("FTP client") aside from the main browser. The one that I use is the
open-source FileZilla (see www.filezilla-project.org).
Another, now widely accepted way of sending very large files is through (free)
cloud-based services like Hightail (formerly YouSendIt) (see
www.hightail.com). Note that if you directly work with large corporate clients,
their corporate network policies might not allow access for their employees.
There is no reason to become paranoid. On the other hand, it’s helpful to have
an idea of what’s out there so you can adjust your behavior and choose your
defense mechanisms. After all, most—if not all—of us deal with sensitive data
as translators.
Aside from the financial consequences that might result from unintended
disclosures of the data you were entrusted with, a slip-up like that could also
cause significant harm to your reputation.
I will begin by describing the most common threats to our computers and
data, followed by a description of various tools that will let you reduce the risk
posed by these threats. Finally, I’ll include a few last words of advice on this
subject.
Common Threats
This section lays the groundwork by introducing some of the terminology you
might encounter when reading about computer threats.
Malware
Malware (Malicious software) is software that was designed to harm or enter
a computer system without its owner’s informed consent. The term refers to a
variety of forms of software or program code that are hostile, intrusive, or
annoying.
Virus
Computer viruses have been around since the dawn of personal computers, so
it’s likely that you’ve already made a more than personal acquaintance with
one or more of them.
A computer virus is a program that can copy itself and infect your computer
without your permission or knowledge. Until a few years ago, the most
common way for a computer virus to spread was by removable media, such as
a floppy disk, CD, or a USB drive. With the increased use of the Internet, e-
mail, cloud-based services and file sharing, these have become common
vehicles for attacks, too.
Worm
A computer worm can spread itself to other computers without needing a host
for the transfer.
The havoc that a worm can wreak is limited only by the author’s imagination.
The more common attacks focus on creating backdoors on computers or
turning computers into "zombies" (see page 81). Often these "zombies" are
combined into systems called "botnets" (see page 82).
Trojan Horse
Quite often, social engineering techniques (see page 80) are used to lure you
into opening or executing these files and programs.
Spyware
Not only can spyware collect all kinds of information about you and your web-
surfing habits, but it may also redirect your web browser activity without your
knowledge.
Backdoor
Keylogger
Adware
Attacks
In addition to the malware mentioned above, there are also a few other types
of attacks that may affect us.
Here’s a brief description of each of these to help you understand them better.
Phishing
Phishing is an attempt to acquire sensitive information—such as usernames,
passwords, account information, or social security numbers—by masquerading
as a trusted entity via electronic communications. To achieve this goal,
phishing often employs "social engineering techniques" in an effort to fool
users.
This screenshot nicely illustrates one such phishing technique. At first glance
this might look like an e-mail from eBay, but on closer inspection it becomes
clear that this is an imposter’s phishing e-mail. The first giveaway is that there
is no specific recipient listed. More importantly, when you hover your cursor
over the alleged eBay URL, a different URL is revealed as a tool tip.
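The second giveaway, a link whose visible text shows one address while it actually leads to another, can even be checked automatically. The following is a simplified sketch of that idea, not a replacement for a real spam filter; the sample addresses in the usage note are invented for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect (real target, visible text) for every link in an HTML message."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def host_of(address):
    """Return the server name of a web address, with or without http://."""
    if "://" not in address:
        address = "http://" + address
    return (urlsplit(address).hostname or "").lower()


def suspicious_links(html):
    """Return links whose visible text names a different server than the target."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    flagged = []
    for href, text in collector.links:
        # Only compare when the visible text itself looks like a web address
        looks_like_address = bool(text) and " " not in text and "." in text
        if looks_like_address and host_of(text) != host_of(href):
            flagged.append((text, href))
    return flagged
```

For a message containing a link that displays http://www.ebay.com/signin but points to http://203.0.113.7/login (a made-up address), the function reports exactly that pair, while ordinary links such as "click here" are ignored.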
Another form of phishing is advance-fee fraud, AKA "Nigerian scams." Most of
you will have received emails that promise great riches in exchange for an
upfront payment, but recently there have been some of those emails in
distribution that were specifically aimed at translators. Ted Wozniak, the
owner of the translator payment watch list paymentpractices.com, has taken
it upon himself to compile a list of those for reference purposes at
www.paymentpractices.net/Scams.aspx.
Drive-By Download
Sometimes also called a "drive-by installation," this term refers to a download
and installation that occurs without your knowledge, and thereby without your
consent, rather than just the mere download of some type of malware.
Such drive-by downloads can happen when you visit a website, view an e-mail
message, or click on a pop-up window. While some drive-by downloads
require a very limited amount of user interactions, such as a mouse click,
others may exploit a vulnerability in the operating system or in an application,
such as your e-mail client or your web browser.
Denial-of-Service (DoS)
A denial-of-service attack (DoS attack) attempts to render a computer
resource—such as an Internet site or a service—unavailable to its users.
Zombie
A zombie computer is a computer that is connected to the Internet and has
been compromised by a virus, a Trojan horse, or a hacker to make it
accessible to the people who "own" the compromised systems. The actual
owner of the computer tends to be unaware that his or her system is being
used to send e-mail spam (see page 83), commit click fraud, conduct a
denial-of-service attack (see page 81), or for other nefarious purposes.
Typically, a compromised machine is just one of many under remote direction
in a botnet.
Botnet
A botnet is a collection of software robots, or bots, running automatically and
autonomously on zombie computers remotely controlled by crackers (criminal
hackers) via a common command and control infrastructure.
Man-in-the-Middle (MITM)
A man-in-the-middle attack allows the attacker to read, insert, and modify
messages between two parties without either party knowing that the link
between them has been compromised.
The attacker may simply be eavesdropping to obtain victims’ details during a
phishing attack, or he or she might jam all communications to one party
(denial-of-service) or use information gathered to execute a replay attack.
Nuisances
In comparison to the malware and attacks that specifically target your
computer, the following issues are mere nuisances, though they can be quite
aggravating if you don’t have the right tools at hand to repair their damage.
Pop-up Ads
Pop-under ads are a variation of this technique that opens a new window
underneath the active window rather than on top of it. Typically you don’t see
them until you close your current browser window, so it becomes a lot more
difficult to determine which website originally opened this pop-under window
with the ad in it.
Spam
The terms e-mail spam, bulk e-mail, junk e-mail, UCE (unsolicited commercial
e-mail), and UBE (unsolicited bulk e-mail) all refer to nearly identical
messages sent to a large number of recipients via e-mail.
Some websites try to track users’ browsing habits by using cookies, small
chunks of text sent to the browser by the server and then sent back
unchanged upon each subsequent access to that server.
There are justified uses of cookies that are helpful to the user, such as
authentication, configuration of site preferences, and electronic shopping
carts. However, so-called tracking cookies, such as those third-party cookies
served by DoubleClick, are frowned upon, and more and more users have
privacy concerns about the tracking of their browsing behavior.
Cookies are only data, not program code; therefore, they cannot delete
information or read information from your computer. They do not generate
pop-ups, nor are they used for spamming or for advertising.
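To make the "only data" point a little more concrete, here is a minimal Python sketch of what a cookie looks like when it travels between server and browser. The cookie name and value (session_id, abc123) are made up for illustration; real sites choose their own.

```python
from http.cookies import SimpleCookie

# What a server might send in its response header (hypothetical values).
server_cookie = SimpleCookie()
server_cookie["session_id"] = "abc123"
server_cookie["session_id"]["path"] = "/"
set_cookie_header = server_cookie.output(header="Set-Cookie:")

# What the browser sends back, unchanged, on its next visit to that server.
browser_cookie = SimpleCookie()
browser_cookie.load("session_id=abc123")
returned_value = browser_cookie["session_id"].value

print(set_cookie_header)
print(returned_value)
```

Nothing in this exchange is executable: the browser simply stores a short piece of text and hands it back to the server that issued it.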
It is also quite common for website operators and advertisers to embed tiny
transparent (or colored) GIF images (typically a single pixel in size) on web
pages as a means of tracking who accesses each page and how often. These little
critters are also referred to as web bugs.
Hardware
Router
Virtually all of the cable modem / DSL routers currently available block
unsolicited traffic arriving from the Internet. Any "legitimate" network traffic
from the Internet sent in response to a request from your computer—e.g.,
retrieving e-mail, browsing web pages, or downloading files—is let through
and can get to your computer.
An additional benefit of these devices is that they are typically also equipped
with four network ports, allowing you to set up a small network of computers
at your workspace that can all share your high-speed connection.
I have been using CISCO Linksys routers for several years now and am quite
happy with their performance and reliability as well as ease of administration.
In the past few years, CISCO Linksys (see www.linksys.com) and other router
manufacturers such as Netgear (see www.netgear.com) and D-Link (see
www.dlink.com) have streamlined the initial setup to pretty much be plug-
and-play. Just follow their install instructions and you’ll be fine.
On the Authentication screen, you will enter a password rather than a user
name. If you changed the password, use whichever password you assigned to
the router; if the router still has its default password, it would be admin.
If you opted to get a wireless router, make sure that it is configured to use
encryption. If at all possible, elect not to use Wireless B (802.11b), since this
rather dated technology only allows for Wired Equivalent Privacy (WEP)
encryption, which can easily be cracked with freely available
applications.
You can configure the network mode of your CISCO Linksys router under
Wireless> Basic Wireless Settings.
Once again, don’t forget to click on Save Settings when you are done with
this step.
Now that you’ve restricted the type of networks you want your router to
support, set up your wireless security to use WPA Pre-Shared Key so that
only users who know the string you are using as your pre-shared key will be
able to connect to your network.
You can configure the security mode of your CISCO Linksys router and enter
the passphrase under Wireless> Wireless Security.
As you can see from the above screenshots, there are a wide variety of
additional features and settings available for those of us who want to venture
into this terrain, but the steps outlined above are sufficient to provide a
reasonable level of protection.
If somebody were to steal your computer or even just your hard disk, they would
be able to access the data on the disk unless it is encrypted. Several vendors
such as Seagate (see www.seagate.com) have now started to offer hard disks
with built-in encryption.
Windows (XP and above in the Professional/Ultimate versions) also allows you
to encrypt individual files without any kind of fuss.
If there is a lot of data in that file or folder it may take some time for the
encryption process to complete, but you'll know it's finished when the
encrypted folders and files appear in a different color than your other files.
And that's really all you will notice—opening, saving, or even emailing them
will all work as before (they will simply lose their protection once you attach
them to an email). Don't believe that they're protected? Try to log on with a
different user account and you won't be able to open them (that's also why it's
not possible to encrypt commonly shared files such as system files).
Privacy Filters
Privacy filters are thin sheets of plastic that are about 1 mm thick and can be
mounted in front of the display, darkening the screen to anybody who is not
sitting at close to a 90-degree angle in front of the screen. This can be helpful
when you’re working on confidential information in a public place, or if you
simply don’t want someone staring at your screen while you work.
Cable Lock
Cable locks are available from a variety of vendors. They are primarily used
for laptop computers, hooking into their so-called Kensington Security Slot (K-
Slot) and allowing you to loop the cable around an immobile object, such as
part of a desk, before securing it in the K-Slot. This way you don’t have to
worry about leaving your laptop at the hotel while you are shopping, meeting
with friends, or having a drink.
Software
A wide variety of software is available to combat the various risks outlined
at the beginning of this chapter.
There are a number of separate products that target each of these threats
individually in a highly specialized fashion, but it’s easier for most of us to deal
with the bulk of these risks in an all-in-one fashion rather than with half a
dozen or more separate products.
Another reason for choosing an all-in-one solution is that by now, virtually all
providers of "free" security products stipulate that their "free" product is only
free for personal, non-commercial use. Because we earn a living with the help
of our computers, it may not be the most ethical thing to violate these
licensing agreements by operating our business computers with a product that
was only licensed for personal use.
And once you have to pay for a personal firewall, antivirus, anti-spam, or anti-
phishing software, you might as well shell out a few dollars more and obtain
one of the "Internet Security" suites available from various
companies.
First, I would like to briefly describe the types of products
and the components of the suites that are available for risk reduction.
Firewall Software
Although you (hopefully) already have a router in place (see page 84) to
protect you from attempts to access your computer from the Internet, this
protection is not available while you are on the road, nor does a router protect
you from an infected computer located on your local area network (LAN)—for
Also, some companies have elected to offer only complete Internet security
suites which contain a firewall component without also offering the firewall as
a separate product. Among these contenders are:
Antivirus Software
Antivirus software attempts to identify, fend off, and remove computer viruses
and some other malware.
• Regularly scheduled scans of all the files on your computer’s hard disk(s)
look for files containing the "signatures" that might indicate an infection.
In recent years, Norton Antivirus and McAfee Antivirus have started to offer
versions of their antivirus products which allow for installation on up to three
computers. Among the other contenders for protection against computer
viruses are Panda (see www.pandasecurity.com), Kaspersky (see
www.kaspersky.com), and AVG (see www.avg.com).
Anti-Spyware
Anti-Phishing
Anti-phishing measures can be found embedded in most recent versions of
the popular browsers, such as Internet Explorer, Opera, Google Chrome and
Mozilla Firefox, and also as extensions and toolbars for browsers.
In Internet Explorer 7 (no such thing in earlier versions of IE), the phishing
filter is accessible under Tools> Phishing Filter, in IE 8 and above under
Safety> (Turn on) SmartScreenFilter.
In Mozilla Firefox, you can find the phishing filter settings under Tools (press
ALT to see the menu)> Options> Security.
In Opera, this feature is called "Fraud and Malware Protection" and can be
found under Opera> Settings> Preferences> Advanced> Security.
Pop-Up Blockers
While there used to be a definite need for dedicated pop-up blockers to curb
the flood of pesky pop-up or pop-under ads, basic pop-up blocking
functionality these days can be found in all the major web browsers. Some
browsers, such as Google Chrome, even have pop-up protection enabled by
default.
In Internet Explorer 7 and above, you can exercise limited control over the
pop-up blocker settings under Tools> Internet Options> Security> Pop-
up Blocker Settings (IE 10 and above: Tools> Pop-up Blocker).
In Mozilla Firefox, you can activate or deactivate pop-up blocking under Tools
(press ALT to see the menu)> Options> Content.
In Opera, you can exercise a certain amount of control over pop-ups under
Opera> Settings> Preferences> General.
Anti-Spam
To curb the huge amounts of unsolicited commercial e-mail flooding your
inbox, a variety of techniques and approaches are available.
One of the easiest ways of filtering out spam early on is to activate the spam
filter(s) offered by your Internet Service Provider (ISP) or your mail service
provider (MSP).
The catch with this approach is that you will regularly have to check your ISP’s
or MSP’s filtered-out mail to make sure that your customers’ e-mail or
messages from potential new clients have not accidentally been misclassified
as spam.
If you want to go the extra mile, consider whole-disk encryption or one of the
hard disks with built-in encryption options.
Every version of Windows comes with two text editors: Notepad and WordPad.
Figure 75: WordPad in Windows 7 with an open Word 2010 DOCX document
There are, however, much more powerful (and fairly inexpensive) programs
out there. Most of these have been developed by developers for developers,
so we as translators usually only see the very surface of what these programs
can do, but this is usually enough to be duly impressed. The most commonly
known ones for Windows are probably UltraEdit (see www.ultraedit.com),
EmEditor (see www.emeditor.com), and the free Notepad++ (see
notepad-plus-plus.org). I use UltraEdit for all Western languages and EmEditor
for East Asian languages. But regardless of which one I use, all of these
programs do wonderful things and are typically more than sufficient for what
we need as translators.
Have you ever tried to open a 20- or 30-MB text file in Word? Depending on
the speed of your computer, this can take up to a few minutes as Word tries to
load the whole thing all at once. In contrast, these little text editor programs
can open any size text file in just a few seconds. For instance, many of you
have worked with the so-called Microsoft glossaries. Searching through these
glossaries with Excel or Word renders them practically unusable because they
are so large and response time is much too slow.
In the example below, a search for directory structure in all of the old
German "Microsoft glossaries" (a total of 61 files and 98 MB) took UltraEdit
less than ten seconds and resulted in a list of all the occurrences in all the files
and simultaneous links to the exact occurrence.
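For readers who like to see the principle behind such a multi-file search, here is a minimal Python sketch. It is not how UltraEdit works internally; the function name find_in_files and the assumption that the glossaries are UTF-8 text files in one folder are mine.

```python
from pathlib import Path

def find_in_files(folder, phrase, pattern="*.txt", encoding="utf-8"):
    """Return (file name, line number, line text) for every line containing phrase."""
    hits = []
    needle = phrase.lower()
    for path in sorted(Path(folder).glob(pattern)):
        # errors="replace" keeps the search going if a file has stray bytes.
        with open(path, encoding=encoding, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line.lower():
                    hits.append((path.name, number, line.rstrip("\n")))
    return hits
```

Calling find_in_files("glossaries", "directory structure") on a folder of your choice would give you a list of hits with file name and line number, which is roughly the information the editor turns into clickable links.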
Any of the searches can be made with wildcards. For wildcards, see
page 18.
If file comparisons are very important to you, you can also use a
specialized tool such as ExamDiff (see www.prestosoft.com/examdiff) or
Araxis Merge (see www.araxis.com). These give you additional reporting
options as well as the ability to compare directories.
These text editors are also able to automatically recognize typical formats
such as HTML or XML and show them in color coding to ease the editing
process.
Figure 78: Color-coded HTML file in UltraEdit with automatically opened HTML tools at the
bottom of the window
One important issue is the change of code pages (between different forms of
Unicode, ASCII, DOS, or Mac formats under File> Conversions in UltraEdit
and between language-specific code pages under File> Save As in
EmEditor):
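What such a conversion does under the hood can be sketched in a few lines of Python. This is only an illustration: the sample word is my own, and cp1252 and utf-8 are Python's names for the Western European Windows code page and one form of Unicode.

```python
def convert_encoding(data, source_encoding, target_encoding):
    """Re-encode raw bytes from one code page to another."""
    text = data.decode(source_encoding)
    return text.encode(target_encoding)

# "Größe" as it would be stored in the Western European Windows code page ...
legacy_bytes = "Größe".encode("cp1252")
# ... and the same word after conversion to UTF-8.
utf8_bytes = convert_encoding(legacy_bytes, "cp1252", "utf-8")

print(len(legacy_bytes), len(utf8_bytes))
```

The text stays the same, but the bytes on disk change (the two umlauted characters take two bytes each in UTF-8), which is why a file opened with the wrong code page shows garbled characters.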
Here is another use of text editors (though you can admittedly use Notepad
for this also):
Most of us have received files of indeterminate type—we either don’t know the
extension (and even sites like www.filext.com can’t help) or the extension is
gone or corrupt.
Here is a quick and dirty way to help with that. You can open the file in
question in a text editor. If everything is "humanly" readable, the file is in a
text-based format and can be translated in an appropriate text, HTML, or XML
editor. If the file opens with a lot of strange characters, it is some kind of
binary file (a file that can be read only by computers) that cannot be edited
(or saved!) in a text editor.
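The "strange characters" test can also be approximated by a program. The following Python sketch is a rough heuristic of my own, not a standard: it treats data as binary if more than 5% of the first few thousand bytes are control characters other than tab and line breaks. Text saved as UTF-16 contains many zero bytes and would be misjudged by it.

```python
TEXT_CONTROL_BYTES = {9, 10, 13}  # tab, line feed, carriage return

def looks_like_text(data, sample_size=4096):
    """Rough guess: True if the start of the data is mostly printable."""
    sample = data[:sample_size]
    if not sample:
        return True
    suspicious = sum(
        1 for byte in sample if byte < 32 and byte not in TEXT_CONTROL_BYTES
    )
    return suspicious / len(sample) < 0.05

print(looks_like_text(b"<html><body>Hello</body></html>"))
print(looks_like_text(b"PK\x03\x04\x14\x00\x00\x00\x08\x00"))
```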
The good thing is that many of these files have a "magic number" (that’s what
it’s really called!), i.e., a clue to their identity in the first line (the "header
line"). Here is a list of the more common ones:
• TIF/TIFF graphic files begin with either "II" (II for Intel, or little-endian) or
"MM" (MM for Motorola, or big-endian).
• Many EXE or DLL files start with "MZ" or "ZM" (after the developer Mark
Zbikowski).
• MIDI music files have "MThd".
• ZIP files begin with "PK" (for Phil Katz, author of the compression utility
PKZIP).
Figure 81: A "cracked" zip file with the magic number "PK" in EmEditor
This last magic number is particularly helpful because there are many file
formats that pretend to be something very fancy and unique when in reality
they are "only" ZIP files with a new and different extension. Finding out that
these are ZIP files allows you to change the extension to ZIP and unzip them
with the compression utility of your choice. This in turn can sometimes give
you access to data you would not have access to because you might not have
the appropriate application.
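If you have to check many files, the look at the first few characters can be automated. The Python sketch below uses only the signatures from the list above; the function names are my own, and two letters are of course weak evidence, so treat the result as a hint rather than proof. The second function shows the "it is really a ZIP file" trick without even renaming the file.

```python
import io
import zipfile

# Signatures taken from the list above.
MAGIC_NUMBERS = {
    b"II": "TIFF image (little-endian)",
    b"MM": "TIFF image (big-endian)",
    b"MZ": "EXE or DLL",
    b"ZM": "EXE or DLL",
    b"MThd": "MIDI music",
    b"PK": "ZIP archive",
}

def identify(data):
    """Return a description based on the first bytes, or None if unknown."""
    # Try longer signatures first, in case one signature is a prefix of another.
    for magic in sorted(MAGIC_NUMBERS, key=len, reverse=True):
        if data.startswith(magic):
            return MAGIC_NUMBERS[magic]
    return None

def list_zip_members(data):
    """List the file names inside ZIP data, whatever extension the file had."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()

# Build a small ZIP archive in memory to stand in for a "fancy" file format.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("content.xml", "<text>Hello</text>")
disguised = buffer.getvalue()

print(identify(disguised), list_zip_members(disguised))
```

For a real file you would read the bytes with open(path, "rb").read() and pass them to the same two functions.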
I could write a lot more about the benefits of these programs, but here’s what
I’ve found to be true: If I can imagine some kind of logical operation within a
text-based file, chances are that it can be done with one of these programs.
HTML Editors
HTML is an abbreviation for HyperText Markup Language, the authoring
language used to create documents on the World Wide Web. HTML
defines the structure and layout of a web document by using a variety of
tags and attributes. Tags and attributes are enclosed in < and >.
Translatable text typically includes all text between tags (the part that is
displayed by the browser) as well as some attributes within tags (for instance, the
"alt" text that pops up when you move your mouse over a graphic).
Although the aforementioned text editors have HTML capabilities (see page
106), many users prefer to use specialized HTML editors for HTML files. Again,
there are many different approaches out there, from the high-powered
flagship products such as Microsoft Expression (see www.microsoft.com/
Expression) or Adobe Dreamweaver (see www.adobe.com/products/
dreamweaver) to (also high-powered but much more affordable) hands-on
products like (the free and powerful) Nvu (see net2.com/nvu/).
Figure 82: HTML page after being saved in non-obtrusive editor (same as original)
Figure 83: Same HTML page after being saved in FrontPage without Preserve existing HTML
selected
Never work in HTML files in Word unless you are specifically instructed to do
so (see image below).
To be fair, Word 2003 and above has an option under File> Save As
that is called Web Page, Filtered. Though this eliminates some of the
additional coding, Word remains an unfortunate choice for an HTML
editor.
If you must use an HTML editor but are unsure about what kind you should
use, a good guideline is to either use non-intrusive programs such as the ones
mentioned above, or the same program that the webpages were originally
designed in.
For quoting purposes, however, it could be a good idea to save some files from
the Internet (if you keep the above limitations in mind).
It is easy to save single web pages. In Internet Explorer and Firefox, open the
web page, select File> Save (Page) as, and make sure that Webpage,
complete or Webpage, HTML only is selected under Save as Type.
Typically, you do not want to quote on a single web page but on a complete
website instead. So-called spiders such as Tenmax Teleport Pro (see
tenmax.com/teleport) allow you to download a complete (static) website at
one time so that you can browse it offline or analyze it for quoting or other
purposes.
Utilities
Before discussing the "real (and expensive!) programs" such as office suites,
computer-assisted translation tools, desktop publishing, or graphics
applications, here is an overview of smaller, inexpensive (or even free)
programs—called utilities—some of which have very powerful capabilities that
can make many things easier for you. Though there are many thousands of
these little applications available, and I am sure that there are many that are
just as useful as—or even more useful than—the ones I am describing here, I
have limited myself to the ones that I use on an almost daily basis in my work
as a translator.
Vista abandoned the Filmstrip view and replaced it with the Windows Photo
Gallery view that displays the pictures full-screen. Microsoft must have agreed
with its users and realized that this was not a move toward the better,
because in Windows 7 and 8 it now offers the downloadable Photo Gallery, an
enhanced version of XP’s Filmstrip. You can also download this utility for other
versions of Windows under windows.microsoft.com/en-US/windows-live/
essentials-home.
XnView is particularly useful when dealing with a large number of image files
such as you might have in a manual or help system. It quickly lets you view
the individual images, decide which images need to be translated or
generated again, and open these images in the graphic editor of your choice.
Renaming Utilities
A small but useful tool is the freeware Rname-it (see www.brothersoft.com/
rname-it-4690.html) that allows you to batch rename a large number of files.
This can be helpful when you need to change an extension (for instance, from
.HTM to .HTML) or when you need to change the actual file name of a large
number of files (for instance filename.doc to filename_edited.doc).
Another useful function of this utility is the ability to change the time
and date stamp of any file. This comes in particularly handy if you have
worked until 5 am and prefer your project manager not to see that . . . .
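The batch renaming described above is also easy to sketch in Python if you prefer a script to a utility. This is not how Rname-it works; the function batch_rename and its parameters are my own invention, and the sketch deliberately skips any file whose new name already exists.

```python
from pathlib import Path

def batch_rename(folder, pattern, suffix="", new_extension=None):
    """Rename every file matching pattern; return the list of new names."""
    renamed = []
    for path in sorted(Path(folder).glob(pattern)):
        extension = new_extension if new_extension is not None else path.suffix
        target = path.with_name(path.stem + suffix + extension)
        if target.exists():
            continue  # never overwrite an existing file
        path.rename(target)
        renamed.append(target.name)
    return renamed
```

With this, batch_rename("project", "*.htm", new_extension=".html") would change the extension of all HTM files in the (hypothetical) folder "project", and batch_rename("project", "*.doc", suffix="_edited") would turn filename.doc into filename_edited.doc. As always with batch operations, try it on a copy first.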
Search Utilities
While the text editors described above have powerful search and replace
functions, some utilities specialize in that and offer some additional features
that can prove to be very handy.
In the resulting dialog, you can select a mask (i.e., a filter) for your file type
(for instance, *.txt for all text files, or a*.* for all files starting with a) and a
search string, as well as the string to replace it with.
When Search and Replace has shown you how many of the desired strings you
have in your file(s), you can decide to replace them either in one batch or on
a case-by-case basis.
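The combination of file mask, search string, and replacement string can be sketched in Python as follows. The two function names are my own, and I am assuming UTF-8 text files; the first function only counts (the "show me first" step), the second performs the batch replacement.

```python
from pathlib import Path

def count_matches(folder, mask, search):
    """Report how many times search occurs in each file matching mask."""
    counts = {}
    for path in sorted(Path(folder).glob(mask)):
        occurrences = path.read_text(encoding="utf-8").count(search)
        if occurrences:
            counts[path.name] = occurrences
    return counts

def replace_in_files(folder, mask, search, replacement):
    """Batch replace search with replacement; return the number of replacements."""
    total = 0
    for path in sorted(Path(folder).glob(mask)):
        text = path.read_text(encoding="utf-8")
        occurrences = text.count(search)
        if occurrences:
            path.write_text(text.replace(search, replacement), encoding="utf-8")
            total += occurrences
    return total
```

A mask such as "*.txt" limits both functions to text files, just like the mask in the dialog.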
Search and text retrieval programs that approach searches differently are
called indexing tools, such as dtSearch (www.dtsearch.com) or Archivarius
3000 (www.likasoft.com/document-search).
Unlike Search and Replace, they don’t perform any searches in the actual files
but rather in indexes that are linked to the files. Admittedly, this sounds kind
of confusing, but the principle is this: If you have a large amount of data (let’s
say all your email, including attachments, of the last three years), this may be
sorted by name or date but not by the actual data. So for any program to find
a certain word or phrase within these humongously large files, it would
actually have to go through every line of data that is contained in these files.
If, however, you had a preconfigured index containing information on all the
words contained in these files, together with information on where to find
them, these programs could access that information virtually instantaneously.
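If the principle still sounds confusing, a toy version in Python may help. This is a bare-bones sketch of an index, not the way dtSearch or any other product is actually built; the document names and contents are invented.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Map each lower-cased word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return index

def search(index, word):
    """Look the word up in the index instead of rereading the documents."""
    return sorted(index.get(word.lower(), set()))

documents = {
    "mail_2012.txt": "Please send the invoice by Friday.",
    "mail_2013.txt": "The glossary and the invoice are attached.",
    "notes.txt": "Remember to update the glossary.",
}
index = build_index(documents)

print(search(index, "invoice"))
print(search(index, "glossary"))
```

Building the index takes time once, but every later search is a single lookup, no matter how large the original files are.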
Starting with Windows Vista, one of the operating system’s main emphases
has been accessibility and ease of search, so it has integrated a new search
mechanism. If files are located inside one of the "indexed locations" on your
computer, it’s just a matter of seconds to find text within the file or the file
name itself. You can change the settings in Windows Vista and above under
(Start>) Control Panel> Indexing Options.
A search tool for translation-related things outside and on your own computer
is the wonderful IntelliWebSearch (see www.intelliwebsearch.com). This free
little application copies highlighted text from any Windows program with a
number of user-definable shortcut keys, opens your default browser or
dictionary, and sends the copied text to up to 50 customizable search engines,
on-line dictionaries, or dictionaries stored on your hard drive.
It takes a little fiddling to find the right coding for the various search options,
but the tool comes with a good number of preloaded searches and with a little
bit of patience and the help provided under www.intelliwebsearch.com/gb/
help.html it’s quite easy to develop your own searches.
Figure 93: IntelliWebSearch with customized search for EN>DE in the IATE
Another possibly even more powerful approach for searching within any
Windows application is offered by the makers of Linguee (see linguee.com), a
very large corpus of English <> German, Spanish, French, Portuguese,
Chinese, Italian, Japanese, Dutch, Polish, and Russian data of web-based
translated materials (plus any combination between English, German,
Spanish, French, and Portuguese). Although Linguee is available only in
these languages, it is much more widely known among translators beyond
these languages’ scope. That's not surprising to those of us who see an
obvious value in what it offers for us and our work, but it was a surprise for
the founders of Linguee. You see, they created the site for "average" folks
who needed a quick way to find translations from real-life usage cases within
their original contexts (even though most of them don't really understand the
concept of context, as you and I very much know).
Linguee may just change the way that many of us look for information from
now on. The two guys behind it have found ways to have web crawlers detect
translated content online and match that up with the help of a 50,000+ entry
dictionary and other web-based dictionaries. To look up a term or complete
phrase, just enter it into the search box; the matches that are displayed are
complete segment matches with the terms in question (both in source and
target) highlighted. At first glance the data contains no metadata (origin,
subject matter, etc.), but at second glance you will notice the links to the
originating sites, giving you all the metadata you could want. You don't have
to register to search; as a registered user you can evaluate the translations
and correct them, or you can add entries to the dictionary, which in turn are
used to fine-tune the matches.
CD Emulators
CD emulators are not essential equipment but they can be very convenient
programs. They allow you to create up to 23 virtual CD drives on your hard
drive, make an image (a complete copy) of the content of a CD and thus
enable you to play several CDs at a time, all at higher speeds than through
the original CD-ROM drive.
I’m familiar with Virtual CD, which has a very intuitive and easy interface. It
allows access to all features through a right-click on an icon in the taskbar.
Figure 96: Right-click access to all functions of Virtual CD and three emulated CD drives.
Once the emulated CD drives are created, they appear "as equals" to any
"real" CD-ROM drive under My Computer (Windows 8/8.1: This PC):
So, what’s the catch? Well, aside from the fact that some DVDs and CDs have
a copy protection that does not allow them to be copied, there is none. That
is, if you have a very large hard drive that can handle a lot of data. Even
though these programs all have some kind of compression capability, it’s not
uncommon to have 600 MB of data stored on just one single CD.
Compression Utilities
Arguably the most important utility you need to be able to receive and send
files properly is a compression program. Nothing can frustrate a client or
customer more than receiving a file of several megabytes that would have
been maybe a tenth of the size or even less if it had been sent in compressed
format (see File Transfer on page 75).
Some file formats, such as RTF or BMP, are particularly well suited for
compression because they can be minimized significantly; others, such
as JPG, GIF, or PDF, often shrink very little when being compressed
because they are compressed in themselves to start with.
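You can see this effect for yourself with a few lines of Python. The sketch uses the zlib module, which implements the same basic compression method that ZIP files typically use; the repetitive sample text and the random bytes (standing in for data that is already compressed, such as a JPG) are my own stand-ins.

```python
import os
import zlib

def compressed_ratio(data):
    """Return compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data)) / len(data)

repetitive_text = b"The quick brown fox jumps over the lazy dog. " * 2000
already_dense = os.urandom(len(repetitive_text))  # stands in for JPG-like data

text_ratio = compressed_ratio(repetitive_text)
dense_ratio = compressed_ratio(already_dense)

print(f"repetitive text: {text_ratio:.1%} of original size")
print(f"dense data: {dense_ratio:.1%} of original size")
```

The repetitive text shrinks to a tiny fraction of its size, while the dense data does not shrink at all.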
Other important reasons for using compression programs are that they allow
you to send one file instead of many, and compressed files can also be sent as
password-protected files for safety reasons.
A search on the Internet reveals that there are probably as many different
programs out there as you could come up with word combinations containing
the word "zip"—ZipMagic, PowerZip, Quick Zip, ZipGenius, BitZipper, ALZip,
and TurboZip form only the tip of the iceberg—and of course PKZIP (see
www.pkware.com) from the "inventor" of the zip format, and the market
leader WinZip (see www.winzip.com), which is now owned by Corel. And, yes,
there are a great number of compression programs that do not contain the
word "zip" . . .
If you exchange a lot of files with other users who use Macs, you might
want to look into using Stuffit (see www.stuffit.com). Stuffit not only
unzips Windows-specific but also Mac-specific compression formats,
including SIT and SEA.
And while in earlier versions of compression utilities, the context menu tended
to be rather cluttered with a number of options, newer versions typically put
an end to this mess by giving only one option. This provides access to a whole
new sub-menu with the various old and new zipping options (which, by the
way, are configurable), including the ability to directly email the newly created
zip file (which saves space on your hard drive and means one less step in your
workflow).
If you are really paranoid about the safety of your files, you will want to
give a password in which you mix numbers and lower-case and upper-
case letters. The cracking utilities that are available to crack these
passwords (see Password Cracking Utilities on page 133) take
significantly longer to break these mixed passwords.
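The reason is simple arithmetic, sketched here in Python: the number of passwords a cracking tool may have to try is the size of the character set raised to the power of the password length. The length of eight characters is just an example of mine.

```python
import string

def keyspace(alphabet_size, length):
    """Number of possible passwords of the given length."""
    return alphabet_size ** length

lowercase_only = keyspace(len(string.ascii_lowercase), 8)
mixed = keyspace(len(string.ascii_letters + string.digits), 8)

print(f"lower-case only: {lowercase_only:,}")
print(f"mixed case plus digits: {mixed:,}")
print(f"factor: about {mixed // lowercase_only:,}")
```

For eight characters, mixing upper- and lower-case letters and numbers gives a cracking tool roughly a thousand times more candidates to work through than lower-case letters alone.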
Another feature that most zip tools offer is the ability to split files into smaller
chunks so that they fit into an email or on a CD. Once you want to use the
file(s), the tools allow you to reassemble them into one large zip file again.
And here’s what I just discovered recently. Often I receive five different zip
files for a project. It has always annoyed me to have to right-click on each of
them individually and select the appropriate unzip command so that the files
will be unzipped into a folder that carries the name of the zip file. Then I
discovered that you can also select several zip files at a time (by holding the
CTRL key and clicking on each of them). You can then select a command to
extract them, "Extract to "*\"," which creates as many folders as there are
zip files.
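If your compression tool does not offer such a command, the same result can be had with a short Python script. This is a sketch under my own naming (extract_each_to_own_folder), using Python's standard zipfile module; it does not handle password-protected archives.

```python
import zipfile
from pathlib import Path

def extract_each_to_own_folder(folder):
    """Unzip every .zip file in folder into a subfolder named after the archive."""
    created = []
    for archive_path in sorted(Path(folder).glob("*.zip")):
        target = archive_path.with_suffix("")  # project.zip -> project
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(target)
        created.append(target.name)
    return created
```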
As with so many other tools, there are a great variety of tools out there that
allow you to find the magic word, but the tools that I have been using
successfully come from ElcomSoft (see www.elcomsoft.com)—unfortunately
in different (paid) versions for the different products. Now, the magic word is
not magic for nothing, and it isn't easy to find, even for the smartest software.
Plus, there is also a reason for the different levels of "password strength" that
you are asked for on the various websites and programs you may be choosing
a password for.
Like any of their competing tools, ElcomSoft tools essentially have two
strategies. The quick way is the dictionary attack. This is for simple-minded
folks like me who know that they would forget their password if it
were not the dog's name or something like that. This attack takes only a few
seconds, and all it does is run a large list of terms against the actual
password until the correct one is found. If that method is not successful, a
second method is applied, the so-called brute force method.
Figure 100: Selecting the right kind of attack in an effort to crack a zip file
Typically you can tell the program certain parameters (like only lower- or
upper-case letters, with or without numbers/special characters, or the
presumed length of the password) and depending on how complex and
accurate these are, a successful attack can take minutes or a whole day.
Figure 101: Selecting specific options for a brute force attack for cracking Office files
And while there are many websites that perform all kinds of conversions, it’s
still helpful to have a little freeware utility like Convert (see
www.joshmadison.com/software). Convert not only allows you to convert
between a multitude of measurements, but even lets you define your own
parameters.
Figure 102: Convert’s interface—the Custom tab gives you access to customizable
conversions.
• Word: 83 words
These tools applied three different theories to the count of these files:
• Word merely counts the words displayed in a browser, omitting all hidden
text, such as keywords or pop-up texts for graphics, etc.
• PractiCount, Déjà Vu, and Trados are specifically geared toward counting
or translating HTML files so they have to be able to reveal and count all
translatable words.
• UltraEdit (see Text Editors on page 101) counts all words, including a lot of
non-translatable coding information. It’s not a very useful number to
present to your client in an invoice (unless you are not interested in
keeping the client, that is. . .).
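The difference between the first two approaches can be illustrated with a small Python sketch. It is in no way the counting logic of any of the tools named above; the HTML snippet, the class name, and the choice of which attributes count as "hidden" translatable text (alt, title, and the keywords and description meta tags) are my assumptions.

```python
from html.parser import HTMLParser

class WordCounter(HTMLParser):
    """Count words shown in the browser and words hidden in attributes."""

    HIDDEN_ATTRIBUTES = ("alt", "title")  # assumption: adjust per project

    def __init__(self):
        super().__init__()
        self.visible_words = 0
        self.hidden_words = 0
        self._skipping = None  # the <script> or <style> element we are inside, if any

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("script", "style"):
            self._skipping = tag
        if tag == "meta" and attributes.get("name") in ("keywords", "description"):
            self.hidden_words += len((attributes.get("content") or "").split())
        for name in self.HIDDEN_ATTRIBUTES:
            self.hidden_words += len((attributes.get(name) or "").split())

    def handle_endtag(self, tag):
        if tag == self._skipping:
            self._skipping = None

    def handle_data(self, data):
        if self._skipping is None:
            self.visible_words += len(data.split())

page = (
    '<html><head><meta name="keywords" content="translation, tools"></head>'
    "<body><p>Welcome to our <b>shop</b></p>"
    '<img src="logo.gif" alt="Company logo"></body></html>'
)
counter = WordCounter()
counter.feed(page)

print(counter.visible_words, counter.hidden_words)
```

In this tiny page, half of the translatable words never show up in the browser window, which is exactly the part that a browser-style count misses.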
Interestingly, even within these groups there are fairly significant differences.
These are due to differences in the counting parameters. These include
questions of delimiters (how are \ and - counted and how many words is
C:\Program Files\Firefox or format-specific?) and numbers (are those
to be counted and, if so, how many words is 255.255.255.0?). And it
becomes very hairy, of course, when it comes to non-alphabet-based
languages or languages without spaces between words.
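To show how much the delimiter question matters, here is a Python sketch with two counting rules of my own making: one splits on spaces only, the other treats every unbroken run of letters or digits as a word. Neither is the rule of any particular tool.

```python
import re

def count_whitespace(text):
    """Count chunks separated by whitespace only."""
    return len(text.split())

def count_alphanumeric_runs(text):
    """Count every unbroken run of letters or digits as a word."""
    return len(re.findall(r"[^\W_]+", text))

samples = ["C:\\Program Files\\Firefox", "format-specific", "255.255.255.0"]
counts = {
    sample: (count_whitespace(sample), count_alphanumeric_runs(sample))
    for sample in samples
}

for sample, (by_space, by_run) in counts.items():
    print(f"{sample}: {by_space} vs. {by_run}")
```

The path counts as two words under the first rule and four under the second, the hyphenated term as one or two, and the number as one or four.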
It seems that there are two main strategies for dealing with these problems.
You can avoid word counts altogether and either go with an hourly rate or a
character count (such as the 55 characters per line that many European
translators do business by), or you can make a special point with your client
to agree on a certain program for the word count.
Word’s word count is available at Tools> Word Count (Word 2007 and
above: Review> Proofing> Word Count). In Word 2007 and above
and on the Mac versions of Word (Word 2001 and higher), the word
count is displayed on the status bar by default.
You will need to be cautious with Word’s word counts because Word is famous
for skipping texts in comments, text boxes (before Word 2007), WordArt,
headers, and footers. At www.shaunakelly.com/word/CompleteWordCount
you can find a free and more "accurate" word count macro than the feature
provided by Word itself.
What these word counts do not address is batch counts of several documents
at a time. I usually choose to do that (and word counts for many other
supported formats) with translation environment tools (such as Déjà Vu or
Trados), but there are also many specialized programs such as AnyCount
(www.anycount.com) or PractiCount & Invoice (see www.practiline.com), both
of which support a very large number of file formats.
PractiCount in particular not only supports all of the file formats above and
more (text- or word-processor-based, Excel, HTML, PowerPoint, XML, and PDF),
but it also offers a variety of reporting options for direct use in invoices
and a very customizable set of word count options. You can choose to use Word’s
own word count module for most of the supported formats, or you can
customize the rules by defining your own delimiters.
You can also count words in embedded text-based objects, and you can count
editing time in Word and PowerPoint documents (the numbers are based on
information that you can access under File> (Info>) Properties>
Statistics in Word and PowerPoint).
Figure 106: Word count summary view with easy options to save directly in various file
formats
Time Tracking
There is no need to explain why it is important for translators to have a good
mechanism to track time. Some programs show you how much time you have
spent working on them (for instance, Microsoft Word or OpenOffice under
File> Properties (or: Info)) but that usually includes all the time you had
the document open (while you had lunch, went to the bathroom, or took a
nap).
The most common way to log the time we spend on an individual task is
probably in an Excel spreadsheet. Two keyboard shortcuts have made it
easier for me to keep track of my time in Excel:
A spreadsheet with a third field for the total time (formula: =SUM(<end time
field>-<start time field>)) can then ensure that your time is being
calculated accurately.
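The same end-minus-start calculation can be sketched in Python for anyone who keeps a time log outside of Excel. The HH:MM format, the sample log, and the function name are assumptions, and the sketch only handles sessions that start and end on the same day.

```python
from datetime import datetime


def elapsed_hours(start, end, fmt="%H:%M"):
    """Return the hours between two clock times on the same day."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 3600


# Hypothetical log of (start, end) pairs for one job.
log = [("09:15", "10:45"), ("13:00", "13:30")]
total_hours = sum(elapsed_hours(start, end) for start, end in log)
```

Here the two sessions of 1.5 and 0.5 hours add up to 2 hours.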
While it is possible to record your time in this manner, there are some little
programs available that make it a lot easier. Time Stamp (see
www.syntap.com) is a free program (supported by optional donations) which
allows you to track the start and end time for projects you are currently
working on with a click on a button in your task bar. It’s even possible to have
several instances of the program running simultaneously so you can switch
back and forth between different projects that you’re working on. When you
are completely finished, all the time that was spent on each project is
summed up and can either be printed out or saved as a text file. This is a nifty
little program which requires neither a lot of computer resources nor a lot of
time to learn.
The Ukrainian software maker AIT released a time tracking tool specifically
geared toward language professionals. Similar to its generic counterparts,
ExactSpent (see www.exactspent.com) tracks time for multiple jobs and/or
clients simultaneously and even has a little (configurable) feature that
reminds you if you have not touched your keyboard for some time. It leaves
very little footprint on your computer and minimizes itself to the system tray,
where it can easily be accessed and controlled with a mouse click or
configurable keyboard shortcuts.
Yet another possibility to track time for your various tasks across different
machines and devices is with a cloud-based service such as Toggl (see
www.toggl.com), ClockingIt (see www.clockingit.com), MakeSomeTime (see
www.makesometime.com), or Klok (see www.getklok.com).
Recent versions of Microsoft Office include the Office Clipboard. In Office
2000 applications it is available as a toolbar under View> Toolbars>
Clipboard; in Office XP and above it is a slightly more usable application
available under Edit (Home)> (Office) Clipboard. It allows you to
collect up to 12 (2000) or 24 (XP and above) different clipboard items from
anywhere on your computer and paste them individually or all at once into
any Office document.
Figure 110: Office XP Clipboard with copied content from several Office applications
If you don’t limit yourself to Microsoft Office programs, though, this is not
very helpful, and has very limited functionality.
What I really needed, especially for editing tasks, was a program that would
not only collect clipboard entries between all possible programs, but would
also make it possible to directly print from my clipboard or store clipboard
entries between different computer sessions (i.e., after I switched the
computer on and off). I finally found this little helper in ClipMate (see
www.thornsoft.com).
ClipMate is a little program that you can configure to start automatically every
time you start Windows (Config> User Preferences> General> Run at
Windows Startup). It collects an unlimited amount of clipboard content
containing anything from text to graphics to complete files or folders. It is
displayed as a little icon on your task bar and you can open the ClipMate
Explorer by simply double-clicking that icon.
Figure 111: ClipMate Explorer with a preview pane (bottom) and a collection view (upper
right)
ClipMate can accomplish all the tasks I need it for, and it’s even possible to
edit the clipboard content once it is stored in ClipMate.
Merging Files
Have you ever had a lot of files from one subject or client that would have
been so much easier to handle if they could have been merged into one large
file, for instance for alignment purposes?
Though it is often possible to copy and paste into one large master file, it can
be tedious and frustrating if the original files are extremely large. Twins File
Merger (see www.twins-software.com) is inexpensive shareware that allows
you to merge as many MP3, MPEG, text, and Word files as you would like (the
unlicensed copy allows you to only merge two files at a time). Like most of
these specialized utilities, the use of this tool is very self-explanatory and the
effect that it has on the performance of your computer system is very small.
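For plain text files, the merge itself can be sketched in a few lines of Python. This is not how Twins File Merger works internally. The function name and the UTF-8 default are assumptions, and the approach does not apply to Word files, which are not plain text.

```python
from pathlib import Path


def merge_text_files(sources, target, encoding="utf-8"):
    """Append the content of each source file, in order, to one target file."""
    with open(target, "w", encoding=encoding) as out:
        for source in sources:
            text = Path(source).read_text(encoding=encoding)
            out.write(text)
            # Keep files apart if one does not end with a line break.
            if not text.endswith("\n"):
                out.write("\n")
```

Calling merge_text_files(["part1.txt", "part2.txt"], "master.txt") with hypothetical file names would write both files, in that order, into master.txt.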
Office Suites
It’s hard to imagine that a translator could work productively without some
kind of office suite, a software bundle that includes word processing,
spreadsheet, and presentation applications, and, depending on the package,
various other programs.
This certainly does not mean that you could not have some of the other suites
as well, or that some of the other suites are less productive and/or more of a
headache (after all, I "grew up" with a DOS version of WordPerfect, and it
took me a long time to get used to Word).
Still, I believe it’s more important to consider which line of Microsoft Office
should be used and how often you should upgrade than which of the office
suites a translator should use.
If you are sure that you will not have any conflict with any other program, you
can start to look into upgrading. In general, the best advice for upgrading
Microsoft Office may be to wait until you buy a new computer (assuming that
your new computer comes pre-installed with Office). The changes between
the different versions often make very little difference in our work as
translators, so it may be hard to justify the fairly significant expense.
Depending on the language version of Office you are using, there will be a
limited number of spell checkers pre-installed. You can buy a full add-on
version with 51 languages (Office 2003), 48 languages (Office XP), or 30
languages (Office 2000) on Microsoft’s or other websites.
For Office 2007, this system has changed quite a bit. Here you can buy the
additional spell-checkers only as part of the Multi-Language Pack which also
includes the user interface in the respective languages. You can either choose
to buy all 37 covered languages or individual products for each of the
languages. There is no compatibility between Office 2007’s additional spell-
checkers and those of earlier versions.
For Office 2010, things again have changed. There is now unfortunately no
Multi-Language Pack for the non-corporate user and, of course, there is no
compatibility between the language packs for Office 2007 and 2010. Each
version of Office contains spell-checkers for between 3 and 7 languages (you
can find a list under tinyurl.com/7mwbv6p). For additional languages you will
have to purchase individual language packs. These also come with three or so
spelling languages but only one additional user interface language.
If you are a user of a "minor" language, you might just be lucky enough
to be able to download an LIP (Language Interface Pack) for your
language that includes the ability to run Office in that language, the
spelling checker of that language, and sometimes even a help system
and templates in that language. Under tinyurl.com/2g6y39l you can find
a list of the approximately 60 supported languages and the features
available for each.
On the bright side, it is much easier to find, purchase, and install the language
packs in Office 2010. You will just need to select File> Options> Language
in an Office program and there you'll find the respective links.
For Office 2013 all of this has finally changed for good to what it really should
have been in the first place: grammar and spellcheckers are now freely
available for all languages (that are supported in the first place). That's great,
but it's made even better by the fact that you are asked automatically
whether you want to install a new language once it's detected in your text (if
you find the reminders annoying, you can disable them under File>
Options> Language).
When I refer here to "Office 2013", this includes Office 365 as well since
essentially they are the same software with different pricing models.
In Office 2010 and above under File> Options> Language you’ll find the
option for a download to have the ScreenTips (previously called QuickInfo—
the tidbits of information that you get when you put your mouse cursor on any
item in the user interface) in any language. Depending on your perspective,
having this feature can fall anywhere on the spectrum between helpful and
fun—but either way I recommend that you download it in a language that is
not covered by the user interface.
Excel 97 put an end to Excel 95’s limitation of 255 characters per cell (and
upped that to 32,000) and 16,384 rows in a worksheet (extended to more
than 65,000; Excel 2007 and above: more than 1,000,000). This was a
significant improvement for translators who work with Excel to store or
exchange glossaries and/or databases, which often go beyond either of those
original limitations.
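Whether a glossary fits into a given version of Excel can be checked with a small Python sketch. The exact limits in the table (32,767 characters, 65,536 rows, 1,048,576 rows) are the figures as I know them, stated more precisely than the rounded numbers above; the version labels and function name are assumptions for illustration.

```python
# Row and cell-length limits per Excel generation (exact figures are my addition).
LIMITS = {
    "Excel 95": {"rows": 16_384, "chars_per_cell": 255},
    "Excel 97-2003": {"rows": 65_536, "chars_per_cell": 32_767},
    "Excel 2007+": {"rows": 1_048_576, "chars_per_cell": 32_767},
}


def fits(rows, version):
    """Return True if a list of rows (lists of cell strings) fits the version's limits."""
    limit = LIMITS[version]
    if len(rows) > limit["rows"]:
        return False
    return all(len(cell) <= limit["chars_per_cell"] for row in rows for cell in row)


# Hypothetical glossary entry with a 300-character definition.
glossary = [["term", "x" * 300]]
```

The sample entry with its 300-character definition would not fit into Excel 95 but would fit into any later version.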
There is little difference for translators between Excel 2000 and one of the
higher versions with the exception of a couple of functions:
• Excel XP and above provide for an integration into Internet Explorer. If you
have Excel XP installed, you can right-click on any web page that contains
a table (most web pages do) and select Export to Microsoft Excel from
the shortcut menu. The text of this web page will automatically be copied
into an Excel spreadsheet. This is great for copying glossaries.
• Excel XP and above offer a new way of listing search results with
accompanying hyperlinks in the Find and Replace dialog (under Edit>
Find; 2007 and above: Home> Editing> Find & Select). This makes it
very easy to search glossaries in Excel.
Excel 2000, Word 2000, and Access 2000 added very important Unicode
support. Word 2000 and above also allow you to save a text in a different
code page, a feature that comes in very handy in many situations.
You can access this feature under File> Save as> Encoded Text in
Word 2000, File> Save as> Plain Text in Word XP and Word 2003,
and Office button/File> Save as> Plain Text in Word 2007 and above.
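What saving in a different code page means can be illustrated outside of Word with a short Python sketch. The function names and the Czech sample text are assumptions; cp1250, cp1252, and utf-8 are Python's names for the Central European and Western European Windows code pages and for UTF-8.

```python
def can_encode(text, code_page):
    """Return True if every character of the text exists in the code page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


def save_as(text, path, code_page):
    """Write the text to a plain text file in the given code page."""
    with open(path, "w", encoding=code_page) as handle:
        handle.write(text)


# Czech sample text: "Žluťoučký kůň"
sample = "\u017dlu\u0165ou\u010dk\u00fd k\u016f\u0148"

fits_central_european = can_encode(sample, "cp1250")
fits_western_european = can_encode(sample, "cp1252")
```

The Czech sample can be saved in the Central European code page but not in the Western European one, which lacks several of its characters.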
The drawback: Word documents with this feature enabled are significantly
larger, and the (invisible) tags that Word places around special characters to
detect the different languages tend to interfere with other programs in which
you may process the Word document. Turning off the automatic detection will
not delete the tags. To delete those, you will have to save the document to an
earlier version in which this feature was not supported.
There were few new features in Word XP that would make a great difference
to translators—even considering the new Translation option in Word XP—
with the possible exception of a completely new view of tracked changes and
comments (in any of the versions, this is a very important feature for anyone
who does editing or proofreading work) and the ability to select several places
in one document to apply formatting simultaneously (by pressing the Ctrl key
as you make your selections).
The Track Changes functions are accessible under Tools> Track Changes
(up to Word 2003) or Review> Track Changes (in Word 2007 and above).
When you use the Track Changes function in Microsoft Word, be aware that
there are important pitfalls to avoid.
Some clients like you to use the Track Changes feature so that they can get
an impression about the quality of the original translator (or about how much
you may over-edit a text…), while other clients want a clean text that has all
editing marks removed and that can be finalized without further ado.
For a client of the second category it is not sufficient to simply hide edits from
the screen view (by selecting Tools> Track Changes> Highlight
Changes> Hide from Screen before Word XP, by selecting the appropriate
Show command on the Reviewing toolbar in Word XP and 2003, or by
selecting the appropriate command under Review> Tracking in Word 2007
and above). To make sure that you have deleted all edits in a document,
select Tools> Track Changes> Accept or Reject Changes> Accept All
(before Word XP); in Word XP and 2003, select View> Toolbars>
Reviewing, click on the little down-arrow beside the Accept Change button,
and select Accept All Changes in Document; and in Word 2007 and above
select Review> Accept> Accept all Changes in Document. Of course,
with any version of Word you can also make case-by-case decisions to accept
or not to accept an edit (the easiest way is to right-click the edits and select
the appropriate commands).
Word 2013 also has the helpful Simple Markup feature (on the Review
ribbon). Here you can show the location of markups without showing every
markup in detail in the text, which, as we all know, all too often makes a
document virtually unreadable.
Another helpful feature for editing purposes is Word 2013’s feature that asks
you whether you want to jump back to the place you had last viewed when
opening a longish document.
Word 2010 and above offers one new feature that many of you will
appreciate: the Navigation bar, a three-tabbed pane to the left of your open
document (of course, you can move it anywhere else inside or outside the
Word window) that appears when selecting View> Navigation Pane or by
hitting CTRL+F. Yes, you read that right: CTRL+F, the one shortcut that we all
know to search for something within a document. The reason for this is that
one of the tabs, and possibly the most helpful, is indeed a search tab in which
you can search for text or other items. All occurrences of the searched item
will be listed, and you can then jump to any of them by clicking on its listing.
Figure 120: Navigation pane in Word 2010 with list of occurrences of searched term
Another concept that Microsoft uses in its last few editions of Office is
"Research." Most Office applications now have a Research command
(available on the Tools menu or with the clever key/mouse combination
ALT+CLICK) that allows you to automatically search a number of associated
dictionaries, thesauri, and other sources of information. The fact that the
information provided differs radically between different languages shows that
this is a concept which still needs some maturing; still, it can be helpful in
some cases.
RSS stands for a lot of things, but the easiest is apparently "Rich Site
Summary." This is a technology that allows you to be very specific about
what kind of information you would like to have sent to you. As
translators, for instance, we regularly visit a variety of websites:
newspapers in our source and target languages, translation-specific
newsgroups, discussion groups on translation tools and various other translation-
related topics (many of which are located on groups.yahoo.com), and whatever else
we desire for our non-translation lives (let's hope there is such a thing).
Many of the websites and other online resources mentioned above now allow you to
subscribe to what are called "RSS feeds." RSS feeds consist of XML-based data that
looks really ugly if you view it as a text file or right in your browser, but it looks quite
proper if you display it in a specific RSS reader. So, because I'm interested in
international news from the New York Times, and because I also know that the NYT
publishes very well-defined RSS feeds, I now subscribe to those (see
www.nytimes.com/services/xml/rss). Instead of having to go to www.nytimes.com
every half hour, the NYT now sends me data blurbs in real time with links for more
information. And the same applies to all of the other sources I mentioned above
along with many, many others.
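To show what an RSS reader does with that "ugly" XML, here is a minimal Python sketch that pulls the headlines and links out of a feed. The sample feed is invented and much shorter than a real one, and the function name is an assumption.

```python
import xml.etree.ElementTree as ET

# Invented, heavily shortened RSS 2.0 feed for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item><title>First headline</title><link>http://example.com/1</link></item>
    <item><title>Second headline</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""


def list_items(feed_xml):
    """Return (title, link) pairs for every item in an RSS feed."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


items = list_items(SAMPLE_FEED)
```

The result is a plain list of headlines with their links, which is essentially what a reader displays.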
To collect these RSS feeds you can use standalone desktop programs, web-based
programs, plug-ins for browsers, or e-mail programs. Almost every browser and/or
email application now supports them directly without the aid of a third-party tool.
Another helpful feature of Outlook 2007 and above is the integrated search,
which previously was only available through third-party add-ons.
Figure 122: Outlook 2010 with an applied search and various RSS feeds
For any of the other programs that may or may not be directly part of the
shrink-wrap version of Office but are considered by Microsoft to be part of the
Office Suite—including Publisher, Visio, or Project—it probably does not make
a large difference to any typical translation task which version you have (if
any).
Compatibility
There is the issue of compatibility, of course. Before Office 2007 all files of all
versions from Office 97 on were compatible with each other (with the
exception of Access and Publisher). Most newer Office files cannot be opened
in versions of Office 95 and earlier.
Figure 123: Warnings for loss of features when down-saving Word document
The different applications of Office 2007 and above use a different file
structure (and in fact, even a different set of extensions) and are not
compatible with earlier versions. However, it is possible to down-save any file
within Office 2007 and above applications to an earlier format (select the
Office button (File)> Save as). Alternatively, Microsoft has also released a
Compatibility Pack for Word, Excel and PowerPoint at tinyurl.com/y5a879.
In Word 2007, you can access the Options dialog by selecting the Office
button and clicking on Word Options in the lower right-hand corner; in
Word 2010 and above, you access it by selecting File> Options.
Figure 125: The Advanced tab on the Options dialog in Word 2007
• Tools> Macro> Record New Macro (Word 2007 and above: View>
Macros> Record New Macro)—This is a very easy way to "record"
macros (i.e., a series of commands that can be initiated by a keystroke or
toolbar button) for recurring tasks.
The developers of Word 2010 and above have fortunately listened to the
outcry of the user community and made the ribbons customizable again
(under File> Options> Customize Ribbon).
• You’re probably familiar with Word’s Format Painter, the icon with the
paintbrush. You can click on or select any text in your document, select the
Format Painter, and then copy the formatting of the selected text by
highlighting another block of text. What you may not know is that you can
also use the same procedure and double-click the Format Painter icon.
After being double-clicked, the icon remains activated and the desired
format is available to you until you press the ESC key.
• Opening Word: By default, Word opens a new blank document when you
open the program. Sometimes this is helpful, but often it is not. To avoid
this, right-click on the icon that you use to open Word, select
Properties, and add " /n" to the Target line. Now Word will open without
a blank document.
If you need to open a new file in Word, you can either press CTRL+N,
which will open a blank document, or select File (Word 2007: Office
button)> New. The menu command will not open an
empty file but instead give you access to your templates. (In Word XP and
2003 a side bar will be displayed. Clicking on the General Templates
option will display the following dialog.)
The easiest way would be to just delete the templates. But in certain
situations they do offer functionality that you want to use.
Here’s what you can do: Move them out of a startup folder and into a
folder where they can be started manually instead of automatically.
To Move Templates
1 Select Tools> Templates and Add-Ins within Word (Word 2007 and
above: Select the Office button (File)> Options> Add-Ins> Word Add-
ins under Manage> Go).
2 The Templates and Add-Ins dialog appears. The templates with a
checkmark are activated.
3 Though it is possible to uncheck these templates and disable them for this
session, they will be started again the next time you open Word if they are
located in a startup folder (see the Full Path on the bottom of the dialog).
4 To change the location, close this dialog and the instance of Word and go
to the Windows Explorer (or any other folder view).
5 There are two different locations where Word uses startup folders (if you
have used the default installation path):
and
If you are not able to find your AutoStart templates in these folders,
right-click on C:, select Search, and make a search for the name of
the template (see Helpful Shortcuts on page 16).
6 Cut the templates out of these folders (CTRL+X) and paste (CTRL+V) them
into:
C:\Documents and Settings\<user>\Application
Data\Microsoft\Templates (in Windows NT, 2000, and XP),
C:\Users\<user>\AppData\Roaming\Microsoft\Templates (in Windows
Vista and above)
or
You can also save them at a different location, but it may be helpful
to have most of your templates stored in one location.
7 The next time you start Word, the templates will not be loaded
automatically, but you can load them manually by selecting Tools>
Templates and Add-Ins, adding the templates in question, and
activating them.
For a long time, Excel files were almost abandoned by the translation industry.
Only one computer-assisted translation tool—Star Transit (see Translation
Environment Tools on page 192)—supported their translation through its own
environment. Only in the last few years have most of the other major
translation environment tools also started to support Excel. From my
perspective as a translator, this early abandonment stands in awkward
contrast to the relatively large share of Excel files that I translate.
Many of the more general things that have been said about Word in the
previous section are the same or similar for Excel, including the use of
macros (see page 168), customization of toolbars (see page 169), the
Format Painter (see page 170), or opening properties (see page 171).
One more thing that may be important when using Excel is to understand the
difference between comma-separated (CSV), tab-separated (TXT), and Excel
(XLS) files.
Excel files are complex files that can contain formatting, embedded objects,
formulas, and numerous worksheets. In comparison to that, comma-
separated and tab-separated files are very simple text files that are built
according to this pattern (for tab-delimited files, replace the comma with a
tab):
"first record in first row","second record in first row","third record in first
row"
"first record in second row","second record in second row","third record in
second row"
If you open this file in Excel, it will be displayed just like an Excel spreadsheet;
in fact, in many cases, the file will automatically open in Excel when you
double-click on it. The reason why these files are so often used is that these
formats provide for generally accepted ways of exchanging data between all
kinds of databases. The Microsoft glossaries (see page 103) that are delivered
as CSV files provide the best case in point.
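Because the pattern is so simple, these files are also easy to read with a few lines of code. The following Python sketch uses the standard csv module on the sample records shown above; the function name is an assumption.

```python
import csv
import io

CSV_SAMPLE = (
    '"first record in first row","second record in first row","third record in first row"\n'
    '"first record in second row","second record in second row","third record in second row"\n'
)


def read_delimited(text, delimiter=","):
    """Parse comma- or tab-separated text into a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


rows = read_delimited(CSV_SAMPLE)

# The tab-delimited variant of the same data parses to the same rows.
tab_sample = CSV_SAMPLE.replace('","', '"\t"')
tab_rows = read_delimited(tab_sample, delimiter="\t")
```

Both variants yield two rows of three records each, with the quotation marks removed.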
One of the most exciting Excel add-ons that makes many of the text-related
(and other) tasks in Excel a lot easier is ASAP Utilities (see www.asap-
utilities.com). This free collection of programs contains more than 300 (!)
different utilities to streamline working in Excel.
Some of the functions that I really like include the ability to count characters
in individual cells (a command in the Information submenu), helpful
formatting and selection functions, and the ability to write numbers with a
leading zero (it was always a pet peeve of mine that you couldn’t do this).
During installation, you will be asked whether you would like to have it started
every time you start Excel (I chose "Yes"). ASAP Utilities shows up as a
separate menu in Excel. If nothing else, you'll enjoy seeing what some of the
other 95% of Excel's unused features are . . . .
Before quoting on a PowerPoint project, always make sure that all text is
actually translatable and not an embedded object such as a graphic. You can
check this by right-clicking on the slide. If picture-related commands show up
(see graphic below) or the picture toolbar appears, you are dealing with a
graphic rather than text.
But do not despair: here is a way to do it once and for all. Though this may
seem a little technical, it is not nearly as bad as it first seems. Let’s first cover
PowerPoint 2003 and below and then continue with the different process of
PowerPoint 2007 and above.
shp.TextFrame.TextRange.LanguageID = msoLanguageIDEnglishUS
To change this declaration into the language of your choice, you'll need to
change the msoLanguageID at the end of that line. You can find your language
(as well as many other languages you most likely have never heard of) at
msdn.microsoft.com/en-us/library/aa432635.aspx.
Close the Visual Basic editor and you will now have a macro in your
PowerPoint presentation.
To run the macro, you need to once again select Tools> Macro> Macros (or
press ALT+F8), highlight the Lingo macro, and select Run. The spelling
language in all text boxes should now be changed.
Since you don’t want to recreate this process for every file you are working
on, you now need to find a way to make this macro available for all files
(unlike in Word, macros in PowerPoint are stored on a per file basis).
The first step is to verify the security level for your PowerPoint
installation. Different versions of PowerPoint contain different levels of
security measures, some of which prevent running of "Visual Basic projects,"
i.e., macros from other files. In PowerPoint XP and 2003, you need to have
Trust access to Visual Basic project enabled under Tools> Security>
Trusted Publishers (PowerPoint XP: Trusted Sources), and the security on
the Security Level tab needs to be set to Medium. In PowerPoint 2000
there are no limitations and you don't need to do anything.
When these settings are secured, you can use the macro from another file in
your current file. There are two ways to do this. You can have both files open,
select Tools> Macro> Macros (or press ALT+F8), and select All open
presentations under Macro in. This will display the macro in question that
you can now run. Or, much easier, you can create a button that contains the
macro. Open the file with the macro, select Tools> Customize>
Commands, and scroll down the list of Categories until you get to Macros.
On the right-hand side you will see Lingo. Drag Lingo to your toolbar and that
new button will now give you access to the ability to change your language for
any PowerPoint presentation, PROVIDED that you do not delete the original
PPT file that contains the macro.
Unfortunately, things are no easier in the newer versions. Still, it’s
possible to change the spelling language, and here is how you would do it:
Open a text file and copy the following content into that file:
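Sub Lingo()
    ' The loop and If lines are reconstructed (assumed) from the closing End If/Next pairs
    Dim sld As Slide, shp As Shape
    For Each sld In ActivePresentation.Slides
        For Each shp In sld.Shapes
            If shp.HasTextFrame Then
                If shp.TextFrame.HasText Then
                    shp.TextFrame.TextRange.LanguageID = msoLanguageIDEnglishUS
                End If
            End If
        Next
    Next
End Sub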
shp.TextFrame.TextRange.LanguageID = msoLanguageIDEnglishUS
contains, in the msoLanguageID constant at the end of the line, the language
to which the file will be changed. Replace this constant with the one for
your desired language. You can find your language at
msdn.microsoft.com/en-us/library/aa432635.aspx.
Figure 135: Saving the macro as a *.bas file in Notepad. Note that placing the file name in
quotation marks prevents the addition of TXT at the end of the file name
For the next step, you’ll have to do some preparation (you’ll need to do this
only once). If the Developer ribbon is not displayed, select
• File> Options> Customize Ribbon and in the list of main tabs, select
Developer for PowerPoint 2010 and above or
Once that is done select the Developer ribbon and click on Visual Basic (if
you don’t want to deal with the Developer ribbon you can also press
ALT+F11).
In the now opened Visual Basic editor select File> Import and browse to the
location where you saved the *.bas file and open that file.
Once that is done, all you need to do is to press F5 or click the Run button
and the language in your PowerPoint file will be changed.
OK, I admit, this all sounds a little complicated. But here is the bright side:
most of the steps only have to be done once per computer (or PowerPoint
installation). Once the *.bas file is created, the Developer ribbon ready and
the security settings selected, all you need to do (for every PowerPoint
presentation) is to
Here is the caveat, though: for none of the PowerPoint versions is the Notes
text changed by any of these acrobatics. That will have to be done manually
one by one. Oh, well.
Star Transit XV and above allows for the direct processing of embedded
objects:
First of all, the category of CAT tools encompasses much more than
"translation memory tools." By definition, any tool that is specifically designed
to aid the translator in the translation process falls under the category of CAT
tools. In the following pages, I will focus on translation environment tools, but
will also talk about other kinds of tools.
I like categorizations because they sometimes help to convey the big picture
more clearly, so I have created three different categories of CAT tools:
1 tools that independently provide specific functions for the translator
2 tools that provide functions to enhance the use of TEnTs (translation
environment tools)
Here are the functions of the first main category and examples of the tools
that cover those functions:
• Resource lookup—These include tools such as Wordfinder (see
www.wordfinder.com), which assembles dictionary resources and provides
lookup in them from any Windows-based tool, and tools like
IntelliWebSearch, which allows for lookup in online and offline resources
(see page 124).
primarily geared toward the freelance translator. You can find much more
information on these tools on page 304.
The second category is made up of tools that cater to the needs of TEnTs—
either by making them better in a specific area or even giving them additional
abilities that they flat-out don’t have.
can find more information on the process and the tools that support it
under . . . . A Word of Caution About Alignment on page 243.
• also allow the user to build up terminology databases that complement and
extend the functionality of the translation memories, and
• allow translators to work in very complicated file formats that they may
not understand or otherwise be able to support by hiding or protecting the
code and displaying only translatable content.
Furthermore, many of the tools provide methods for analysis, quality
assurance, and productivity.
In that same era, several other translation environment tools also entered the
public arena.
The translation agency Star released a product that was originally designed
for in-house use: Star Transit, with its terminology component TermStar. IBM
released its Translation Manager (TM/2) product in 1992 (which it buried in
2002 and revived once again in 2010 as the open-source product OpenTM2).
Curiously, these three tools all were initially developed in the small
German town of Böblingen.
The last few years have seen a number of new translation environment tools
enter the market (see Categories of Translation Environment Tools on page
195) and a number of mergers and acquisitions of translation environment
tool vendors as demonstrated by the acquisition of Trados by SDL in June of
2005 and of Idiom in 2008, the acquisition of the German Logoport by
Lionbridge in early 2005, or the "alliance" between Wordfast and
Translations.com/TransPerfect in 2007.
In 2009, long after IBM had decided to withdraw from the translation
environment tool market, another truly big-time player, Google, entered the
fray with the release of the Google Translator Toolkit (see page 232).
Old tools are discontinued at nearly the same pace, such as Alpnet’s
(now SDL) TSS/Joust, SDL’s Amptran, Quintillian, Clear-CAT, SDLX,
Cypresoft’s Trans Suite 2000, Aliado, or NoBabel.
There is a very impressive, albeit now slightly outdated, list of all kinds of
translation software programs (in particular machine translation) published by
John Hutchins, the great chronicler of translation software. His Compendium
of Translation Software—directory of commercial machine translation systems
and computer-aided translation support tools (see www.hutchinsweb.me.uk/
Compendium.htm) really is a very interesting document, if only to see how
much software there actually is to support our work. One very practical
application of the document is the index of language pairs for machine
translation in the very back of the manual. I often receive questions about
whether certain language pairs are supported by a particular system. Well,
here are the many answers.
Also, if you work in more complex file formats than Word documents or you do
not want to worry about formatting, TEnTs separate translatable from non-
translatable content and will help you tremendously.
Or if you would like to use more advanced quality assurance features than just
spell-checkers, you should also look at TEnTs.
Or if you ever need on-the-fly access to previous translations, TEnTs can do
that for you.
• tools that perform all or most of their work through macros in Microsoft
Word that allow an association with translation memory(s) and
terminology database(s)
In the following sections, I will introduce the different tools within their
categories, briefly describe the one or two outstanding features of the
different tools, and eventually spend more time with examples of the more
prominent tools to describe the typical features of a TEnT in more detail.
Tools that I will not include are those that support only one or two
language pairs, such as the Danish-English(-German) WebWordSystem
(see www.webwordsystem.com) or the Russian-English MT2007 (see
mt2007-cat.ru); tools that have only a niche audience, such as Felix (see
felix-cat.com) or VisualTran Mate (see www.visualtran.com); and tools
that support only one file format, such as Webbudget (see www.webbudget.com).
I also will not discuss translation management systems including proprietary
systems, such as translation.com’s GlobalLink, commercial tools like Andrä’s
ontram, and open-source tools like Globalsight Ambassador. These tools are
becoming increasingly important for our industry, but there are a number of
distinctions that made me exclude them. While most of them support exchange
standards, their workflow does not allow for third-party tools to participate. This
means that if your client uses one of the above tools, chances are that you will have
to use the translation editor that comes with the tool. The good news is that these
editors are typically free; the bad news is that you have to get used to a new work
environment and are often not able to use your own resources (translation
memories, terminology databases, etc.). Also, a purchase or an implementation of
these tools, if at all possible, is only feasible for the very large language providers or
the translation buyer.
• The other tools are primarily focused on native Word files or other
formats that can be accessed through Microsoft Word (either through a
tagging mechanism or by "calling" into other applications).
Wordfast Classic
The most successful tool in this group presently is Wordfast Classic, a tool
developed by Yves Champollion. Yves is related by name and blood to Jean-
François Champollion, the fellow who translated the Rosetta Stone. The
history of the product itself is a little more mundane but still rather
interesting. Released as a freeware product in 2000, Wordfast stunned the
market—particularly Trados, to which Wordfast at first sight looked very
similar. In August 2001, Champollion and the Italian translation agency giant
Logos formed a joint venture but continued to give the program away for free.
The partnership ended about a year later, and in October of 2002 Wordfast
became commercial. In 2007, Wordfast again allied itself with a translation
giant, this time translations.com/TransPerfect.
Features that are not immediately apparent in Wordfast Classic include the
ability to share translation memory data with other translators in realtime
(see page 277), a set of relatively sophisticated quality assurance features
(see page 268), and an autocomplete feature that completes your entries as
you type them (see page 250).
Anaphraseus
JiveFusion
Similis
Similis differentiates itself from most other tools because it comes with a very
high-level linguistic "knowledge" in seven EU languages (English, Dutch,
German, Spanish, Italian, Portuguese, and French), which it derives from a
powerful engine that was originally developed by Xerox for its XTS tool (see
page 289). This engine gives Similis the analytical power to apply linguistic
rules to a number of processes, including alignment and automatic extraction
of terms and phrases from translation memory content. In both of these
processes Similis offers extremely high accuracy (of course, only in the
language pairings mentioned above). For the actual translation process it
offers two different environments: a hybrid Word/Similis environment for the
translation of all files directly compatible with Word (Word, RTF, text files,
etc.) and a separate environment for HTML and XML files. What makes the
translation memory matches remarkable is the existence of "chunks,"
fragments of translation memory matches that the program was able to
automatically extract from larger matches with the help of the XTS engine.
Starting in late 2010, the makers of Similis have been giving away the
fully functional Freelance edition as a free download.
MetaTexis
MultiTrans
MultiTrans does not completely fit into this category. In fact, it is not a
"traditional translation memory" tool to start with, but a "bi-text" or "corpus"
tool, or, according to the tool’s latest terminology preference, a "TextBase
translation memory" tool. Rather than matching on a sentence-by-sentence
level, MultiTrans’ corpora are full source and target texts with an approximate
matching capacity that allows alignment to be done virtually on the fly. What
also distinguishes corpora from traditional translation memories is the display
of all the context of the original text.
Figure 143: MultiTrans’ translation view with MS Word on the bottom and Translation Agent on
top
Aside from the Word interface, MultiTrans also offers the translation of
files in a PowerPoint and WordPerfect interface as well as a completely
independent XLIFF Editor (which needs to be purchased as an add-on)
for the translation of tagged file formats, including HTML, XML, InDesign,
and of course XLIFF.
Translation Workspace
The system itself is a hybrid system. While all the work is done on your
computer, with all the documents that you are translating on your local
machine, the supporting data (TM, glossary—it really is not a full-fledged
termbase—and all administrative controls) are based on Lionbridge’s servers.
The interface in which you translate is either within Word or an independent
tool somewhat reminiscent of Trados TagEditor, called XLIFF Editor.
Figure 144: Translation Workspace’s Word interface with TM and terminology matches and a
preview feature
The XLIFF Editor is able to translate Office 2007/10 files, Trados TTX files,
FrameMaker files, and XML- and HTML-based formats. The Word interface can
access any Word or RTF-based file.
Figure 145: Translation Workspace’s XLIFF Editor with TM and terminology matches
One feature that is unique is its approach to the review process. This takes
place in a separate, completely web-based, tabular interface with error-
tracking, version control, etc. Though you will have to expend some extra
effort to create the review packages (upload the translated, bilingual files),
you'll have the benefit that the very last version of your translated and edited
files ends up in the translation memory.
Snowball
Snowball is a different kind of translation memory tool (I’m actually not sure
that it is a "true" TEnT since most of the translation environment it provides is
indeed the translation memory—it uses the phrase Integrated Translation
Environment). The proposition of this tool is to be as quiet and unobtrusive
and easy as it possibly can be. The 2MB installation allows you to create
bilingual TMs that you can use while translating Word documents. It does not
have a separate terminology database, but you are encouraged to enter term
pairs alongside complete sentence pairs when you use the "word for word"
rather than the "whole sentence" mode. It’s an inexpensive tool for the very
occasional translator, someone who is not willing to learn anything more than
a few shortcuts (which truly is all you need to operate it) and who has tons of
perfect matches—there is no fuzzy matching.
Figure 146: Translation in Word with an attached Snowball TM with two perfect matches
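The difference between a tool that offers only perfect matches and one that also offers fuzzy matches can be sketched in a few lines of Python. This is an illustration under my own assumptions: the similarity score comes from Python's standard difflib module, the 75% threshold is arbitrary, and commercial tools use their own (usually undisclosed) matching algorithms.

```python
from difflib import SequenceMatcher

tm = {
    "Open the file.": "Öffnen Sie die Datei.",
    "Save the file before closing.": "Speichern Sie die Datei vor dem Schließen.",
}

def exact_match(segment):
    """Return a translation only if the source segment is identical."""
    return tm.get(segment)

def fuzzy_match(segment, threshold=0.75):
    """Return (score, source, target) for the most similar TM entry, or None."""
    best = None
    for src, tgt in tm.items():
        score = SequenceMatcher(None, segment, src).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, src, tgt)
    return best

new_segment = "Save the files before closing."
print(exact_match(new_segment))  # None: the segment differs by one letter
print(fuzzy_match(new_segment))  # finds the similar "Save the file before closing."
```

With perfect matching alone, a single changed letter means that the translator gets nothing back from the translation memory.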
AnyMem
Like Snowball, AnyMem is a low-priced tool that runs exclusively within Word.
While it offers a term search within the translation memory, it does not have
an independent terminology component. It uses the same system to process
text within bilingual Word files that Trados 2007 and earlier and Wordfast
Classic use; as a result the files are interchangeable.
Figure 147: Translation in Word with an attached AnyMem TM and a fuzzy match
Trados Studio
SDL Trados has been the market leader among TEnT vendors for a long time,
and partly due to the age of the tool and the increased necessity to serve a lot
of different markets and users, the tool had morphed into a whole range of
connected applications geared towards different file formats (such as the MS
Word interface for Word-compatible files, the TagEditor interface for tagged
file formats, and the T-Windows applications for anything else), different
activities (Workbench for translation memory purposes, MultiTerm for
terminology maintenance, WinAlign for alignment purposes, S-Tagger for
FrameMaker/Interleaf conversion, etc.), and different purposes (translation,
project management, workflow design, etc.). In 2009, when SDL came out
with Trados Studio, a completely redesigned version of its tool(s) that
combined almost all of the above-mentioned separate applications into one
interface, it was a risky move—but one that turned out to be successful.
The Trados Studio translation interface is very similar to the now de facto
standard that tools like Across, Déjà Vu, and memoQ have always used: a
tabular interface with the source text in a left column and the target text in a
right column. It would not be true, though, to claim that Trados Studio is
simply a clone of these tools; there are just too many unique and
innovative features for that. Here are some of them:
• Trados Studio was the first tool that came out with an automatic
suggestion feature to complete typing for you (comparable to the way you
receive suggestions based on previous entries when you enter text into an
Excel spreadsheet or the address field of a browser). In Trados’ case, the
suggestions are based on a separate "AutoSuggest" database, entries in
the terminology database component MultiTerm (which remains the only
application that is not primarily maintained in the main interface), and
AutoText entries.
• Another "feature" that SDL has introduced and that has so far not been
followed by any of its competitors is the online app marketplace
OpenExchange (see www.translationzone.com/openexchange). Any owner
of the Studio Professional edition of Trados Studio can have access to the
API, the application programming interface, for many of the components of
Trados Studio with which it is possible to develop applications that extend
the functionality of the main program. These can then either be used
internally or offered on the OpenExchange website for free or for a
licensing fee. The introduction of OpenExchange has turned out to be a
very helpful move for SDL. Not only have there been many helpful apps
developed by third-party developers, but many are in fact placed there by
SDL developers who have the option to turn new features into external
apps rather than internal features of the main tool, which would make a
very complex application even more complex.
Figure 148: Trados Studio’s translation interface with activated Track Changes and automatic
typing suggestion
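The basic idea behind an automatic typing suggestion that draws on several resources can be sketched as follows. This is a simplified illustration, not SDL's implementation: the function name suggest, the three-character minimum, and the ordering by length are all assumptions of mine, and the three lists merely stand in for an AutoSuggest database, a termbase, and AutoText entries.

```python
def suggest(prefix, sources, min_chars=3, limit=5):
    """Return entries from all sources that start with what was typed so far."""
    if len(prefix) < min_chars:
        return []
    seen, hits = set(), []
    for entries in sources:
        for entry in entries:
            if entry.lower().startswith(prefix.lower()) and entry not in seen:
                seen.add(entry)
                hits.append(entry)
    # Shortest suggestions first; the sort is stable for entries of equal length.
    return sorted(hits, key=len)[:limit]

autosuggest_db = ["translation memory", "translation environment tool"]
termbase = ["translation memory", "termbase"]
autotext = ["Translated by"]
print(suggest("tra", [autosuggest_db, termbase, autotext]))
```

Duplicates that occur in more than one resource are offered only once.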
Déjà Vu
one interface for all activities. Déjà Vu X2, released in 2011, was essentially a
continuation of the previous version, with the added features of auto-
complete ("AutoWrite"), subsegmenting ("DeepMiner"), and a deep
integration of machine translation.
Déjà Vu offers a very large range of supported file formats. While its user
group is no longer as passionate and boisterous as it was during the late
1990s when the "flame wars" raged on the Lantra-L list between users of
Trados and Déjà Vu (search the archives at segate.sunet.se/cgi-bin/
wa?A0=LANTRA-L), it still is a tool of great value, particularly because of a
number of innovative features:
• The assemble feature that Déjà Vu pioneered allows for the "piecing
together" of translations from the various resources, including terminology
database, glossary ("lexicon"), and fragments from the translation
memory. Provided that the quality of these resources is good, the
advantage to the translator can be considerable.
• Déjà Vu also uses a unique repair feature for fuzzy matches, where the
terms and phrases within the translation unit that differ from the match in
the translation memory are automatically replaced with the correct term or
phrase if that term or phrase exists in one of the resources.
• This fuzzy match repair feature also works with machine translation where
only the "offending" part of a segment is translated by a machine
translation engine and potentially turns the fuzzy match into a perfect
match.
Figure 149: Example of a fuzzy match (see the highlighted part of the TM match in the lower
right-hand corner) turned into a perfect match by querying machine translation in
Déjà Vu X2
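To show the principle behind repairing a fuzzy match, here is a small Python sketch. It is an assumption-laden simplification and not Déjà Vu's actual algorithm: the glossary is a plain dictionary, only one-word substitutions are handled, and grammatical agreement in the target language is ignored.

```python
from difflib import SequenceMatcher

glossary = {"printer": "Drucker", "scanner": "Scanner", "cable": "Kabel"}

def repair_fuzzy_match(new_source, tm_source, tm_target):
    """Swap glossary terms in the TM target where the new source differs."""
    new_words, old_words = new_source.split(), tm_source.split()
    repaired = tm_target
    matcher = SequenceMatcher(None, old_words, new_words)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "replace" or (i2 - i1) != 1 or (j2 - j1) != 1:
            continue  # only handle simple one-word substitutions
        old_term = old_words[i1].strip(".,").lower()
        new_term = new_words[j1].strip(".,").lower()
        if old_term in glossary and new_term in glossary:
            repaired = repaired.replace(glossary[old_term], glossary[new_term])
    return repaired

tm_source = "Connect the printer to the computer."
tm_target = "Schließen Sie den Drucker an den Computer an."
new_source = "Connect the scanner to the computer."
print(repair_fuzzy_match(new_source, tm_source, tm_target))
```

If the differing word is not in any resource, the match simply stays fuzzy and the translator has to edit it by hand.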
Star Transit
files. The benefit of this is the exact customizability of the translation memory
and the inherent availability of context. The drawback lies in the large number
of translated file pairs that have to be retained to provide the necessary
"reference material."
Starting with Service Pack 7 for Star Transit NXT, a parallel translation
memory system with the TM-Container was introduced in the fall of
2013.
Star also does not release many "versions"—Star Transit 2.7, the much-loved
and very stable version, was introduced in the late nineties, followed by an ill-
fated and faulty successor (Star Transit 3) that was quickly replaced with
Transit XV in 2001 and with Transit NXT in 2008. These are long stretches
without new payable versions for a development company, especially because
the development never stopped and was released in the form of Service
Packs. To offset this, Star is charging for the support of the following formats:
FrameMaker, PageMaker, Interleaf/Quicksilver, AutoCAD, QuarkXPress, and
InDesign. Overall, Star probably supports one of the largest numbers of
file formats, even including some binary development files (see page 303).
What also sets Star Transit apart is the morphological support for 15
European languages (incl. English, French, German, Italian, Spanish, Czech,
Dutch, Polish, Portuguese, Russian, and Swedish), which means that just by
entering the infinitive form in the (powerful) termbase, other morphological
forms are automatically found in the respective languages.
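What morphological term recognition means in practice can be illustrated with a deliberately crude Python sketch. The suffix rules below are English-only assumptions of mine and have nothing to do with the linguistic engine that Star actually uses; real morphological support is language-specific and far more sophisticated.

```python
import re

termbase = {"colleague": "Kollege", "translate": "übersetzen"}

SUFFIXES = ("ing", "ed", "es", "s")  # crude English-only rules

def candidate_base_forms(word):
    """Yield the word itself plus forms with common suffixes removed."""
    yield word
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            stem = word[: -len(suffix)]
            yield stem
            yield stem + "e"  # "translated" -> "translat" -> "translate"

def find_terms(segment):
    """Return (word in text, termbase entry, translation) for each recognized term."""
    hits = []
    for word in re.findall(r"[a-z]+", segment.lower()):
        for form in candidate_base_forms(word):
            if form in termbase:
                hits.append((word, form, termbase[form]))
                break
    return hits

print(find_terms("My colleagues translated the manual."))
```

Only the base forms are stored in the termbase, yet the inflected forms in the text are still recognized.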
Lastly, the "dual fuzzy" system that Star introduced with Star Transit NXT has
been very innovative and has so far not been implemented by any of its
competitors. The dual fuzzy system looks not only in the source
portion of the reference material but also in the target. This means that if
memoQ
The Hungarian memoQ is a very process-oriented tool that makes the general
workflow user-friendly even for a novice to TEnTs. The first screen that you
see when you open the program gives you all the different options of what you
can do; the creation of a translation project, a translation memory, or a
termbase can be done without having to enter complicated information. The
actual translation interface looks very similar to the veteran Déjà Vu, where
the translatable text is presented in a table format, the source on the left and
target on the right, and matches from termbase and translation memory are
displayed on the side. The import and export of files goes blazingly fast, and
this is true for translation files as well as when you import TMX into a
translation memory. The supported file formats include the whole range of
formats you can wish for, including Transit and Trados Studio project files
(unfortunately without the terminology data).
Also, for HTML, XML, TTX, MS Office, TXT, and some bilingual files, there is an
integrated preview feature right in the translation window.
Across
The underlying database system is an SQL Server system (very powerful but
also very resource-heavy) in which all TM and terminology entries are stored
simultaneously. One differentiator is that Across offers morphological
recognition in its term searches in various European languages, including
English, Danish, Dutch, Finnish, German, Norwegian, Swedish, French,
Italian, Portuguese, and Spanish. Of course, all other languages work with
perfect term recognition.
Alchemy Publisher
Publisher is a little different from most other tools in how it extracts text from
the originating documents, in particular when it comes to FrameMaker and
Word files. Rather than converting the files into an interim format (RTF in the
case of Word and MIF in the case of FrameMaker), it communicates directly
with the application and extracts text on an object basis. This means that
Wordfast Pro
Wordfast Pro, which originally was marketed as the successor of Wordfast 5.5
(today: Wordfast Classic, see page 197), has now instead become a parallel
version.
The concept of Wordfast Pro is very different from that of the Classic version. Rather
than using a third-party interface (MS Word) for its translation, it comes with
its own refreshingly simple and well-organized interface. The tool is Java-
based, so it runs on Linux, Mac, and Windows, and all files, independent of
type, can be viewed the same way and in the same interface.
The interim format into which files are converted for translation purposes is a
custom XML format (TXML), and the supported translation file formats include
MS Office formats, HTML, FrameMaker, PDF (through an external but
integrated plugin), Trados TTX, various software development formats, and
InDesign.
Heartsome and Swordfish are both Java-based tools that run on Mac, Linux,
and Windows. In this same spirit of supporting exchange formats, they not
only support the translation of XLIFF files (see page 273), they actually
convert a number of formats (including RTF, Office 2007, FrameMaker, HTML,
OpenOffice, InDesign, and a variety of software development formats) to
XLIFF, provide for their translation within that format, and then convert them
back into their original format.
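The round trip described above (convert to XLIFF, translate within that format, convert back) can be sketched in Python with nothing but the standard library. This is a minimal illustration for plain-text segments under my own assumptions; the function names to_xliff and from_xliff are hypothetical, and the actual filters for formats like FrameMaker or InDesign are far more involved.

```python
import xml.etree.ElementTree as ET

NS = "urn:oasis:names:tc:xliff:document:1.2"  # XLIFF 1.2 namespace
ET.register_namespace("", NS)

def to_xliff(segments, source_lang, target_lang, original="example.txt"):
    """Wrap plain-text segments in a minimal XLIFF 1.2 document."""
    root = ET.Element("{%s}xliff" % NS, {"version": "1.2"})
    file_el = ET.SubElement(root, "{%s}file" % NS, {
        "original": original, "datatype": "plaintext",
        "source-language": source_lang, "target-language": target_lang,
    })
    body = ET.SubElement(file_el, "{%s}body" % NS)
    for number, text in enumerate(segments, start=1):
        unit = ET.SubElement(body, "{%s}trans-unit" % NS, {"id": str(number)})
        ET.SubElement(unit, "{%s}source" % NS).text = text
        ET.SubElement(unit, "{%s}target" % NS).text = ""
    return root

def from_xliff(root):
    """Collect the target text of every trans-unit, in document order."""
    return [unit.find("{%s}target" % NS).text or ""
            for unit in root.iter("{%s}trans-unit" % NS)]

xliff = to_xliff(["Hello.", "Goodbye."], "en", "de")
# A translator (or another XLIFF-aware tool) fills in the targets.
for unit, translation in zip(xliff.iter("{%s}trans-unit" % NS),
                             ["Hallo.", "Auf Wiedersehen."]):
    unit.find("{%s}target" % NS).text = translation
print(ET.tostring(xliff, encoding="unicode"))
print(from_xliff(xliff))
```

Because the intermediate file is a standard format, the middle step can in principle be done in any tool that reads and writes XLIFF.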
Only recently have these two tools started to look fundamentally different,
though many of the inner workings are still similar. This is not too surprising
since the main developer who originally worked for Heartsome has now
branched out on his own with Swordfish. The first few differences can be
found in their support of some varying formats (Swordfish, for instance,
supports the new Wordfast Pro format TXML) and in their differing licensing
schemes.
Fluency
Fluency’s developers have collected all kinds of processes and third-party
utilities that they felt would be helpful in the process of translation, integrated
them into their tool and its workflow, and left it up to us whether to use them.
These features include PDF conversion (from and to PDF), optical character
recognition (OCR), an interface for manual transcription of items that can't be
processed with optical character recognition, extensive language-specific
glossaries, direct links to any number of web-based resources, and audio
recognition.
The supported file formats include all the expected formats but also some
surprising ones, including Microsoft Publisher.
I can see some of you cringe when you read "MS Publisher"—yes, I
know, it may have the reputation of a desktop-publishing program for
dummies, but who wants to say no to a well-paying client with Publisher
files to translate? And to my knowledge, Fluency is the only tool on the
market that supports Publisher files.
MadCap Lingo
MadCap is the company that split off from MacroMedia (now Adobe) after
some of MadCap's current owners felt that MacroMedia was treating the help-
authoring product RoboHelp, which it purchased as part of a larger
acquisition, too shabbily. They started their own company and have since
given Adobe a run for its money. (Once they were gone, of course, Adobe
resumed work on RoboHelp.) MadCap's main product is the help-authoring
product Flare.
Early on, the people in charge at MadCap recognized that there was a strong
link between the language and technical writing industries. This has finally
resulted in the release of MadCap Lingo, a translation environment tool that
easily integrates into the authoring/translation environment of Flare but can
also be used as a standalone TEnT for file formats such as MS Word and
PowerPoint (all versions), InDesign, FrameMaker, Trados TTX/SDLXLIFF,
Wordfast TXML, HTML, XML, DITA, and RESX files.
MadCap Lingo is a solid and user-friendly tool that performs very well,
certainly and particularly with Flare projects, but may not have some of the
bells and whistles of its more well-known competitors.
Text United
When importing a file (supported file types presently include HTML, MS Office,
OpenOffice, XML, FrameMaker, and InDesign), a terminology extract of the
file’s content is performed that can serve as the base glossary for the
translation of the project (for terminology extraction, see also page 288).
The makers behind Text United envision this tool as a stepping stone toward a
future in the translation industry where it will be easier for translation buyers
to directly contract with individual translators—through the database of users
and the interface of the tool.
OmegaT
There is a strange and remarkable dichotomy between the technical and easy-
to-use parts in OmegaT. When you start the program, the initial screen has
information on how to get started with OmegaT in five minutes. And they're
not kidding. To use the basic features, you just start using the program and it
works. When it comes to fine-tuning the OmegaT set-up, you might find some
items available in menus and with an easy-to-use graphical user interface
(GUI), but for other features you’ll have to manually set up files and alter
code. One example: to change keyboard shortcuts, you actually have to
create some files that will cause the desired change. If you take your time to
think through it, you'll get it done; if not, you'll end up being frustrated.
The interface is super easy and user-friendly: the actual translation is done in
a non-tabular, horizontal layout. If you have to deal with inline tags (tags
within segments), they are clearly set apart from the translatables. Any panes
with access to terminology, translation memory, machine translation, or
comments can be arranged like you want and even dragged to a second
monitor. And while I wish there were more right-click menus, the actual
menus are well organized and give you the necessary access to available
features.
The range of other directly supported file formats is very impressive and
includes TXT, PROPERTIES, PO, INI, SRT (subtitle), Open Document Formats,
(X)HTML, XLIFF, RESX, LaTeX, Wordfast TXML, and Visio files. When I say that
these files are supported directly, it means that there are other file formats
that are supported indirectly through the open-source Rainbow application
(see www.opentag.com/okapi/wiki/index.php?title=Rainbow). The file types
you might want to use with that route include most XML formats, FrameMaker
MIF, bilingual DOC, and Trados TTX files.
OmegaT also includes an interesting project concept: you can have numerous
files of various different formats within a project that automatically open one
after the other as you translate, and any search-and-replace action can be
done simultaneously in all files.
You can find rich and interesting resources about OmegaT—both for novices
and for advanced users—at www.omegat.org.
Figure 160: OmegaT’s translation editor on a Windows computer. Note the squiggly-
underlined, interactive spell-checking, morphologically-aware term recognition
("colleague" for "colleagues") and machine translation suggestion.
Lingotek
When you create a project, you can choose between loading data into a public
vault that can be accessed by anyone, or into a private vault that only you and
other parties that you determine can use.
Figure 161: Lingotek Workbench’s translation interface with term match from the public TM
and machine translation by Google Translate
The makers of Lingotek Workbench have recently been trying to newly brand
their tool as a "crowdsourcing" tool, i.e., a tool that allows simultaneous
access by many who can evaluate each other’s translations.
One area of concern with Lingotek is the fact that internal formatting of
segments is handled rather crudely: the application of formatting is
counter-intuitive and formatting markers are not stored in the vaults.
Unlike what was expected by many, this tool contains virtually no translation/
project management features, but instead a rather well-designed front end
that allows you to do one of two things: you can either upload a file (in HTML,
word processing, text, or a number of software development formats) or you
can specify a URL and the corresponding HTML page will be uploaded.
In the process of setting up your project, you will need to choose whether you
are using a "shared" translation—i.e., whether you are using and contributing
to a large anonymous translation memory—or whether you would like to
upload your own translation memory in TMX format. Here you can also upload
or define a glossary (in a very strictly defined CSV format).
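If you have never looked inside a TMX file, the following Python sketch shows its basic structure and how translation units can be read from it. The sample data and the function name read_tmx are my own illustration; real TMX files often carry regional language codes (such as en-US) and many more attributes, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# ElementTree stores the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

TMX_SAMPLE = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Open the file.</seg></tuv>
      <tuv xml:lang="de"><seg>Öffnen Sie die Datei.</seg></tuv>
    </tu>
  </body>
</tmx>
"""

def read_tmx(text, source_lang, target_lang):
    """Return (source, target) pairs for the requested language pair."""
    pairs = []
    for tu in ET.fromstring(text).iter("tu"):
        segs = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
        if source_lang in segs and target_lang in segs:
            pairs.append((segs[source_lang], segs[target_lang]))
    return pairs

print(read_tmx(TMX_SAMPLE, "en", "de"))
```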
While you are working on your translation (or before, or after), you can also
invite others to participate in your translation/editing efforts by selecting
Share> Invite people. All they need is some kind of Google account.
Figure 162: Google Translator Toolkit with access to machine translation matches
So why does Google offer the service? While it does state in the FAQ section
that it may at some point charge power users, its main objective is not to
generate cash but instead to generate high-quality data. Google's machine
translation engine Google Translate uses a statistical machine translation
engine that relies on good bilingual data—lots of good bilingual data. Any
translation that you add (or improve) contributes to the quality of the machine
translation performed by Google Translate. While there may be nothing wrong
with that in exchange for a free tool, it is also possible to upload existing
(TMX) translation memories. You can specify whether these are for your use
or for your use and that of others, but they are always used by Google. Even
after you "delete" them from the system. Many will not like that.
Wordbee
Since Wordbee is a web-based tool, you don't purchase a license; instead, you
pay a usage fee according to the SaaS—Software as a Service—concept. If
you want to use it in a team environment, you can purchase any number of
floating licenses that you can pass among your team members. A freelance
license gives you space for 250 MB for documents and 250,000 segments in
the TM; if you go beyond that, you can buy more space or download the data
to your desktop.
The list of supported file formats is solid (MS Office, InDesign, OpenOffice,
RTF, XML, HTML-based formats, as well as various software development
formats).
XTM Cloud
One of its most striking features is the Spartan and highly functional interface.
While it is not always completely intuitive, it's well organized once you get the
hang of it.
Every translation file (the supported formats include MS Office, XML, Visio (!),
InDesign, HTML, FrameMaker, PDF [converted internally to text], Trados TTX
files, and others) is internally converted to XLIFF. At any stage of the
translation process it's possible to export it out of the system, process it on
another XLIFF-supporting tool, and bring it back into the XTM Cloud system.
This enables you to continue to work offline in case you have no Internet
connection.
Wordfast Anywhere
You can either paste text from your clipboard into the translation pane or
upload documents in various formats (MS Office, HTML, text, FrameMaker,
InDesign, or the Wordfast Pro format TXML). Once your document is ready for
translation, you can set up whether you want to use your own TM (and
whether you want to keep that to yourself or share it with everyone else), the
large, public VLTM database (see page 198), and/or MT through Google
Translate. (To set up all these settings, select Tools> Open Setup Dialog
Box—the language of the user interface is a little erratic.)
Every user has their own workspace in which they can store up to ten
documents. Once it’s full they can either delete or download the documents
(same with the translation memory and the glossary: they also can be
downloaded at any time). The size limitations clearly exclude Wordfast
Anywhere as your primary tool, but it just might be the tool to use when no
other tool is at hand. Particularly because it’s free.
MemSource
The MS Word translation interface was replaced with a very lightweight XLIFF
editor ("MemSource Editor") in 2011. Just like before, the project and the files
need to be prepped in a browser interface, and the online TM and termbase
are assigned to the translator who then downloads an MXLIFF file (which is an
XLIFF file with some specific MemSource extensions, so it can be translated
in other tools as well).
The MemSource Editor is foolproof. Once you have the correct information to
log in, there really is nothing you need to know (well, there are a couple of
things, but you can quickly glean those from the menus) and you can start
translating and using the resources that are automatically displayed.
Figure 166: MemSource Editor’s translation interface with data from TM (101), termbase (TD),
machine translation (MT), and subsegments (S)
At the end of 2012, yet another editor was introduced, the MemSource Web
Editor. The completely browser-based editor is offered in tandem with the
desktop-based MemSource Editor and is, as shown in the image below, very
similar in both appearance and functionality.
Figure 168: MemSource’s analysis with numbers on editing distance (see the MT columns)
Comparing TEnTs
There are too many tools out there to make detailed comparisons of every
available feature in every available tool. Instead, I will focus on the main
features that are present in most tools and show how these are handled in
one or two of them. You can use this list of features to evaluate the tools
when deciding for or against a particular one.
In the context of most TEnTs, alignment refers to the process of selecting file
pairs in the source and target language that were translated outside of a
translation memory environment, matching all the segments (sentences,
headings, etc.), and creating a translation memory database from those
matches. The resulting translation memory can then be applied to translate
similar or identical texts. Virtually all tools contain alignment modules in some
or all configurations. At first glance, alignment seems like a great process that
anyone starting to use a translation environment tool should do to build up a
nice translation memory database.
And while it’s true that alignment is indeed a helpful process, it’s often
misused. I’ve encountered many situations where new users (both freelance
and corporate) became enamored with the idea of using alignment to
"magically" turn their existing translation materials into one large translation
memory. They spent days or weeks devoting their time to this task, and in the
process they became so frustrated with the use of their new tool that they laid
it aside completely. The reason that alignment is often (and correctly)
perceived as a tedious process is its manual nature. Although each of the
alignment modules in the above-mentioned tools applies well-chosen
parameters to the alignment "suggestions," they all have to be verified, and—
as anyone knows who has done alignment before—often repaired. The
parameters are typically punctuation and paragraph markers, repetitions, and
non-linguistic matches such as numbers and abbreviations. This can go a long
way toward making correct matches, but it often requires user intervention.
Typical cases where manual changes are required are differences in sentence
delimitation (one sentence in the source becomes several in the target or the
other way around), shifts in the order of segments, different use and/or
placement of footnotes, and index markers.
Figure 169: Alignment view in Déjà Vu (note that the program split the first sentence
incorrectly in the Spanish target)
Trados Studio 2014 has completely thrown out SDL’s traditional WinAlign
alignment tool and now provides an integrated alignment tool with an
option called "alignment quality value." This is a setting that allows the user to
adjust the confidence level of the match. For example, if you have set a high
quality level and there is a huge difference in the number of words in the
source and target segments or there are numbers in the source but not in the
target, the translation units would be rejected.
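To make the idea of such a confidence setting more tangible, here is a minimal sketch in Python. It is not how Trados Studio calculates its value (that is not documented here); the function names, the scoring formula, and the default threshold are all assumptions chosen for illustration. It combines only the two signals mentioned above: the difference in word count and the presence of matching numbers.

```python
import re


def alignment_confidence(source, target):
    """Return a rough 0.0-1.0 score that two segments belong together.

    Hypothetical scoring: word-count ratio multiplied by number overlap.
    """
    src_words, tgt_words = source.split(), target.split()
    if not src_words or not tgt_words:
        return 0.0
    # 1.0 when both segments have the same number of words.
    length_score = min(len(src_words), len(tgt_words)) / max(len(src_words), len(tgt_words))
    # Numbers are language-independent, so they should show up on both sides.
    src_nums = set(re.findall(r"\d+", source))
    tgt_nums = set(re.findall(r"\d+", target))
    if src_nums or tgt_nums:
        number_score = len(src_nums & tgt_nums) / len(src_nums | tgt_nums)
    else:
        number_score = 1.0
    return length_score * number_score


def accept(source, target, quality_threshold=0.6):
    """Keep a proposed translation unit only if it reaches the threshold."""
    return alignment_confidence(source, target) >= quality_threshold
```

Raising `quality_threshold` rejects more candidate pairs, which means fewer wrong translation units but also more material left for manual review.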
Figure 170: Trados 2014 alignment tool with adjustable quality value
The results are better than with the old WinAlign tool, but it would be helpful
to have individually adjustable categories to make up the confidence setting.
With all these difficulties, why would alignment still be a helpful process?
Alignment can be a very powerful tool if you have specific sets of already-
translated documents that correspond closely to new documents that now
have to be translated. The amount of time you can save and the level of
consistency and quality you can achieve by aligning the existing documents
and using that as the basis for your translation can be immense, and there's
simply no reason not to go that route. But for other documents, unless you
can hire someone else to do mass alignment of existing materials (someone
with the odd combination of being both cheap and well-qualified . . .), I would
strongly advise you to build up your translation memory database by simply
performing translation in the tool of your choice and adding material to your
translation memory segment by segment.
Furthermore, with AlignFactory you can also select thousands of file pairs
(including PDF files), have them matched up (they have to follow certain
naming conventions such as a language identifier), and then have them
aligned in one big swoosh. And it really is one big swoosh: the speed of the
alignment is mind-boggling. In fact, it’s so fast that I have repeatedly thought
that something had gone wrong only to find that it had already successfully
completed the alignment. While it’s not perfect, it certainly has brought
alignment to a different level.
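The matching of file pairs by naming convention can be sketched in a few lines of Python. The convention used here (a two-letter language identifier before the file extension, as in "manual_en.docx") is an assumption for illustration only; AlignFactory's actual rules may well differ.

```python
import re
from collections import defaultdict

# Hypothetical convention: "<name>_<lang>.<ext>", e.g. "manual_en.docx".
PATTERN = re.compile(r"^(?P<base>.+)_(?P<lang>[a-z]{2})\.(?P<ext>[^.]+)$", re.IGNORECASE)


def pair_files(filenames, source_lang="en", target_lang="de"):
    """Group files by base name and return (source, target) pairs."""
    by_base = defaultdict(dict)
    for name in filenames:
        match = PATTERN.match(name)
        if match:  # files without a language identifier are ignored
            by_base[match["base"]][match["lang"].lower()] = name
    pairs = []
    for base in sorted(by_base):
        langs = by_base[base]
        if source_lang in langs and target_lang in langs:
            pairs.append((langs[source_lang], langs[target_lang]))
    return pairs
```

Files that lack a counterpart in the other language are simply left out, so a quick comparison of the input list with the returned pairs shows what still needs attention.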
Figure 172: AlignFactory’s alignment results—note the correct alignment of the first segment
The supported file formats include TXT, DOC(X), RTF, HTML, TMX, and (in a
limited fashion) PDF. Or you can download EU documents or other online
documents for alignment purposes directly from within the tool.
Fairly recent developments with the way translation memories are utilized go
beyond the familiar perfect-fuzzy match scheme of complete translation units.
Particularly interesting in this context are the so-called "subsegment
matching," predictive typing, and the combination of TM resources with
terminology and machine-translated data.
Subsegment Matching
Figure 174: memoQ’s subsegment matching feature (LSC or "Longest Substring Concordance"
in memoQ lingo)
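As a rough illustration of what a "longest substring" search between a new segment and a translation memory segment could look like, here is a small Python sketch. It works on whole words and uses a textbook dynamic-programming approach; this is an assumption for illustration and not a description of memoQ's actual implementation.

```python
def longest_common_word_run(new_segment, tm_segment):
    """Return the longest run of consecutive words shared by both segments."""
    a = new_segment.lower().split()
    b = tm_segment.lower().split()
    best_len, best_end = 0, 0
    # prev[j] holds the length of the common run ending at a[i-2] and b[j-1].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]
```

A tool would run a search like this against many translation units and then show the translations of those units so that the translator can pick out the matching part.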
Predictive Typing
The same principle is used in the TEnTs that offer this feature (among them
Trados Studio, memoQ, Déjà Vu, Across, Star Transit, and Wordfast Classic),
except that in one way or another they use the content of the translation
memory to furnish these suggestions. Depending on your typing habits, this
can be a huge time saver, and it increases consistency significantly.
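A minimal sketch of the principle, assuming that the suggestions come from the words in the target side of the translation memory: the function names, the minimum word length, and the ranking by frequency are my own assumptions and not taken from any particular tool.

```python
from collections import Counter


def build_suggester(tm_targets, min_length=5):
    """Build a prefix-based suggestion function from TM target segments."""
    counts = Counter(
        word.strip(".,;:!?()\"")
        for segment in tm_targets
        for word in segment.split()
    )
    # Short words are not worth suggesting: typing them is quicker.
    vocabulary = {w: c for w, c in counts.items() if len(w) >= min_length}

    def suggest(prefix, limit=3):
        p = prefix.lower()
        matches = [w for w in vocabulary if w.lower().startswith(p) and w.lower() != p]
        # Most frequent words first, alphabetical order as a tie-breaker.
        matches.sort(key=lambda w: (-vocabulary[w], w))
        return matches[:limit]

    return suggest
```

Because the suggestions are taken from translations that are already in the memory, accepting them also nudges you toward the wording you used before.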
Terminology Handling
Only a fraction of the translators who use a translation environment tool today
are using the terminology component that all TEnTs offer. That’s unfortunate
because they are all missing out on one of the most powerful features of
TEnTs.
Obviously, if things were that simple, there would be no need for translators in
the first place—machine translation would have long taken over our
profession!
The terminology database is the place where you can invest effort into
defining your words and phrases grammatically, contextually, or even by
contrast. If this is very helpful for you as a single translator, how much more
would it be in a virtual translators’ workgroup! Of course, none of this is news
to anyone: any good dictionary offers the same concept. What makes these
"dictionaries" (if you will) much more exciting is that you can build them up
the way you want them. Furthermore, they are "living dictionaries" that
present their findings for each of the segments you are currently translating
without you having to do anything (if you have previously given them the data
that they now share with you).
In addition, some applications not only display this data to you from the
terminology database but even try to assemble it for you—i.e., piece it
together—which should convince you that it makes sense to spend some time
building up these databases. If you are not a translator of extremely repetitive
materials, this might also convince you that these tools may have a definite
benefit even for you (see page 194).
Transit is probably the tool that spent the most early effort developing a
sophisticated terminology tool, TermStar. In the screenshot below you can see
some good examples of what kind of information can be entered into a
terminology database: client, date, definition(s), homonyms, and of course
translation.
Figure 177: Transit translation project with dictionary access (lower right pane)
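To give an idea of how such an entry might be represented as data, here is a small Python sketch with the fields listed above. The class and field names are hypothetical and do not reflect TermStar's actual data model; the lookup simply checks whether a stored source term occurs in the current segment.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TermEntry:
    source_term: str
    target_term: str
    client: str = ""
    created: date = field(default_factory=date.today)
    definitions: list = field(default_factory=list)
    homonyms: list = field(default_factory=list)


class Termbase:
    def __init__(self):
        self._entries = {}

    def add(self, entry):
        self._entries.setdefault(entry.source_term.lower(), []).append(entry)

    def lookup_segment(self, segment):
        """Return all entries whose source term occurs in the segment.

        Plain substring matching: it will also hit inside longer words.
        """
        text = segment.lower()
        hits = []
        for key, entries in self._entries.items():
            if key in text:
                hits.extend(entries)
        return hits
```

This is the "living dictionary" idea in its simplest form: once the entries are there, every new segment is checked against them without any further action on your part.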
The advantages of the new version are that it is based on standard XML rather
than a proprietary database format; it exports into XML, HTML, and RTF; term
entry is made less cumbersome (you can now highlight the source term and
only have to type the target term); and remote applications of the program
have become easier.
MultiTerm provides the found term as a reference to aid in fixing a found fuzzy
match or translating from scratch within Trados 2007.
Figure 179: View of the Trados Workbench translation memory with a fuzzy match and
reference from MultiTerm (right pane)
For Trados Studio, this procedure has been changed. Here the terminology
matches are displayed in a separate Term Recognition pane (upper right) as
well as in an automatically displayed match proposition which can be entered
by pressing the ENTER key.
Figure 180: View of Trados Studio with an automatically offered terminology match
Of course, there are also standalone terminology tools. You can find more
information on those on page 293.
Work Environment
Since the work environment was already used as the main criterion to
categorize the tools (see page 195), there might not be too much to add, but
the following might be helpful anyway.
Within those general frameworks that were dealt with in the Categories
section, there are some important differences as to how the translated text is
displayed.
Of course, this only works for documents that were directly compatible with
MS Word. And while the majority of translators today work in a much larger
variety of formats than "just" Word documents, some do primarily work in
that format, and this might be a good solution for them.
Different tool vendors have answered this differently, but what seems to be an
emerging trend in the last few years is a semi-WYSIWYG approach. While
some of the more common formatting elements are displayed (such as bold,
italics, or underlining), others are not.
Figure 182: memoQ’s semi-WYSIWYG interface (note that the bold and italic formatting is
preserved but formatting tags are used for the small caps)
Déjà Vu and Wordfast Pro, among the last to offer the WYSIWYG
element, have already announced that their new versions (Déjà Vu X3 and
Wordfast Pro 4) will also follow the semi-WYSIWYG approach.
cons of machine translation in its many shapes and forms, I would at least like
to look at the different ways that machine translation is integrated in today’s
translation environment tools and what the usefulness of that integration
might be.
First I would like to quote Jaap van der Meer (in MultiLingual 71, 2005), a strong
proponent of machine translation:
Disdain on the side of the professional translators for the hilarious and stupid MT
mistakes gave birth to a new variant of MT called translation memory (TM). TM
started off as a lower-level feature of commercial MT systems (...). But the
success of TM came with dedicated products such as IBM TM/2 and Trados. The
marketing message was tuned in to what the professional translation industry
wanted to hear: "Forget about MT; it doesn't work well. Instead, use our TM
product because it leaves you in full control of the process."
Mind you, I don’t completely agree with everything van der Meer states. I
particularly disagree with his assertion "that post-editing fuzzy matches from
TM databases is, in fact, not different from post-editing fuzzy matches from
any other MT system." There is in fact a fundamental difference between the
work of post-editing MT translations and fuzzy matches, and often this is not
fairly represented in the MT community. Provided that your translation
memory is in good shape, editing a fuzzy match means altering an inherently
correct segment (correct as a translation for the earlier source segment) to
match your current source segment. Typically this involves changing a couple
of terms, which can be done easily. This is not necessarily so with MT, though,
which is not inherently correct. It can be, but it does not have to be. If you
work in my language combination (English>German), you will quickly find
that more often than not there are fundamental changes you will need to
make to bring the translation to the required quality level.
Still, van der Meer points out some uneasy truths: translation memory is in
fact a lower-level feature of machine translation; it is used as such within
machine translation programs to the present day; and, yes, we have certainly
been influenced by the marketing van der Meer points to.
Here are the different offerings of most TEnTs (as of January 2014; please
note that to actually use most of these MT tools you'll need a license key).
You can find information on these different MT engines and the language
combinations they cover on their respective websites, but two might be worth
highlighting because of their different nature:
Are the many integrated MT engines helpful and are they used by professional
translators? I will leave the answer to the first question up to your preferences
(and language combination, and kinds of translation you do, and the many
likes and dislikes that you might have about this kind of technology), but the
answer to the second question is, yes, more and more translators are using
MT as one of many resources.
Some are using it if no quality TM matches are found (the tools can typically
be adjusted so that machine translation is pursued only if no match of, say,
75% or higher can be found), others are using MT as an extended dictionary
for highly specialized terms, and yet others are using it as a source for a
variety of suggestions.
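The first of these strategies, falling back to machine translation only when the translation memory has nothing good enough to offer, can be sketched as follows. The similarity measure comes from Python's standard `difflib` module and is only a stand-in: every tool calculates its match percentages differently. The `machine_translate` parameter is a placeholder for whatever MT engine is connected.

```python
from difflib import SequenceMatcher


def best_tm_match(segment, tm):
    """Return the most similar TM source segment and its 0.0-1.0 score."""
    best_source, best_score = None, 0.0
    for source in tm:
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_source, best_score = source, score
    return best_source, best_score


def propose(segment, tm, machine_translate, threshold=0.75):
    """Offer the TM match if it reaches the threshold, otherwise ask MT."""
    source, score = best_tm_match(segment, tm)
    if source is not None and score >= threshold:
        return ("TM", tm[source])
    return ("MT", machine_translate(segment))
```

Here `tm` is a plain dictionary that maps source segments to their translations; the threshold of 0.75 mirrors the 75% mentioned above.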
Consider this example from Wordfast Classic (with machine translations from
Google Translate, Microsoft Translator, Linguatec, Systran, and Trident MT—
the last three through itranslate4.eu).
There is no need to argue about how "good" these matches are, but most of
them contain some material that in some kind of combination might be useful
in the actual and final translation. You as a translator will have to decide what
kind of role this information plays for you. Does it help or hinder? Is it
different, for instance, than having a lot of matches from a general TM shown?
The answer to that will most certainly depend on your language combination
(some language combinations are much more suited to a first machine
translation draft than others) as well as your project type or subject matter. It
might even be different between different projects.
Beyond that, virtually all translation environment tools offer quality assurance
features such as spell-checks or checks for formatting integrity. In fact, tool
vendors have recognized only rather recently that there is demand for more
far-reaching quality assurance features.
Figure 184: Setting which terminology database is to be used for Wordfast’s terminology check
It’s important to realize that this feature is not equally useful in all
languages. Terminology checks in languages with heavy conjugation or
declension, or agglutinative languages such as Turkish or Finnish, will
typically find a lot of "translation errors" that are really just different
forms of the correct term. A strategy to counter that is to enter various
term pairs to cover the different word forms.
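The following Python sketch shows why this happens and how additional word forms help. It is a deliberately naive check based on simple substring comparison, written for illustration only; it is not the way any particular tool implements its terminology check.

```python
def check_terminology(source, target, glossary):
    """Return the source terms whose expected translation is missing.

    glossary maps a source term to a list of acceptable target forms.
    """
    warnings = []
    for source_term, target_forms in glossary.items():
        if source_term.lower() in source.lower():
            # The check passes if any of the listed forms is present.
            if not any(form.lower() in target.lower() for form in target_forms):
                warnings.append(source_term)
    return warnings
```

With only "Haus" listed for "house," the German plural "Häuser" triggers a warning even though the translation is correct; once "Häuser" is added to the list of acceptable forms, the false alarm disappears.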
SDLX offered its quality assurance checks as the major improvement when it
released its version 2005. At that point it was probably the most
comprehensive solution.
However, Trados (whose owner also owns SDLX) versions 7.1 and above
included a larger set of QA features than any of its competitors.
Figure 186: The QA Checker module in Trados: Segment Verification (Trados Studio -
above) and Punctuation (Trados 2007 - below)
One particularly helpful aspect of the Trados QA Checker is the fact that you
can load and save a profile (under QA Check Profiles), enabling all members
of one translation team to use the same QA procedures.
In a sense it is not a surprise that Trados offers such encompassing QA
features. For a number of years, several tool vendors have been offering a
variety of quality assurance tools that specifically provide quality assurance
for Trados files. So when Trados got ready to offer it themselves, they could
just pick and choose the most helpful features.
For even more comprehensive standalone quality assurance tools, see page
279.
Collaboration Features
When you search for the term "collaboration" in relation to translation
environment tools, you will quickly realize that there are a number of different
levels of collaboration and different definitions of what this entails.
The first level, which is also the only one that all tools offer in some form, is
collaboration through exchange formats. There are a number of existing
exchange formats, most importantly TMX for the exchange of translation
memories, TBX for the exchange of termbases, and XLIFF for the exchange of
translation files. (Another important one, Linport, is about to be released for
the exchange of translation project packages.)
TMX, TBX, and XLIFF are all standards that are based on the same underlying
standard, XML, and that’s not where the similarities end. All of them play a
very important role in the exchange of their respective formats, and all of
them have clear limitations to how seamlessly the exchanges take place.
• The two major problems that TMX has are a) the different ways in which
the so-called inline tags (tags that contain non-textual information within a
segment) are stored in the translation memories of the originating
application and b) the different ways different tools segment texts, leading
to differences in the way what is considered to be a segment in the
translation memory will end up as a match.
• TBX, the standard for exchanging termbase data, has to be able to capture
everything that is contained in a termbase. Unlike translation memories,
termbases can be very complex with literally hundreds of different kinds of
fields that describe the terminology data or set it into relation with each
other (such as term ABC is a synonym of term XYZ). Naturally the
standard to describe that complex data also has to be very complex, which
made the adoption of the standard very slow and the actual process of
exchanging complex termbases very manual. TBX is not as widely
supported as TMX, and many tools that "support" it don’t allow for all the
different fields to be imported, partly because their own termbase
structure does not support many of the different fields.
properties and attributes that are getting lost in the transfer process. While
this typically does not mean that you couldn’t translate an XLIFF file from
one tool in another, it’s highly recommended that you do a trial run to
verify that everything works right (many tools offer a feature called
pseudo-translate just for this purpose).
• Lastly there are the package standards. Package standards take care of
the complete translation project, including translation files, resources (TM,
termbase, and reference material) as well as any kind of meta information
related to the project (such as instructions, etc.) and combine them in one
zipped up file. The user can open the file in a supporting translation
environment tool, which will utilize the individual parts by placing them
appropriately on the user’s computer and give automatic access to them in
the translation process. Once the translation is done, the package file will
be sent back to the requestor and will contain all necessary assets.
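Since all of these formats are XML, a look at a tiny TMX file may help to show where the inline-tag problem comes from. The sample below is a hand-written, minimal TMX 1.4 fragment (the header values are placeholders), and the Python sketch reads its translation units while simply dropping the inline elements. That is the crudest possible strategy and only one of several ways a tool might treat them.

```python
import xml.etree.ElementTree as ET

# The xml:lang attribute lives in the standard XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Click <bpt i="1">&lt;b&gt;</bpt>Save<ept i="1">&lt;/b&gt;</ept>.</seg></tuv>
      <tuv xml:lang="de"><seg>Klicken Sie auf <bpt i="1">&lt;b&gt;</bpt>Speichern<ept i="1">&lt;/b&gt;</ept>.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def plain_text(seg):
    """Text of a <seg> with inline tag elements (bpt, ept, ...) dropped."""
    parts = [seg.text or ""]
    for inline in seg:
        parts.append(inline.tail or "")
    return "".join(parts)


def read_units(tmx_string):
    """Return one dictionary per translation unit, keyed by language code."""
    root = ET.fromstring(tmx_string)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            unit[tuv.get(XML_LANG)] = plain_text(tuv.find("seg"))
        units.append(unit)
    return units
```

The content inside `bpt` and `ept` is whatever the originating tool chose to store for its formatting codes, which is exactly the part that another tool may interpret differently.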
The last standard that still needs to be developed is one that allows for
an exchange during server-based processes where either the translation
data or the resources are placed in an online location and are
continuously queried during the translation process. These processes so
far are tool-dependent.
Real-Time Collaboration
Considering all the problems that are encountered with the exchange formats,
the real-time sharing of TM and termbase resources or even the translation
file itself would be clearly advantageous.
There are only a small handful of translation environment tools that don’t
support collaboration in these kinds of workgroups. These include tools like
AnyMem, Similis, Snowball, and Publisher, which are mostly and specifically
geared toward the freelance translator.
The majority of the other tools come in a multi-tiered structure: the least
expensive—in some cases, free—version is geared toward the freelance
translator and supports no workgroups; the higher-priced versions can be
used to organize and administer workgroups with real-time collaborations.
The collaborators are typically equipped with the freelance editions of the
respective tool.
There are only a small handful of tools with which you can share resources in
real time with a non-corporate edition. These include Wordfast Classic/
Anywhere, OmegaT, Lingotek, Google Translator Toolkit, Wordbee, and
MemSource.
Because the last four are cloud-based solutions, this feature is very apparent
for them, but it is less apparent in Wordfast Classic and OmegaT.
In Wordfast Classic you can connect to your Wordfast Anywhere TMs and
glossaries, which in turn can be shared with others in real time.
You can find a demo on this feature and its setup at www.wordfast.net/wiki/
Using_a_WFA_TM_in_WFC.
At this point, a good number of tools also support PDF files through an
internal conversion process (see page 386), and you will have to look closely
at the version of your desktop publishing tool to see whether it’s supported by
your translation environment tool (for a comprehensive chart, see page 326).
• XML files with embedded HTML (memoQ, Déjà Vu, Trados Studio, see page
344)
• CHM help systems without the need to decompile (Alchemy Publisher, see
page 358).
You can find information about other exotic formats and ways to process them
with translation environment tools under Translating Complex File Formats.
I would advise you not to actually look so much at the tools themselves but
instead see what your particular environment is like.
These should be your first criteria: Who are your clients (or, if you’re just
starting out, who do you hope your clients will be), what tools are they using,
and how do they use them?
• If they use a TEnT and send you pre-processed bilingual files in Word or as
XLIFF or Trados TTX files, you can work with the majority of tools, no
matter whether they match your client’s tool or not.
• If they send you the projects in a TEnT-specific package format (a file that
contains all the resources and the translation file you need for the
completion of the project), it’s possible that you can use other tools than
the client is using, but you’ll need to investigate a little more to know for
sure that that kind of exchange works (see, for instance, page 278).
• However, if your clients are using a process where the translation memory
and terminology data (and possibly the translation file itself as well) are
located online, you will have to use the tool that the clients are using (see
the note on page 275).
Next, look at what the colleagues you often work with are using. It will serve
you well to use the same tool—both for the sake of seamless cooperation as
well as some friendly support. And speaking of support, make sure that there
is an overall good support system in place (see page 397).
Also, you should inform yourself about training opportunities. For any
translation environment tool, especially if you have never used one before,
you should consider investing in some kind of training.
And lastly, look at the tool itself. Start with looking at the file formats you are
translating. Does the tool you are looking at support them all? If it does not
support all the formats, would it be OK to not use the tool for some projects?
If you are not a Windows user, you will have to make sure that your tool runs
on the platform of your choice.
One thing should really not become a major part of the decision-making
process: how much the tool costs. Instead, look at the return-on-investment.
Any tool whose purchase price you can’t recoup within a
few months is a failed investment, no matter the original price. Plus, the initial
purchasing cost most likely is the smaller portion of your investment. Training
will be the larger.
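If you like to see that as arithmetic, here is a tiny Python sketch of a payback calculation. The formula and every figure you would plug into it (price, training cost, hours saved, hourly rate) are assumptions for illustration; you will have to supply your own numbers.

```python
def payback_months(purchase_price, training_cost, hours_saved_per_month, hourly_rate):
    """Months until the time saved has paid for the tool and the training."""
    monthly_gain = hours_saved_per_month * hourly_rate
    if monthly_gain <= 0:
        return float("inf")  # the investment never pays off
    return (purchase_price + training_cost) / monthly_gain
```

For instance, a tool for 600 plus training for 900 that saves ten hours a month at a rate of 50 per hour pays for itself in three months.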
As mentioned above, Xbench (see www.xbench.net), the tool that was already
mentioned in connection with terminology search (see page 295),
distinguishes itself by the large range of bilingual file formats it supports and
has become the preferred tool for many in the field of QA tools.
Aside from these desktop-based tools, there are also cloud-based quality
assurance tools for which you upload all or part of your content to an online
location.
These include the Russian tool multiQA (see multiqa.com) and the Japanese/
Czech tool CrossCheck (see www.idioma.com/smart-language-solutions).
While both of these tools perform the typical variety of QA checks (albeit in a
more limited fashion and for a more limited number of formats), they really
distinguish themselves with terminology checks for which—unlike their
competitors—they use specifically developed morphological engines (multiQA
for English, Russian, German, Ukrainian, Kazakh, Chinese, Spanish, Polish,
Slovak, Czech, and Norwegian and CrossCheck for essentially all Western
European languages as well as Czech, Hungarian, Romanian, Russian, and
Turkish). This means that these tools allow you to check your translation
against a terminology database or glossary and won’t give you a "false
positive" when you have the singular nominative form in the termbase and a
plural form in the translation.
manual references, and many others. Much like Acrocheck, the intention is to
create well-formed documents before the translation even starts, thus aiming
at a better return on translation memory matches and/or better entry of data
into the translation memory.
and the Society for the Promotion of Applied Information Sciences at the
Saarland University (see www.congree.com); SDL’s Global Authoring
Management System (see www.sdl.com/products/gams); and Star
MindReader (see www.star-group.net/ENU/mindreader/mindreader.html).
Though these do not strictly fall into the category of quality assurance, this is
a particularly exciting new family of tools. Tools that allow authoring on the
basis of a translation memory not only extend the use of the translation
memory (it is obvious that you will have a huge number of matches in the
translation portion of a project if you adjust your writing to the source part of
the translation memory in the first place) but also offer a whole new world
of opportunities to language providers! All of a sudden, authoring may
become a much easier new service portfolio item for individuals or companies
who have so far specialized in translation only.
Figure 194: Open TMX file in Olifant with the available commands in the View menu
Heartsome also offers TMX Editor, an editor for the translation memory
exchange format. It’s Java-based so it runs on all platforms and it’s
surprisingly fast.
As you can see in the following screenshot, it also offers a quality assurance
filter that you apply to your TMX file. Once problematic translation units are
found, you can either batch change or delete them or process them
individually. Other features include the modification and/or adding of
metadata (data about the translation unit), changing of the code page,
merging or splitting TMX files, or exporting TMX files into a great number of
other formats, including a number of text formats and Word or Excel formats.
ApSIC Xbench (see page 295) also allows for the conversion of translation
memory formats or other database exchange formats, and so do some of the
tools that are provided along with Swordfish (see page 220).
Terminology Mining
Terminology mining programs offer the possibility of extracting terminology
and building up terminology databases or glossaries by taking existing pairs of
source and target documents or bilingual translation memories, analyzing
them, and presenting you with a proposed translated terminology list. Once
this list is generated, it can be used as either a primary glossary for a project
(or to send to the client), or as a common glossary that can be shared among
multiple translators working on this project.
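A purely statistical version of this process can be sketched in a few lines of Python. The sketch counts how often a source word and a target word appear together in aligned segments and scores each pair with the Dice coefficient. The stop word list, the thresholds, and the single-word tokenization are all simplifying assumptions; commercial tools add linguistic knowledge on top of calculations like these.

```python
from collections import Counter
from itertools import product

# Tiny, hypothetical stop word list for English and German.
STOP_WORDS = {"the", "a", "to", "is", "sie", "die", "das", "der", "ist", "zu"}


def tokens(segment):
    """Lower-cased words of a segment without punctuation and stop words."""
    words = {w.strip(".,;:!?").lower() for w in segment.split()}
    return words - STOP_WORDS - {""}


def propose_term_pairs(aligned_segments, min_score=0.8):
    """Return (source_word, target_word, score) proposals, best first."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for source, target in aligned_segments:
        src, tgt = tokens(source), tokens(target)
        src_count.update(src)
        tgt_count.update(tgt)
        pair_count.update(product(src, tgt))
    proposals = []
    for (s, t), both in pair_count.items():
        # Dice coefficient: 1.0 if the two words always occur together.
        dice = 2 * both / (src_count[s] + tgt_count[t])
        if both >= 2 and dice >= min_score:
            proposals.append((s, t, round(dice, 2)))
    return sorted(proposals, key=lambda p: (-p[2], p[0], p[1]))
```

The output is a list of proposals and nothing more: a German compound will be offered as the translation of each English word it covers, so the list still needs a human to review it.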
The most powerful application in the field of term extraction used to be the
Xerox Terminology Suite (XTS), which was designed for the deep pockets of
corporate users and was very powerful because it was based on preconfigured
linguistic data in various languages. Today the suite is owned by TEMIS (see
www.temis.com), where development has virtually come to a halt.
However, the translation environment tool Similis (see page 200) has
integrated the XTS engine and therefore comes with a very high-level
linguistic "knowledge" in seven EU languages (English, Dutch, German,
Spanish, Italian, Portuguese, and French). Similis is able to apply a
combination of linguistic and statistical rules to a number of processes,
including automatic extraction of terms and phrases from translation memory
content with extremely high accuracy—but unfortunately only in a handful of
languages.
Theoretically, all languages are supported with the tool; however, practically
speaking there are different tiers of language support. In general,
SynchroTerm relies on mathematical calculations to extract terminology pairs.
For a great number of Western languages it also uses long lists of stop words
to filter those out automatically, and for English and French it also makes use
Once you've registered, you can upload one or several files in various formats
(PDF, DOC(X), XLS(X), PPTX, RTF, TXT, XLIFF, XML, or HTML), have
terminology extracted from the file(s), apply content within existing
terminology resources to those terms, select from the suggested translations
and/or translate the terms, and then export it so that you can use it within
your terminology database or glossary.
This tool is particularly interesting because of the tools that support the
extraction process. These include tools for part-of-speech tagging,
lemmatizers, morpho-syntactic patterns, statistical analysis and—for English
and Latvian—a tool to normalize terms, which brings terms into their
canonical forms (typically nominative singular or infinitive).
Once that is done, the extracted list of terms will be run against a number of
(again, optional) resources in the following order: 1. your own personal
resources that you might have collected on the site; 2. other users'
terminology; 3. the EuroTermBank; 4. the EU's inter-institutional terminology
database IATE; 5. the TAUS corpus; and 6. the TaaS statistical database
(SDB) that consists of aligned web data. Once these databases have been
queried for translations, they will be shown as suggestions from which you
can choose by just clicking on them and/or you can enter your own
translation.
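The idea of querying several resources in a fixed order can be sketched like this in Python. Plain dictionaries stand in for the remote databases, and the function name and data layout are hypothetical; how TaaS actually implements the lookup is not described here.

```python
def lookup_suggestions(term, resources):
    """Collect translation suggestions from resources in priority order.

    resources: list of (name, termbase) pairs, highest priority first,
    where each termbase maps a lower-cased term to a list of translations.
    """
    suggestions = []
    seen = set()
    for name, termbase in resources:
        for translation in termbase.get(term.lower(), []):
            if translation not in seen:  # keep the highest-priority source
                seen.add(translation)
                suggestions.append((translation, name))
    return suggestions
```

Each suggestion carries the name of the resource it came from, so that you can judge how much to trust it before you click on it.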
Of course, one of the ideas behind this project is to make it possible to share
terminology data. At the outset of each project you can enter a whole lot of
optional data, but you will need to make a decision on the language
combination, the domain of your text, and whether you want to share the
data with other users. The shared data will not include the complete texts that
you upload but only the term pairs that you will end up with in your termbases
(and only on an individual term pair level rather than complete lists of term
pairs). The shared data will also be used for other purposes, including
machine translation. Both Tilde and TAUS have a strong interest in machine
translation (and so does the EU as the funder of this project), and high-quality
termbases are naturally helpful for machine translation.
One of the first standalone terminology tools was developed by Alan Melby in
1982 and made commercially available in 1987. MTX (see
www.linguatech.com/mtx.htm) enabled translators to compile their own
glossaries as a separate task or while working in documents. It provided
macros for Word and WordPerfect so that, with just the help of a keyboard
shortcut, a search for an entry in the termbases could be launched. The
exchange format of MTX is called MicroMATER (this was later developed into
MARTIF, which in turn provides the basis for today’s exchange standard TBX—
see page 273).
For translators who feel that the jump to using a TEnT is too big or who are
unhappy with the terminology management in their existing TEnT, this might
be a good and inexpensive solution. Like its competitors, it allows you to
perform a search without ever leaving the application you are working in;
simply hit a keyboard shortcut, which then calls up the application with the
search results window.
It then indexes them and gives you near-instantaneous access to the content
of these files. It’s a very powerful tool that really stands alone in its class; the
only drawback is that it requires a rather large amount of computer resources
to run.
and using these to do terminology searches. The supported file formats for
the alignment include Word, WordPerfect, Excel, PowerPoint, InDesign, HTML,
XML, and PDF files, and you can store the aligned results as bilingual HTML or
XML files.
Once you have created your bilingual files, you can search in a virtually
unlimited number of aligned file pairs for any term or phrase and use a great
number of operators in your searches (such as wildcards or fuzzy
matching). You can do this from within the LogiTerm interface . . .
All this information doesn’t help much if you don’t know what software
localization tools actually do. So, here goes:
Ten or fifteen years ago (and in many cases even today), software was
translated by
• finding out which files contained translatables (in the case of most Win32
applications, the translatable strings were typically located in binary EXE or
DLL files, i.e., files that cannot be opened with a text editor),
• combining (=compiling) these files back into the original EXE or DLL files,
• testing these files extensively for cut-off text due to text expansion or any
other errors that may have been introduced, and
• starting the process from scratch if any change in text occurred during the
development cycle or any other editing had to be done.
• eliminated the need for the various compilation procedures and at the
same time streamlined updates to the software (like for a new release or
bug fix), because the old glossaries could be applied and only new text
needed to be translated
While Microsoft decided to keep its tool, LocStudio, internal, Corel decided to
market its tool, Catalyst, to the rest of the translation and software
development community. Catalyst is now the market leader in a field with
numerous other players, many of which have remarkably similar feature sets.
All of the tools come in several editions that have radically different price tags,
and many of the above-listed abilities are sold as separate plug-ins. Typically
there is a translator edition that excludes some of the more development-
oriented functionality, and a developer or localizer edition that contains all the
functionality. Passolo and Catalyst also come in editions that allow the
developer to create files which can be worked on in a freely downloadable
edition for the translator.
To come back to our original question, when these tools were first released,
software developers across the board became nervous. They were afraid that
a new development-oriented tool would likely cause problems—as most of us
know, developers feel quite protective of their "baby," the software. At this
point, however, it’s clear that these fears are completely unwarranted. Unless
software does not follow any of the supported development standards, it’s not
only safe to use a software localization tool, it's silly not to—and a great waste
of money, time, and energy to boot.
Management Tools
There’s a problem most translators face with project management tools: when
business is finally good enough to justify implementing a management tool
(both because of the purchase price and the volume of business that needs to
be managed), their management workflows are so entrenched that it’s hard to
change. And old habits die hard. . . .
Some tools that have been mentioned in previous sections can take care of
certain aspects of your translation work, including:
• Outlook (or any other applicable email and scheduling application) for
managing schedules, due dates, and reminders
My favorite tool as a project manager (so many years ago. . .) was Microsoft
Project. This program is impressive for its ability to track projects very
effectively in an almost unlimited number of ways and save the results in a
great variety of formats, including HTML, that can be shared with anyone.
A while back I purchased a recent copy of Project for our small company and
never really used it; for our small business it seemed like overkill to use such
a "heavy" application to track projects.
I chalked that up to one of my few software investments that didn’t pay off.
But there is another group of tools that have come of age: accounting and
project management tools that are specifically created for the translation
industry. The concept of these tools is to automate and organize repetitive
tasks that are associated with your translation projects, including
• generating quotes
• scheduling tasks
• vendor management
You won't be surprised to hear that all this makes for a number of different
categories of tools. The first category is the kind of tool that is geared toward the
management of jobs, invoicing, and vendors for agencies. These are the tools
that I am aware of which do this:
If you look at the different websites of these vendors, you will quickly
recognize different levels of professionalism, price, and approach. For
instance, ]project-open[ is an open-source tool that allows a great deal of
customizability; it also offers a number of additional paid modules that you
can but don't have to integrate. Worx and Plunet are completely hosted
online, and T.O.M. is really more geared toward smaller companies or
freelancers.
And then, of course, there are also tools made for freelance translators. Some
tools, such as Customer Pro-File (see www.linguabase.com/cpf.asp) or the
above-mentioned T.O.M., are designed for smaller companies or freelancers.
However, the tool that is probably the leading contender in this group is the
little sister of Projetex, Translation Office 3000 (see www.to3000.com), a
no-nonsense database-based solution with a small footprint that can
significantly reduce your accounting time as a freelancer.
Figure 208: The invoice window in Translation Office 3000 with easy access to all other
modules
I’m pleased to admit that I have finally given up my old entrenched ways and
adopted Translation Office 3000 for my management and accounting. I’ve
thrown out my general accounting software (Quicken), adjusted the look of
the customizable invoice templates in Translation Office to the look of my old
ones, and figured out that, after a bit of setup, I’m much faster this way.
You’ll have to try it for yourself to see whether the same is true for you.
While I have dealt with office formats in earlier sections (see Office Suites on
page 149), in this section I have attempted to categorize the most commonly
required more advanced file formats. You will find descriptions of the
programs for which these are written, how to distinguish between the
translatable vs. untranslatable parts, and how these formats are supported by
computer-assisted translation tools.
• Graphic formats (pixel-based: JPG, GIF, BMP, TIFF, etc.; and vector-based:
EPS, AI, etc.)
• Software development formats (binary formats: DLL, EXE, OCX, etc.; and
text-based formats: RC, PROPERTIES, RESX, etc.)
• Database-based data
This doesn’t sound good, but here are the brighter aspects: Yes, they are
expensive, but you may not even need to have them installed on your
computer when you translate them. They are very difficult to learn on a real
expert level—after all, graphic designers, desktop publishers, and prepress
specialists are well-paid professionals—but as translators we only need to
translate the files, not design them. And, yes, there are obstacles, but,
fortunately, there are workarounds as well.
Generally, DTP programs can be categorized into two groups: those created
for design-oriented publications and those intended for content-oriented
publications.
While the content-heavy applications also offer good graphics and prepress
management (albeit not as advanced as the design-oriented programs), their
main focus is on the processing of text, which shows in the advanced TOC
(Table of Contents) and index generation, cross-references, page break
management (widow and orphan rules), an independent character and
paragraph setup, and the ability to output documents in a huge variety of
formats. The latter is increasingly done through a tight integration into XML
(see page 150).
The very concept of these programs is that there will be as much automation
in the layout as possible. This is achieved, for instance, through fairly
sophisticated widow and orphan rules so that there will only be a small
amount of additional pagination.
In general, these programs are very well suited for translation. There is no
problem with non-Western languages even in Western versions of the system
(provided that your operating system supports it), and the latest versions of
FrameMaker now also fully support Unicode. The size of the files tends to be
relatively small because graphics are usually linked and not inserted, and all
of these programs are exceptional in the ways they publish and re-publish
text in a great variety of formats, including HTML, XML, PDF, and RTF.
If the FM files are displayed with an icon in the form of a question mark,
you need to delete them from the book with the appropriate command
from the menu and then re-add them from within the Add menu. Once
the files are added, you can easily change the order of the files by simply
dragging them within the BOOK interface.
You will need to save the compiled FM format within FrameMaker by selecting
File> Save as and selecting the text-based MIF format. To avoid the
individual opening and saving of each file, you can use the free FM2MIF tool
(see www.dtptools.com/product.asp?id=fmfm) to do this as a batch process
for a whole book. (By the way, it’s totally okay to ask your client to do this for
you if you do not have FrameMaker on your computer.)
Once all your files are preprocessed, they are supported in most translation
environment tools whose representatives will tell you that their FrameMaker
processing is one of their strongest features—which only goes to show that
FrameMaker is a very translator-friendly format.
There are slight differences in the way that the different tools process the MIF
files. While some tools (Déjà Vu, Transit, Across, memoQ, Wordfast Pro/
Anywhere, Trados Studio, MadCap Lingo, Translation Workspace, MemSource,
Heartsome, Swordfish, Text United, XTM) process the MIF files directly like
any other file type, in Trados 2007 and earlier you need to convert the MIF
files into RTF files (so-called "STF" files) with a separate program that is part
of the Trados suite of tools, the so-called S-Tagger for FrameMaker (usually
located under Start> Programs> Trados XX> Filters), before you can
translate them in either Word or TagEditor. The process of converting the files
is slightly confusing if you do it for the first time, but the principle is this:
You will need to create two different directories, one of which will contain a set
of files that will only serve as reference files so that the FrameMaker MIF files
can be reassembled once they are translated in RTF format. The other will
contain the set of files that are actually prepared for translation. Keep that in
mind in both conversion processes (into RTF and back into MIF) so that you
select the correct directories either way.
Figure 209: Trados S-Tagger for FrameMaker with tabs to convert files and verify tags
Most other tools process the MIF files directly and translate all the background
information individually for each file.
One exception to this is Alchemy Publisher, which allows the direct translation
of FrameMaker FM files. Clearly this is a tremendous time saver, but there is
one striking disadvantage. Since Publisher uses FrameMaker in the
background to process the files, you must have FrameMaker installed on the
machine on which you translate FM files. So, if you already have FrameMaker,
Publisher might be a good option. If not, it’s important to consider the
additional cost.
In these formats, each text block, called a story, is saved in individual text
boxes from which the text has to be manually exported into a tagged text
format and re-imported if you want to process them in a translation
environment program. While this is theoretically not an issue, it is very (!)
time-consuming when you have to do this for tens or even hundreds of stories
in one document.
Fortunately, there are some applications available for these programs that
allow for the batch export and import of these stories into one text file per
original file (CopyFlow at www.napsys.com/cflow.html for Quark and
StoryCollector as part of Trados [version 2007 and before] for earlier versions
of Quark, PageMaker, and InDesign).
An issue with any of these programs is that there is often a fair amount of
post-translation layout due to text expansion, etc. The text boxes in which the
stories are located do not automatically expand, and often have to be
manually resized once the translation is finished.
It’s one thing to consider purchasing (and learning!) any or all of these tools,
but a consideration that is just as important is the price you will have to ask
for to translate a document in InDesign, PageMaker, or Quark in comparison
to a document in Word. Are your clients able and willing to reimburse you for
the larger amount of time that you are spending with these files?
This is not where the problems stop, though. Especially PageMaker (and
QuarkXPress up to version 6.5) is still very "last century" when it comes to
processing multilingual text. Though Unicode (see page 6) is a widely
accepted standard that makes it easy to mix and match different writing
systems on web pages and all kinds of other documents, some DTP programs
are not up to par on this. Even though Quark does now support Unicode with
its latest versions, PageMaker most likely will never do that because the folks
at Adobe have a better choice when it comes to processing Unicode: InDesign.
InDesign
After a fairly unsuccessful version 1, InDesign really gained traction beginning
with version 2. Presently you will encounter InDesign files that are created in
versions 2 and CS (3) through CS7 (9). To translate efficiently in InDesign
you will need a program that exports all the stories (the above-mentioned text
boxes) into one large file which can be processed in a translation environment
tool. (Of course, it is possible to translate directly within InDesign, but the
emphasis was on "efficient.")
Trados (versions 2007 and before; InDesign 2 and CS are not supported by
Trados Studio) offers little plug-ins as part of all its versions of the Workbench
product that support InDesign versions 2 or CS (the plug-ins are stored under
C:\Program Files\SDL International\Txxxx_xx\FI\IND—follow the
instructions in the help file on how to install the plug-ins). Once you have
installed the plug-in and opened the InDesign file, you will see a new Trados
menu with all the necessary commands to export and re-import your file.
Figure 212: View of Trados 7-exported InDesign text file in a text editor
As you can see in the above illustration, the text file is not just a "normal" text
file; instead, it is a "tagged" text file where only the smallest part is actually
translatable (essentially everything that is not enclosed by <tag markers>)
and all the other data stores information about details such as formatting,
placement, etc. While it theoretically would be possible to translate this within
Microsoft Word or a text editor, it would be foolish to even try—chances are
that you would break the code or overlook text.
Trados and Déjà Vu recognize these files as InDesign files, protect all coding
information, and display only translatable text.
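To make the idea of "protecting" the coding information more concrete, here is a minimal Python sketch that separates tags from translatable text. It assumes that a tag is simply anything between < and >, and the sample line is invented in the style of a tagged text export rather than copied from a real one. Actual filters are considerably more careful than this.

```python
import re

# Assumption: a tag is any run of characters enclosed in < and >.
TAG = re.compile(r"<[^<>]*>")

def split_tagged(line):
    """Split a line into ('tag', text) and ('text', text) pieces, in order."""
    pieces = []
    pos = 0
    for match in TAG.finditer(line):
        if match.start() > pos:
            pieces.append(("text", line[pos:match.start()]))
        pieces.append(("tag", match.group()))
        pos = match.end()
    if pos < len(line):
        pieces.append(("text", line[pos:]))
    return pieces

# Invented sample line; real exports contain far more tags than text.
sample = "<ParaStyle:Body><cSize:9>Press the <cTypeface:Bold>Start<cTypeface:> button."
translatable = [text for kind, text in split_tagged(sample) if kind == "text"]
print(translatable)
```

Only the "text" pieces would be offered for translation; the "tag" pieces have to come back unchanged and in a sensible position, which is exactly what is so easy to get wrong by hand.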
With the release of InDesign CS2, InDesign files became much more
accessible to translation environment tools because it was now
possible to save files into the XML-based INX format. This format is supported
by the vast majority of TEnTs. Note that you will have to have a copy of
InDesign on your computer to save the file as an INX file (or you can ask the
client to do it for you).
It is also advisable to check what version of CS2 through CS7 your tool
officially supports since there are fairly major differences between the XML
structure of the different versions of InDesign. Since InDesign has become the
quasi-standard desktop-publishing format, you should be able to expect your
TEnT vendor to update quickly to the latest format of InDesign.
With version CS4, InDesign introduced the ability to export InDesign Markup
Language (IDML) files. These are a zip-compressed set of XML files where
each XML file represents a "story" (text box). While it’s possible to translate
these files without any specialized filter (you can export the IDML file out of
the original InDesign INDD file with File> Export, rename the IDML
extension to ZIP, unzip the file, locate the XML files that contain the story
content—the translatable text—and import or open them with your TEnT), the
latest version of most translation environment tools now supports the IDML
format directly, and many tools, including Trados Studio, now support only the
IDML format for InDesign.
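The manual route described above (unzip, find the story files, look at their text) can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: it assumes that the story files sit in a folder called Stories inside the package and that the visible text is held in elements named Content, which matches my understanding of the IDML layout but should be checked against your own files. The tiny demo package is built in memory and is of course not a real InDesign export.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

def story_texts(idml_file):
    """Map each story file in the package to the text of its Content elements."""
    texts = {}
    with zipfile.ZipFile(idml_file) as package:
        for name in package.namelist():
            # Assumption: story files are stored as Stories/*.xml.
            if name.startswith("Stories/") and name.endswith(".xml"):
                root = ET.fromstring(package.read(name))
                texts[name] = [el.text or "" for el in root.iter("Content")]
    return texts

# Stand-in package so the sketch runs without InDesign installed.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as z:
    z.writestr("Stories/Story_u1.xml",
               "<Story><ParagraphStyleRange><CharacterStyleRange>"
               "<Content>Hello world</Content>"
               "</CharacterStyleRange></ParagraphStyleRange></Story>")
    z.writestr("designmap.xml", "<Document/>")
demo.seek(0)
print(story_texts(demo))
```

With a real file you would pass the path of the IDML file instead of the in-memory demo. The sketch only reads the text; writing translations back is best left to a TEnT filter.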
For this workaround you will need to realize that the MQXLZ format is a zipped
(compressed) format which contains an XLIFF file (with the extension MQXLIFF)
and a "skeleton" file (which contains all the external data, such as images). To
retrieve the XLIFF, change the extension of the MQXLZ to ZIP, right-click on
the file and select Open with> Windows (File) Explorer.
Don’t use a compression utility because that might cause problems in the
back conversion to InDesign.
Once you see the MQXLIFF file, copy it to an external location and rename it
to XLF or XLIFF. Now you can process it in any other tool. Once you’re finished
with the translation, replace the extension of the XLIFF file with MQXLIFF,
open the ZIP file again with Windows/File Explorer, and replace the old
MQXLIFF file with the newly translated one. Once that is done, close the ZIP
file, rename its extension to MQXLZ, and upload it to the Language Terminal
again to have it converted back to an InDesign INDD file. Once the Terminal is
done with the conversion, you can download a ZIP file that contains the INDD
file alongside a PDF with a preview of the translated file.
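If you prefer a script to clicking through the package, the first half of this workaround (getting the XLIFF out) might look like the Python sketch below. It only reads from the package and never rewrites it, in keeping with the warning above about the back conversion; putting the translated file back is still best done in Windows/File Explorer as described. The entry names in the demo package are invented, and the function name extract_xliff is just a label for this example.

```python
import io
import zipfile

def extract_xliff(package_file):
    """Return (entry name, content) of the first MQXLIFF file in an MQXLZ package."""
    with zipfile.ZipFile(package_file) as package:
        for name in package.namelist():
            if name.lower().endswith(".mqxliff"):
                return name, package.read(name)
    raise ValueError("no MQXLIFF file found in package")

# Stand-in package with invented entry names so the sketch can run as-is.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as z:
    z.writestr("document.mqxliff", "<xliff version='1.2'/>")
    z.writestr("skeleton.bin", b"\x00\x01")
demo.seek(0)

name, data = extract_xliff(demo)
print(name, len(data), "bytes")
```

The returned content can be saved under a new name with an XLF or XLIFF extension and opened in another tool.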
Make sure that you first run a test with a pseudo-translated file (a file
where the characters are replaced with "dummy" characters for testing
purposes).
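Most tools have a pseudo-translation feature built in, but the principle is simple enough to sketch in Python. The version below is an assumption-laden toy: it swaps lowercase vowels for accented ones, leaves anything in angle brackets untouched, and pads the segment by roughly 30 percent to imitate text expansion. The padding rate, the bracket markers, and the function name are all choices made for this example.

```python
import re

# Assumption: anything between < and > is a tag that must survive unchanged.
TAG = re.compile(r"(<[^<>]*>)")
ACCENTED = str.maketrans("aeiou", "\u00e0\u00e9\u00ee\u00f6\u00fb")

def pseudo_translate(segment, expansion=0.3):
    """Replace vowels with accented ones, pad for expansion, keep tags intact."""
    parts = TAG.split(segment)
    out = []
    text_len = 0
    for part in parts:
        if TAG.fullmatch(part):
            out.append(part)  # tags are copied as they are
        else:
            out.append(part.translate(ACCENTED))
            text_len += len(part)
    padding = "~" * round(text_len * expansion)
    return "[" + "".join(out) + padding + "]"

print(pseudo_translate("Open the <b>File</b> menu"))
```

If the accented characters, the brackets, and the padding all survive the round trip back into the original format, there is a good chance that a real translation will too.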
PageMaker
To translate PageMaker files (an increasingly rare occurrence because Adobe
is trying to push InDesign over PageMaker) with a computer-assisted
translation tool, you can either use Star Transit with a separate plug-in that
supports PageMaker 6-7, or you can use a plug-in that comes with the Trados
product (only version 2007 and below; Trados Studio no longer supports
PageMaker) called Story Collector for PageMaker, which supports
PageMaker versions 6.5 and 7.
To install the Trados plug-in, open the help file under C:\Program Files\SDL
International\Txxxx_xx\FI\PM for further instruction. Once the plug-in is
installed, open the PageMaker file in PageMaker and you'll find the command
Trados Story Collector under Utilities> Plug-ins.
Export all the stories into one large PageMaker-specific text file, save the
original PageMaker file (important!), and translate the exported text file with
TagEditor or any other application that supports the PageMaker format. The
import process is virtually the same as the export and should go seamlessly.
All of the above is true for Western languages and to some degree for Eastern
European languages. Any of the more complex languages, however, including
the bi-directional languages (Hebrew and Arabic) or the Asian double-byte
languages, are flat-out not supported in the Western versions of PageMaker.
QuarkXPress
Despite the fact that Quark has never been very popular in the translation
community (because of a lack of Unicode support until fairly recently and
different and more expensive versions for different languages, etc.), it used to
be the dominant player in the desktop publishing market, so it is not too
surprising that there is decent support for earlier versions of Quark among the
translation environment tools.
• Star Transit offers a separate plug-in that supports the batch processing of
the English (and Passport) versions 3-9.2 for both the Windows and Mac
platforms.
• Trados (version 2007 and below) offers plug-ins for versions 4.1-6 for
English (and Passport) and version 4.1 for Japanese.
All of these plug-ins were preceded by a program called CopyFlow (see
www.napsys.com) which, just like these programs, allows for the batch export
and import of text from Quark files (up through version 9).
If you need to translate Quark 7 and above files, Star Transit (see above) and
CopyFlow are presently the only tools that allow for an export into a TEnT-
processable format (both on Mac and Windows). In fact, many of the TEnT
vendors now directly recommend the use of CopyFlow.
The European language Passport edition of Quark, which has additional spell-
checking and hyphenation capabilities for Western and European languages, is
supported by the above-mentioned tools. If you have only the (cheaper)
English version, you need to make sure to ask your client to save the file as a
"Single Language" file. Otherwise, if the Passport edition was used, you will not
be able to open the file.
QuarkXPress’s last Middle Eastern edition was for version 6.5. Fortunately,
however, there are XTensions—QuarkXPress-specific plug-ins—for the English
version of Quark that extend its ability to write in Hebrew, Arabic, Farsi, and
Jawi. ArabicXT, HebrewXT, FarsiXT, and JawiXT are all available at
www.arabicsoftware.net through version 8 of Quark.
It becomes much hairier with the Asian double-byte languages. While the
Japanese version 4.1 is supported by the Trados plug-in and several others by
CopyFlow, this means, at the very least, that you have to have several
versions of Quark for different languages, plug-ins, and platforms.
For a quick overview, here is a chart showing which TEnT supports which DTP
format:
The most common error is that of missing fonts, which could be either fonts
that are truly missing or, just as likely, fonts that have a slightly different
naming convention on a Macintosh system than on a Windows platform or vice
versa. You can choose to remap the fonts on a permanent basis (not a good
idea if your client wants to open this on a Macintosh again) or on a temporary
basis.
The other consideration is the differing character set between Windows and
Macintosh, which, if not converted properly, will result in a corruption of
special characters. Assuming that you have performed your translation in a
text-based format on a Windows computer, you have several options to
change the character set.
• You can do this in a Windows version of Word 2000 or higher (see page 156).
• You can open and save your text file in a Macintosh version of Word 98 or
higher, which will automatically convert the Windows character set to a
Mac character set.
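For those who are comfortable with a little scripting, the same conversion can be sketched in Python. This example assumes that the Windows text is in the Western character set Windows-1252 and that the target is the classic Mac OS Roman character set; other languages use other character sets, so treat the two encoding names as assumptions to adjust.

```python
def windows_to_mac(data: bytes) -> bytes:
    """Re-encode bytes from Windows-1252 to Mac OS Roman."""
    return data.decode("cp1252").encode("mac_roman")

original = "Caf\u00e9 f\u00fcr Zo\u00eb"
windows_bytes = original.encode("cp1252")
mac_bytes = windows_to_mac(windows_bytes)

# Read with the right character set, the converted bytes give the same text.
print(mac_bytes.decode("mac_roman"))

# Read without conversion, the special characters come out corrupted.
print(windows_bytes.decode("mac_roman"))
```

The second line of output shows the kind of corruption described above: the plain letters survive, but every accented character turns into something else.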
Graphic Formats
For graphic applications, the same common threads seem to apply as for
desktop publishing programs: they are expensive, they’re not very intuitive to
learn, and they present considerable obstacles during translation.
As with its desktop publishing programs, Adobe also offers its graphic
applications on a month-to-month rental basis, which might be a good
option for some projects (see page 310).
Another graphic application that has been helpful is the low-cost version
of Adobe Photoshop—Adobe Photoshop Elements (see www.adobe.com/
products/photoshopelel).
I have not yet encountered a client who has complained about my lack of a
full-featured, high-priced graphics program; in fact, they are usually very glad
to supply me with Excel spreadsheets, in which I can translate the text of the
graphics that can be pasted into the graphics by desktop publishers (probably
faster, better, and cheaper than I could do it, anyway).
Pixel-Based Formats
Most graphic formats (including JPG, GIF, BMP, TIFF, and various others) don't
contain text. This is true even if it appears to be readable text because the
text is nothing more than pixels (little colored dots) on a virtual canvas. While
they may form shapes that represent letters, these have nothing to do with
the editable letters or words you will deal with in a text editor.
Short of recreating these kinds of graphics from scratch, you will need to get
your hands on the "source files." (Yes, I know that clients hate to be asked for
that, but typically it helps to mention that otherwise they will have to pay ten
times as much.)
Most JPG-, GIF-, BMP-, or TIFF-like files were created in a layered file that
includes one (or several) layers with real, editable text. Since they were most
likely created in Adobe Photoshop, they will have a PSD extension and can be
opened in, well, Adobe Photoshop.
Figure 217: Image file opened in Photoshop with active text layer
The nice thing is that Adobe offers a low-priced version of its program (see
www.adobe.com/products/photoshopelel) that is more than adequate for
translating the text layers that need to be translated. Or you can also use
GIMP (see www.gimp.org), a powerful open-source image editor that allows
you to work with PSD files, though it may not be particularly user-friendly
(and it might also mess up some of the text layers—but at least you can
access the different layers, delete the text layer, and recreate a new one).
This all may not be good enough, though. Especially if you have a large
number of graphics and/or a translation memory database that contains much
of the translation embedded within the graphics, you will not want to perform
the translation "manually."
At the present time (January 2014) there is only one translation environment
tool that directly supports the translation of PSD files: memoQ.
If you don’t have access to memoQ you can use the tools provided by ECM
Engineering (see www.ecm-engineering.de) that allow for the extraction of
text from PSD files into RTF or XML formats. These formats can be processed
in TEnTs (translation environment tools, such as Trados, Déjà Vu, and Across)
and afterward re-inserted.
While the perpetual licenses for Sysfilters products are rather expensive,
you can also purchase 30-day-licenses for each of the filters (available
for Photoshop, Illustrator, CorelDraw, InDesign, and Visio). For more
information, see www.ecm-engineering.com/shop1/
product_info.php?products_id=36.
More often than not, we don’t have access to the source files and have to go
through frustrating re-creation or manipulation processes with the graphics.
To ease at least some of that (translator) pain, there are a number of
management applications for images, such as the open-source tool Image
Localization Manager (see sourceforge.net/projects/ilmanager/). It’s a nifty
little tool that allows you to quickly browse folders and subfolders for graphics
that are displayed nicely within the tool, determine which contain localizable
content, create a list of those image files, and (manually) enter the source
text. This can then be exported into a file that you can translate in something
like Excel or, if you want to view the graphic as you translate, in Image
Localization Manager. Of course, when that’s done, the translated text and the
graphics have to be passed on to the desktop publisher—but as translators,
that's not our worry!
Vector-Based Formats
The above graphic types are pixel-based graphics. Another kind of graphic
that is often used, especially in manuals, is vector-based graphics. You can
recognize them by their typical extensions, EPS or AI. They are very different
from pixel-based graphics because they are formed by mathematical formulas
rather than by simple dots. So, rather than displaying a wheel by arranging a
lot of pixels in a circle, a vector-based graphic would calculate it with some
kind of pi-based formula.
If you would like either to batch process the files or to use your translation
memory, there are two different options.
The above-mentioned ECM Engineering offers products for Illustrator and
Draw files that pre-process the files for use in TEnTs similarly to the way they
process PSD files.
The second option is to save the vector-based files into the XML-based SVG
format, which is directly supported by Heartsome, Swordfish, memoQ,
Trados, and some versions of Star Transit.
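The reason SVG works so well for translation is that both the shapes and the text are stored as readable XML. The small Python sketch below illustrates this with an invented SVG snippet: the circle is described by a center and a radius rather than by pixels, and the label is ordinary text that a program can pull out. The function name svg_texts and the sample content are assumptions for this example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

# Invented sample: one circle defined by center and radius, one text label.
svg = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="120">
  <circle cx="60" cy="60" r="40" fill="none" stroke="black"/>
  <text x="110" y="65">Front wheel</text>
</svg>"""

def svg_texts(svg_source):
    """Return the text content of every text element in an SVG document."""
    root = ET.fromstring(svg_source)
    return ["".join(el.itertext()) for el in root.iter(SVG_NS + "text")]

print(svg_texts(svg))
```

Everything outside the text elements is layout information that a translation environment tool would treat as protected.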
Taking Screenshots
This does not directly relate to translating graphics, but taking screenshots
(pictures of the computer screen or dialog boxes) is often part of our job
description as translators—for instance we might have to replace the graphics
in the source language in a software manual with those in the target language
(provided that the respective software is already translated and functional).
When taking screenshots, I have usually found it sufficient to take them the
"traditional way" (ALT+PRINTSCREEN for the active dialog or PRINTSCREEN for the
complete screen through Windows 7; Windows 8/8.1 only uses
WINKEY+PRINTSCREEN for the complete screen) and then paste that into a
regular graphic application (in Windows 8/8.1 the screenshots are
automatically saved as graphic files under Pictures).
However, there are also programs that specialize in taking screenshots, and
while they don’t fix everything, they are a lot more versatile than what
Windows offers.
When you help someone solve a computer problem, the easiest way to
describe the problem is often to take a screenshot of the error message or
whatever dialog you have problems with, paste it into an email, and send it
off. But what most people don’t realize is that the images that are pasted into
the email are gigantic files in bitmap format. While transmission of large files
may no longer be a problem for today’s high-speed world, storage of these
large files in email is (especially when the email is sent back and forth several
times with the image still there, slowly increasing the size of your inbox to
obscene proportions). One way around this is to paste the screenshot into a
graphic application you use, save it as a GIF or JPG file, and send that. But
that requires a lot of additional steps. . . .
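To get a feeling for how large such a bitmap is, a bit of arithmetic helps: an uncompressed image needs roughly width times height times the number of bytes per pixel. The Python sketch below works this out for a full-screen capture, assuming a 1920 x 1080 screen and 24 bits per pixel; both numbers are assumptions, and file headers and row padding are ignored.

```python
def bitmap_bytes(width, height, bits_per_pixel=24):
    """Approximate size of uncompressed pixel data (headers and padding ignored)."""
    return width * height * bits_per_pixel // 8

size = bitmap_bytes(1920, 1080)
print(size, "bytes, about", round(size / 1024 / 1024, 1), "MB")
```

That is nearly 6 MB for a single screenshot before any compression, which adds up quickly when the same image travels back and forth in an email thread.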
Another free screenshot tool with not quite as many options is Greenshot
(see greenshot.sourceforge.net).
If you just need a screenshot tool for simple, low-resolution screenshots,
you might also want to have a look at Snipping Tool, which is integrated
into Windows Vista and above:
A tool that specifically allows you to "harvest" text from screenshots is the
ABBYY Screenshot Reader (see www.abbyy.com/screenshot_reader). This can
be very helpful, for instance, for copying chunks of text from non-copyable PDF
or other file formats or from dialog boxes for localization purposes.
Flash Files
Flash files used to be very similar in their translatability to graphics files: non-
editable SWF or FLV files are compiled versions of an editable FLA file (similar
to how JPG/GIF/... files are the non-editable versions of the editable
Photoshop PSD files). And, as with graphics files, even the source files
were not easily editable in any application; instead, you had to have some
version of Adobe (earlier: Macromedia) Flash to do this.
Note that there are a number of tools that "decompile" SWF files into
graphical and textual elements. This process is highly unreliable and
therefore not recommended.
If you’re up to the challenge of working directly in Adobe Flash, you can
also rent it on a month-to-month basis (see page 310).
If there was translatable text within Flash files, it was stuck in a layer that was
merged with other layers and was therefore not translatable. I’m saying "was"
because nowadays things are (or should be) a lot easier.
Back in 2005, Flash 8 first introduced the concept that text strings could be
stored separately and externally in an XML file. This file can be translated with
a normal text editor or (preferably) a translation environment tool and stored
with language-specific extensions so that the correct language version would
be retrieved according to the language preference of the user.
Unfortunately, this does not mean that everyone creates Flash files this way,
so it might be helpful to point your client to this reference (for Adobe Flash
Professional CS5 and above) on how to create multilingual text properly:
http://tinyurl.com/26by322.
Tagged Formats
Tagged files are files that are text-based and that typically contain a mixture
of "normal" translatable text and "tags," elements that allow for the
structuring of the content, page layout, text formatting, insertion of images,
etc. Examples of tagged files are the exported text-based formats for the
translation of content in some desktop publishing programs (see the example
on page 318), but more typically tagged formats include HTML, XML, or SGML
files (see the definition on page 150).
Because tagged text files are "just" text files, they can be translated with a
text editor. However, this is typically not a good idea because
• the tags are quite sensitive to corruption, i.e., just deleting or adding a
part of a tag may utterly corrupt a file;
• though it would be possible to process tagged text files as plain text files in
translation environment tools (TEnTs), it would mess up your translation
memories with a lot of unwanted coding information; at the same time,
you will not really benefit from your translation memory content because
there will be very few matches for heavily coded sentences.
Instead, you should be using TEnTs that support tagged text formats, and
most of them do. The concept of supporting these formats is to hide and/or
protect any untranslatable information and only to display translatables.
This is relatively easy to do with HTML because it is a defined format that does
not allow any deviation, but it is more difficult with XML and SGML files. These
files are by definition user-definable and require you to "teach" the program
how to interpret any given file. Any of these file types refers to a "Document
Type Definition" or stylesheet that determines how each element of the file
should be treated.
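What it means to "teach" a program how to interpret an XML file can be sketched in a few lines of Python. The sample file, the element names, and the set of translatable elements are all invented for this illustration; a real filter in a translation environment tool also deals with attributes, inline tags, and much more.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample file and element names, invented for this sketch.
SAMPLE = """<manual>
  <meta><id>A-17</id><author>jdoe</author></meta>
  <title>Getting started</title>
  <para>Press the power button.</para>
</manual>"""

# The "filter": which elements hold translatable text.
TRANSLATABLE = {"title", "para"}


def extract_translatables(xml_text, translatable=TRANSLATABLE):
    """Return (element name, text) pairs for translatable elements only."""
    root = ET.fromstring(xml_text)
    return [(el.tag, el.text.strip())
            for el in root.iter()
            if el.tag in translatable and el.text and el.text.strip()]


for tag, text in extract_translatables(SAMPLE):
    print(f"{tag}: {text}")
```

Everything outside the listed elements (here the ID and the author) stays hidden from the translator, which is exactly the kind of decision a filter or settings file records.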
While the DTD file for HTML is a global declaration that any of the supporting
tools refer to, XML gives a somewhat universal access through a supporting
technology that describes how to format or transform the data in an XML
document, the so-called Extensible Stylesheet Language (XSL). Many
translation environment tools offer a predefined XML filter based on a
common set of XSL variables that is often sufficient to process XML files.
As SGML files have no such common denominator, you will need to create a
specific "filter" or "settings" to process these files.
In Trados 2007 and earlier, tagged files are processed in the TagEditor (thus
the name). Upon opening any of these file types in TagEditor, the following
dialog may appear (if it is not displayed automatically, you can open it through
Tools> Tag Settings):
You can see the predefined HTML and XSL (for XML) settings.
You can change the properties of any of these settings files by selecting Edit
(only do this if you are unhappy with the results of the existing settings files)
or create a new settings file by selecting Add.
Other predefined settings in TagEditor include those for XLIFF, DITA, and
RESX files. One of the great advantages of XML is that it is ideally suited
as an exchange format—among many others, it is the source format for
the translation exchange formats TMX, TBX, and XLIFF (see page 272).
DITA (Darwin Information Typing Architecture) is a new XML-based standard for
authoring, producing, and delivering technical information, and RESX files are XML-
based resource files for .NET applications.
If the prepared options are not sufficient for your XML file(s), you will have to
create a new filter type based on an XML sample file by selecting New under
File Types.
As in the previous versions of Trados, a wizard will guide you through the
different steps of creating the file type.
Déjà Vu also contains predefined XML "filters" (available under C:\Program
Files (Windows 7/8/8.1: ProgramData)\ATRIL\Déjà Vu X(2)\Templates\),
but just as in Trados it allows you to either edit that existing filter or create
filters for other SGML files. You can access this feature by selecting File>
New> SGML/XML Filter, and the wizard will lead you through the creation
of a very customizable filter file. It is possible to forgo the import of a DTD
file; instead, you can choose to import an SGML or XML file directly to create a
filter.
As you import the XML or SGML file into Déjà Vu, you will need to make sure
to select the appropriate SGML/XML filter file during the import process under
Properties.
Most tools, including both Déjà Vu and Trados, allow the fine-tuning of the
filters so that you can exactly determine which parts inside or outside a tag
are translatable or to be protected. Typically, it is enough to go through the
process of creating a filter or settings file for an XML/SGML project only once
because usually all files will adhere to one standard.
XML Files with Embedded HTML
While most XML files are relatively easy to process, some XML files have
traditionally presented a real headache until very recently: those with
embedded HTML.
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;everyone who is interested in better
performance.</Text></Answer>
You can see that the XML tags are enclosed with the typical <less than and
greater than> tag markers and they will be easily recognized by your TEnT.
The actual translatable text is in the midst of lots and lots of HTML code, for
which the less than and greater than tag markers are encoded (&lt; and &gt;),
as is the ampersand sign in the non-breaking spaces (&amp; inside of &amp;nbsp;).
Importing a file with this segment into most XML-enabled TEnTs results in
this:
Figure 229: XML file with embedded HTML in early versions of memoQ
The XML codes are protected (in this case hidden), but the encoded HTML
codes have been turned into proper HTML codes that are not protected and
can thus be easily corrupted. Aside from the danger of corruption these are an
incredible nuisance because a) you will have to understand them, b) you will
have to translate around them, c) they will make spell-checking a nightmare,
and d) they will pollute your translation memory to no end.
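One way to catch the kind of corruption described above is to compare the tags in the source and target segments once the translation is done. The following Python sketch does this with the standard html.parser module. The sample segments are invented, and the check is deliberately strict: it also flags tags that were merely reordered, which may be perfectly legitimate in a translation.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect start and end tags in the order they appear."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.tags.append(f"</{tag}>")


def tag_sequence(segment):
    """Return the list of tags found in one segment."""
    collector = TagCollector()
    collector.feed(segment)
    collector.close()
    return collector.tags


def tags_match(source, target):
    """True if both segments contain the same tags in the same order."""
    return tag_sequence(source) == tag_sequence(target)


# Invented sample segments; the second translation lost its closing tag.
SOURCE = "Click <b>Save</b> to continue."
GOOD = "Klicken Sie auf <b>Speichern</b>, um fortzufahren."
BAD = "Klicken Sie auf <b>Speichern, um fortzufahren."

print(tags_match(SOURCE, GOOD))
print(tags_match(SOURCE, BAD))
```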
To avoid this scenario, long and tedious workarounds were needed that
involved the conversion of the XML files into Word files and the semi-manual
pre-processing of the XML and HTML tags.
There were even a couple of tools on the market that were specifically
designed to aid with that process. One was a standalone tool called
PrepTags (see www.your-translations.com/preptags.php) and the other
was a free little Word macro called Tortoise Tagger
(www.accurussian.net/tagger.htm).
Three of the leading translation environment tools have finally put an end to
the misery by offering better solutions.
The most straightforward routine comes with Déjà Vu X2. Here you simply
check Process Embedded HTML when configuring the import of the file:
memoQ has chosen a slightly different path that has applications for other
scenarios as well. Here you can select to use cascading filters for the import of
the file so that several routines are applied in the filtering process:
While Trados Studio does not offer a "pre-canned" approach like its two
competitors, it offers helpful step-by-step instructions on how to configure a
filter for an XML file with embedded HTML: http://bit.ly/RUU9HR.
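The idea behind cascading filters can be sketched in Python: a first pass treats the file as XML, and a second pass treats the extracted text as HTML. This is only an illustration of the principle with an invented sample, not a description of how any of the tools named above implement it.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: HTML markup stored as escaped text inside XML.
SAMPLE = ("<Answer><Text>&lt;p&gt;&lt;b&gt;Note:&lt;/b&gt;&amp;nbsp;"
          "For everyone who is interested in better performance."
          "&lt;/p&gt;</Text></Answer>")


class TextOnly(HTMLParser):
    """Second-stage filter: keep text, drop HTML tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def cascade(xml_text):
    """Apply an XML pass and then an HTML pass to one string."""
    # Stage 1: the XML parser removes the XML tags and decodes the
    # entities, which turns the escaped markup back into real HTML.
    embedded_html = ET.fromstring(xml_text).findtext("Text")
    # Stage 2: the HTML parser separates the HTML tags from the text.
    parser = TextOnly()
    parser.feed(embedded_html)
    parser.close()
    # Replace non-breaking spaces, then collapse runs of whitespace.
    text = "".join(parser.chunks).replace("\xa0", " ")
    return " ".join(text.split())


print(cascade(SAMPLE))
```

A real filter would of course keep track of the tags it removed so that they can be put back in the right place after translation; this sketch only shows the two-stage extraction.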
• binary files, or files that cannot be opened and edited with a text editor,
and
• flat files, or text-based files that can be opened and edited with a text
editor.
The binary files traditionally include formats such as EXE, DLL, or OCX files. To
translate these files, you will either
• need a specific software localization tool that allows the direct translation
and necessary strings as well as further language-specific development
work and testing (for further information on processing binary file formats,
see Software Localization Tools on page 299) or
In much the same way that tagged files work, it would be possible to translate
RC files in a text editor, but it is not advisable to do that because a) you will
most likely overlook text that needs to be translated, b) you may overwrite
code where that should not happen, and c) there is just no reason not to use
your translation memory for this. In fact, software files are rarely translated
on their own. Typically they are translated as a precursor to accompanying
documentation—documentation that will be using references to the translated
software over and over again—an ideal scenario for the use of translation
environment technology!
Many TEnTs support the translation of RC files, including Déjà Vu, SDLX, Star
Transit, Across, and Trados.
Many newer programming languages do not use a compiled format for their
resource files. Often this takes the form of XML-based formats such as the
.NET RESX format. Many TEnTs (and of course localization tools) support this
format, but among TEnTs it’s memoQ that offers the most advanced support
with a feature that not only allows for the translation but also the resizing of
dialog boxes.
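Because RESX files are plain XML, it is easy to see what a filter has to pick out of them. The following Python sketch reads the name and value of each string resource from a heavily simplified, invented sample; real RESX files also contain header entries and may include non-text resources that are ignored here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified RESX content for illustration.
SAMPLE = """<root>
  <data name="btnSave.Text" xml:space="preserve">
    <value>Save</value>
  </data>
  <data name="lblTitle.Text" xml:space="preserve">
    <value>Customer details</value>
    <comment>Shown at the top of the dialog</comment>
  </data>
</root>"""


def resx_strings(resx_text):
    """Map each resource name to the text in its <value> element."""
    root = ET.fromstring(resx_text)
    return {data.get("name"): data.findtext("value", default="")
            for data in root.iter("data")}


for name, value in resx_strings(SAMPLE).items():
    print(f"{name} = {value}")
```

Note that the comment element is left out of the result: it is context for the translator, not text to be translated.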
Extensions are always a first indication of what the file type could be if
you are not sure what format a certain software file is in, but they will
often fail you with software files. If you are not sure about the file type,
open it in a text editor and study the structure of the file. If translatables
are enclosed with quotation marks, try to process the file as an RC file or
with one of the other software filters. If the translatables are preceded by an equal
sign, try to process them with the Properties filter. As all of these files are text-
based, this will not damage the files and very often you will find that you "get
lucky," even though the file at hand may not be one or the other.
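The rule of thumb above can be written down as a rough Python sketch. The patterns, the sample content, and the tie-breaking rule are my own assumptions; the function only suggests which filter to try first and is no substitute for actually looking at the file.

```python
import re

# Rough patterns; the filter names mirror the ones mentioned above.
QUOTED = re.compile(r'"[^"\n]+"')
KEY_VALUE = re.compile(r'^\s*[\w.\-]+\s*=\s*\S', re.MULTILINE)


def suggest_filter(text):
    """Suggest a filter to try first, based on the file's structure."""
    quoted = len(QUOTED.findall(text))
    key_value = len(KEY_VALUE.findall(text))
    if quoted == 0 and key_value == 0:
        return "unknown"
    return "RC" if quoted >= key_value else "Properties"


# Invented samples in the style of an RC file and a properties file.
rc_like = (
    'IDD_ABOUT DIALOG 0, 0, 186, 95\n'
    'CAPTION "About"\n'
    'LTEXT "Version 1.0", IDC_STATIC, 7, 7, 100, 8\n'
)
props_like = "window.title=About\nbutton.ok=OK\n"

print(suggest_filter(rc_like))
print(suggest_filter(props_like))
```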
Another text-based software standard is GNU gettext PO and POT files. These
are the translatable language resource files used in the free GNU gettext
concept for translating software and documentation. GNU gettext is the
de facto standard in many open source projects, and it works with a large
variety of programming languages. PO files are typically translated or
pretranslated files, whereas POT files are the translatable templates.
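To show what these files look like inside, here is a minimal Python sketch that reads source/translation pairs from a tiny, invented PO sample and lists the strings that still need translating. It only handles simple single-line entries; multi-line strings, escaped quotation marks, and plural forms are deliberately left out.

```python
import re

# Invented sample with one translated and one untranslated entry.
SAMPLE_PO = '''#: src/main.c:42
msgid "Open file"
msgstr "Abrir archivo"

#: src/main.c:57
msgid "Save"
msgstr ""
'''

# Matches one single-line msgid followed directly by its msgstr.
ENTRY = re.compile(r'^msgid "(.*)"\nmsgstr "(.*)"$', re.MULTILINE)


def read_entries(po_text):
    """Return (source, translation) pairs from single-line PO entries."""
    return ENTRY.findall(po_text)


def untranslated(po_text):
    """Return the source strings whose translation is still empty."""
    return [src for src, tgt in read_entries(po_text) if src and not tgt]


print(read_entries(SAMPLE_PO))
print(untranslated(SAMPLE_PO))
```

In a POT template every msgstr is empty, so a check like this would list every string in the file.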
Aside from the internal tools that gettext offers (see www.gnu.org/software/
gettext), Déjà Vu, Heartsome, Swordfish, OmegaT and Open Language Tools
are the only translation environment tools that directly handle these files.
Trados Studio can handle them with the help of a free app in the SDL
OpenExchange app store (see www.translationzone.com/openexchange/app/
filetypedefinitionforpo-471.html?c=45277).
Help Systems
Help systems (the documentation resource that is typically part of a software
program and can be accessed through the help menu) are a huge topic on their
own. I’m not planning to cover them in their entirety, but there are a few
questions that I have been confronted with over and over again, and here are
some quick answers to those.
First of all, there is a great variety of help systems, but the two most often-
used help systems in the Windows world are HTMLHelp and the increasingly
outdated WinHelp.
WinHelp
The compiled WinHelp system typically consists of two files, the CNT file and
the HLP file. While the CNT file is a text-based file that contains the table of
contents for the help system, the HLP file is a compiled file that is made up of
any number of RTF files.
These RTF files have to follow strict guidelines as to how they are created
so that hyperlinks, index markers, and section breaks function correctly.
Most larger translation environment tools (especially those that have
been around for a while and seen the heyday of WinHelp) have facilities
to accommodate these special features (such as hidden text for
hyperlinks or the various kinds of footnotes).
Figure 234: View of an RTF file before its compilation into a WinHelp help system
In case you receive a CNT and HLP file for quoting or even translation
purposes, there’s an easy way to "decompile" the HLP file into its RTF
components. While there are a number of expensive commercial tools for
compiling and decompiling WinHelps, under sourceforge.net/projects/
helpdeco you can find the HelpDeco application which allows you to break
apart your help file and analyze and translate the resulting RTF files (and
typically any number of image files).
One file that is also created in the process is an HPJ file, the help project file.
Though this file is not to be translated, it is important because it contains the
information on how to re-compile the project once the translation is done. The
free Microsoft program that can be used to do just that is called Microsoft Help
HTMLHelp
The process for HTMLHelp is similar but much simpler. Unlike the WinHelp
system, HTMLHelp consists of only one file, the CHM file. True to its name,
most of the translatable content of an HTMLHelp system is contained in HTML
files. To "get to" the HTML files, you will also need to decompile the help file.
Fortunately, both the compilation and decompilation are done with the same
freely available and easy-to-use tool: HTML Help Workshop.
To decompile an existing help file, just select File> Decompile, locate the
CHM file, and choose a location to which you would like to export the files. You
could receive a great number of different file formats, but the most typical
are:
• HHP: the non-translatable project file (you will need this file to recompile
the help),
• graphic files: these are often translatable and/or have to be replaced with
newly created target counterparts, and
• lots and lots of HTML files with lots and lots of translatable content.
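For quoting purposes it can help to get a quick overview of what the decompilation produced. The following Python sketch counts the files in a decompiled project folder by category; the grouping of extensions is my own assumption and can be adjusted as needed.

```python
from collections import Counter
from pathlib import Path

# Assumed grouping of file extensions for a first overview.
CATEGORIES = {
    ".hhp": "project file",
    ".hhc": "contents file",
    ".hhk": "index file",
    ".htm": "HTML topic",
    ".html": "HTML topic",
    ".gif": "graphic",
    ".jpg": "graphic",
    ".png": "graphic",
    ".bmp": "graphic",
}


def inventory(folder):
    """Count the files in a decompiled help project by category."""
    counts = Counter()
    for path in Path(folder).rglob("*"):
        if path.is_file():
            category = CATEGORIES.get(path.suffix.lower(), "other")
            counts[category] += 1
    return dict(counts)
```

Calling inventory() with the folder you exported the files to returns a dictionary of categories and counts, which gives you a first idea of the number of topics and graphics before you run a proper word count.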
Before you start with the translation of your HTMLHelp project, here is one
thing you should be doing first: Talk to your client about the format in which
the authoring of this project took place. Chances are that it was either
authored in FrameMaker (like this present manual and help system), in some
kind of XML form, or even within Word. While it is entirely possible and really
quite easy to translate the HTMLHelp directly, your client may be much better
served if you are able to work in the original format. Typically the original
authoring environment is set up so that the output can be done in various
formats (PDF, printed materials, web based, help systems, etc.), whereas it is
much more complicated to do this when you start with a help system.
If your client asks you to translate the help system directly, translate the
above-mentioned files, replace the graphics (save them under the same name
and the same location), and then recompile the individual files with HTML Help
Workshop.
Once you’ve fixed any possible errors, you can proceed with the compilation in
HTML Help Workshop. Just select the HHP file (make sure that it’s placed at
the root of your project folder), select File> Compile, and your help file will
be all ready to go.
You can also use HTML Help Workshop to convert existing WinHelp
projects. When you convert a WinHelp project to an HTML Help project,
the New Project Wizard converts the WinHelp project (HPJ) file to an
HTML Help project (HHP) file, the WinHelp topic (RTF) files to HTML Help
topic (HTM, HTML) files, the WinHelp contents (CNT) files to HTML Help
contents (HHC) files, and the WinHelp index to HTML Help index (HHK) files.
Database-Based Data
It’s a strange thing with data in databases. So much of today’s translatable
content is stored in databases for easy and quick user access (this is
especially true for web-based content), but translators are often met with a
bit of suspicion when it comes to the translation of that data. And there is
probably something to that suspicion. Much like software development files
that developers often have a quasi-emotional attachment to, database
administrators rightly feel quite protective of their database content,
translatable or not. As a result, data from databases is often exported to an
exchange format such as CSV (comma-separated value file—a text-based file
where data is separated by commas; see page 177) or Excel spreadsheets.
While these formats typically do not pose many problems for translation
environment tools, there are two significant drawbacks:
• depending on the database, the data may not only not have context but
may also be concatenated (one string consisting of many pieces that are
not necessarily displayed together), thus making it very difficult for the
translator to translate appropriately, and
It is therefore not surprising that a number of tools have tried to come up with
solutions to translate database content directly within the database
environment. Though there are a great number of different database formats,
there are also standards that allow communication with almost all database
formats. ODBC, Open Database Connectivity, is a native interface that allows
access to most database management systems and allows for the use of SQL,
Structured Query Language, the universally used language to "talk" to
databases. By using this interface and this language, a number of computer-
assisted translation tools are now able to translate database content directly,
even from as complex an environment as Oracle or MS SQL databases.
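The principle of translating directly in the database can be sketched with a few lines of Python and SQL. To keep the example self-contained, an in-memory SQLite database stands in for a real database reached through ODBC, and the table, the column names, and the translations are all invented.

```python
import sqlite3

# In-memory SQLite stands in for a real database reached through ODBC.
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, "
             "name_en TEXT, name_de TEXT)")
conn.executemany("INSERT INTO products (id, name_en) VALUES (?, ?)",
                 [(1, "Keyboard"), (2, "Power cable")])


def fetch_untranslated(connection):
    """Read the source strings that have no translation yet."""
    return connection.execute(
        "SELECT id, name_en FROM products "
        "WHERE name_de IS NULL ORDER BY id").fetchall()


def store_translation(connection, row_id, text):
    """Write a translation back into the same row."""
    connection.execute("UPDATE products SET name_de = ? WHERE id = ?",
                       (text, row_id))


# Placeholder translations; a translation tool would supply these.
translations = {1: "Tastatur", 2: "Netzkabel"}
for row_id, source in fetch_untranslated(conn):
    store_translation(conn, row_id, translations[row_id])

print(conn.execute("SELECT name_en, name_de FROM products").fetchall())
```

Because the translation is written back into the row it came from, there is no export and re-import step in which strings can lose their connection to the original record.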
The very large translation providers have developed their own interfaces to
these systems (or they’ve outright bought CMS providers, such as SDL's
acquisition of Tridion), but since so much translatable content is moving into
CMSs, the smaller providers need ways to take part in this as well. One way to
do this is through middleware tools such as Clay Tablet (see www.clay-
It’s important to note that this tool is not installed on the language provider’s
end but on the client’s end, integrating with the CMS and then connecting with
an outside system. But Clay Tablet is also available with an affordable SaaS—
Software as a Service—model, essentially forestalling any huge expenditure
for anyone.
Frustrating or not, in translation work, we encounter PDF files daily. They can
be source text files, documents for proofreading, reference files, and various
registration and other forms. We often also need to create PDF files, for
example, for résumés, invoices, file sharing, and printing/publishing.
• text-based files
• image-based files
• searchable image-based files
In text-based PDF files, the text is "real" text; you can copy and paste text
from the file (unless it’s restricted by the file’s security settings) and search
for text in the file. Converting these types of files to a fully editable (and
translatable, translation-environment-tool-compatible) format, such as to a
Word file, is less problematic than with image-based files, though it’s not
necessarily simple as we’ll see later.
The third type, the searchable image-based file, is kind of a hybrid between
the two other types. It’s an image file that is searchable, i.e., you can search
text even though it’s an image. A searchable image-based file can be created
from an image-based file using the Recognize Text Using OCR (or: Text
Recognition) function in Adobe Acrobat (not available in the Reader version).
As with any OCR program, the results depend on the clarity of the text in the
image. If you have a hard time reading the text, don’t think that the program
can read it any better. You can also copy and paste text from a searchable
image file, but again the resulting text depends on how accurately the OCR
program recognizes the text.
Why do we need to talk about PDF files and related tools? The better we
understand the possibilities and limitations of these files and the related tools,
the easier it is to find the best and most efficient ways to handle them. For
example, knowing proper tools can save hours of tedious manual editing when
converting PDF files to an editable format.
PDF Tools
Adobe Reader is probably already on almost everyone’s computer. It allows
you to view and search PDF files and also comment on files that have been
enabled for commenting (more under Enabling Extended Features for Adobe
Reader on page 373).
In addition to the free Reader version, the Adobe Acrobat product family also
includes Adobe Acrobat Standard and Adobe Acrobat Pro versions.
Note that here the name "Adobe Acrobat" refers to these
paid versions and "Adobe Reader" to the free Reader version.
You should be sure to review the additional features that these paid versions
offer, such as enhanced editing, commenting, PDF file creation, file
conversion, security settings, etc. (see the following table). For many
translators, the additional features that Adobe Acrobat Standard and Pro offer
are certainly worth the expense. For a full product comparison, see
www.adobe.com/products/acrobat/matrix.html.
(*) Possible if the file has been enabled for this in Pro version.
In addition to the Adobe Acrobat products, there are many more or less
comparable and often less expensive programs that allow you to do many of
the same things. Examples include PDF Nitro (www.nitropdf.com), Foxit PDF Tools
(www.foxitsoftware.com), Solid PDF Tools (www.soliddocuments.com),
DocuCom PDF Gold (www.pdfwizard.com), Pdf995Suite (www.pdf995.com),
and many others. However, if you consider any of these other tools, make
sure that the tool is fully compatible with all current Acrobat features. You
don’t want to spend all that money to find out later that you need Adobe
Acrobat after all.
I will concentrate on Adobe Acrobat here and will not cover any of these other
tools.
The Comment & Markup tools can be accessed through Tools> Comment
& Markup (before Acrobat X) or Comment (Acrobat X and higher). Most
users are familiar with the yellow Sticky Note tool and tend to use that for
everything. However, in most cases it would be much more efficient and
clearer to use some of the other tools. The Text Edits tool in particular is very
good for indicating text corrections, additions, and deletions. Note that none
of these tools actually changes the text in the file—they only indicate what
needs to be changed. The actual changes will then be made to the original file,
for example, by a DTP person. Other Comment tools that are often useful in a
review process include Highlight Text, Callout, Arrow, Rectangle, etc.
They help to pinpoint the location where the associated comment is supposed
to apply.
Figure 240: Commenting tools in the Comment pane in Adobe Acrobat X and higher
Managing Comments
Sometimes it can be difficult to manage all the comments in a file, particularly
if the file is long or includes a lot of comments. Adobe Acrobat offers several
tools to help organize and manage comments. Clicking the Show button in
the Comment & Markup toolbar opens a menu that includes several options
for showing or hiding all comments (Show/Hide Comments) or only certain
comments based on comment type, reviewer, or comment status. In Adobe
Acrobat X and higher you can access this under Comment> Comment List.
The Typewriter tool (in Acrobat X and higher under Comment> Add Text
Comment) is handy when you need to add extra text to a document,
such as when filling out forms or adding the name of the fax recipient on the
page. It basically allows you to type anything anywhere on top of the
document.
You can also convert image-based PDF files to searchable image-based PDF
files, as mentioned earlier. This allows searching because normal image-based
PDF files are not searchable. This can only be done using the Adobe Acrobat
Standard or Pro version. With the image file open in Acrobat, select
Document> OCR Text Recognition> Recognize Text Using OCR (in
Acrobat X and higher: Tools> Text Recognition). As with any OCR program,
the results depend on the clarity of the text in the image. If you have a hard
time reading the text, don’t think that the program can read it any better.
Electronic Signatures
In general, I try to avoid printing paper copies as much as possible. Using
Adobe Acrobat, PDF files, and electronic forms or e-faxes often makes this
possible and easy. Unfortunately, for some users this works well until it
comes time to sign the document. At that point they feel forced to print out
the document and look for a pen. However, it is also often possible to insert an
electronic signature into the document. This can be done in two different
ways. The simple and less secure method is to insert a scanned signature (for
example, a JPG file) into the document by copying and pasting it via the
clipboard. It gets inserted as a "stamp," and you can easily resize and relocate
it by dragging it to fit the space available. Note, however, that anyone can
copy your signature as an image from the PDF file and reuse it (unless you set
the security settings to restrict copying). A much more secure method is to
use a digital ID.
Creating a digital ID is a simple process and needs to be done only once. After
that, using it is very simple. To create a digital ID, select Advanced>
Security Settings> Digital IDs> Add ID> A new digital ID I want to
create now> Next> New PKCS#12 digital ID file> Next (in Acrobat X
and higher: Sign> Sign with Certificate). Fill the information fields as
needed and click Next. Select a location for the ID file and define a password.
Click Finish to return to the Security Settings dialog and click Close. That’s
it.
You can create several types of digital IDs and even include an image in them.
The image could be, for example, a logo or your handwritten signature.
When you want to sign a PDF document, select Advanced> Sign & Certify>
Place Signature (in Acrobat X and higher: Sign> Sign with Certificate).
Drag a rectangle where you want to place the signature. Choose a digital ID
from your list of available IDs, type the password, choose an appearance, and
click Sign.
To create a digital ID and to sign a document using a digital ID, you need
Adobe Acrobat Standard or Pro. Signing of documents is possible with the
Reader version only if the feature has been enabled in the document by the
author using Adobe Acrobat Pro (for details about enabling, see Enabling
Extended Features for Adobe Reader on page 373).
A good conversion program converts a PDF file to a Word file with flowing text
but conserves formatting (bold, italics, paragraphs, tables, etc.) without
creating text boxes. If the PDF file is an image-based file (such as a scanned
or faxed document), the program also needs to be able to convert the image
to text accurately. I will briefly review and compare a few options that are
available for this task. These include Adobe Reader, Adobe Acrobat Standard/
Pro, ABBYY PDF Transformer, and ABBYY FineReader.
Adobe Reader
Adobe Reader offers only two possible conversion methods: text can be
copied and pasted using the clipboard, or the file can be saved as a text file
(File> Save as Text or File> Save as Other> Text). With both methods,
each line ends with a hard return (paragraph mark), so they are practical only
for a small amount of text.
Figure 249: Text copied using Adobe Reader, showing paragraph marks at the end of each line
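If you do end up with text like this, the hard returns can be removed with a short script rather than by hand. The following Python sketch joins the lines of each paragraph; it assumes that paragraphs are separated by a blank line, which is not always the case in text copied from PDF files, and it does not try to repair words that were hyphenated at the end of a line.

```python
import re


def unwrap(text):
    """Join hard-wrapped lines; keep blank lines as paragraph breaks."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = []
    for para in paragraphs:
        lines = [line.strip() for line in para.splitlines() if line.strip()]
        joined.append(" ".join(lines))
    return "\n\n".join(joined)


# Invented sample with a hard return at the end of every line.
copied = ("Adobe Reader offers\nonly two methods.\n\n"
          "Both are\nlimited.")
print(unwrap(copied))
```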
Some tips for selecting text in Adobe Reader and Adobe Acrobat: You can
select a whole page by clicking it four times. When selecting all (CTRL+A), "all"
can either be a whole page or a whole document depending on the Page
Display setting. If the setting is Single Page (View> Page Display>
Single Page) only the current page will be selected. If any other page display
setting is selected, the whole document will be selected. When copying text,
sometimes, depending on the file, there might also be an option to copy with
or without formatting (right-click menu). You can also use the Column select
mode to select a rectangular area of text anywhere in a document. It’s activated
by holding down the ALT key while dragging a rectangle over the target
area.
Adobe Acrobat
The Standard and Pro versions of Adobe Acrobat offer some additional
conversion methods. You can select File> Export to File> Save As (Acrobat
X and higher: Tools> Content Editing> Export File to), which allows
saving the file directly in various file formats (such as Word, Excel, HTML,
XML, etc.).
Once you select text within a file, there are also a number of right-click menu
options available: Copy, Copy As Table, Save As Table, Open Table in
Spreadsheet. These table options can be quite handy when trying to convert
text into a table format (great for creating glossaries). However, tables can be
very tricky to convert with any of the above methods. For example, I have
been able to convert a table very well using the Save As Table or Open
Table in Spreadsheet options, but they usually convert only one page at a
time even if I select several pages of the table. The conversion settings can be
accessed through Edit> Preferences> Convert From PDF.
Figure 250: Available settings for converting PDF files to DOC format in Adobe Acrobat
Here I cover only the two above-mentioned ABBYY products, but the basic
functions and principles are pretty much the same with the Nuance products
as well. Both companies offer free trial versions, so it makes sense to try them
out first, using files that are typical in your work, before making a final buying
decision.
ABBYY PDF Transformer is a very simple program to use, and you can convert
a PDF file with just a few clicks. In addition, it also creates PDF files. The
program reads the PDF file and converts it to the desired target file format
(Word, Excel, HTML, text, or searchable PDF). It offers a few settings and
tools to customize and improve the conversion process. First, you need to
select the correct language(s) and the desired advanced (layout) options.
When converting to Word format, you have the following three advanced
options available:
• Original layout: Creates an output document that looks exactly like the
original. Text is often placed in text boxes and is difficult to manipulate.
• Text flow: Retains the text of the original, but some of its formatting will
be lost. The output document will retain paragraphs and fonts but will not
retain columns, exact locations of objects, or spacing, and is easier to
manipulate.
• Keep pictures: Retains the pictures of the original document.
Advanced Options for Excel files are Ignore text outside tables and
Convert numeric values to numbers.
Figure 251: Converting a PDF file to Word format in ABBYY PDF Transformer. Green areas
indicate text areas and red ones image areas.
ABBYY FineReader
FineReader is a full-scale OCR program that does much more than just
convert and create PDF files. It can be used to convert scanned documents
and several types of image files to editable format. It has several more
options and features compared to PDF Transformer, but it’s still quite easy to
use.
Figure 252: Converting the same PDF file to Word format in ABBYY FineReader
• Saving the recognized text in the chosen output file format (such as Word,
Excel, PowerPoint, HTML, text, etc.)
FineReader offers several general and file-specific options for perfecting the
conversion process and the output (Tools> Options). Many of the tasks can
be customized and automated so that a complete conversion can be done with
just a couple of clicks (Tools> Automation Manager). When converting PDF
files to editable file formats, it’s important to select the Document Format
Saving Mode that produces the most suitable output. The availability of the
four modes depends on the output file format. When converting to Word
(DOC, DOCX, RTF) format, all four modes are available:
• Exact copy: Formatting corresponds to that of the original but the ability
to change the text and format of the output document is very limited. Text
is often placed in text boxes.
• Editable copy: Formatting may differ slightly from that of the original but
the document is easy to edit.
• Formatted text: Fonts, font sizes, and paragraphs are retained but not
the exact spacing or locations of the objects on the page.
Figure 253: Available layout options when converting PDF files to Word format in ABBYY
FineReader
The FineReader main window consists of four panes, various toolbars, and the
menu. The Page pane displays the pages of the current document. The
Image pane displays an image of the current page and allows editing of
image areas, changing area types (text, table, or picture) and text properties.
In the Text pane you can view the recognized text, check spelling, and
format and edit the text. In the Zoom pane you can see an
enlarged image of the line or image area currently being edited. While the
Image pane displays the general page view, the Zoom pane provides an easy
way to view the image in greater detail, adjust the area type and position, or
compare uncertain characters with the enlarged image.
For this, the CodeZapper Word macro that eliminates most unnecessary
codes can be very helpful. It can also be used to help clean up
converted Word files. You can obtain it from http://asap-traduction.com/
CodeZapper.
This is good news, but unfortunately in many cases the results of those
conversions are less than desirable.
A PDF file is usually meant to be the end product and is not
really made to be edited. Unfortunately, we are sometimes stuck with a PDF
file as the only source file available, and in order to translate it with a
translation environment tool, one needs to convert it to an editable file format
(such as a Word or text file) first. There are various tools for that purpose and
they work better or worse depending on the tool and the PDF file in question,
as explained earlier. Note that the PDF file translation feature in the above-
mentioned translation environment tools is not some new miracle that all of a
sudden makes PDF files translatable—it’s just one of those PDF-to-DOC
conversion tools that has been built into them as a filter. For example, Trados
uses a converter by Solid Documents; Publisher, Fluency, and Wordfast Pro
use BCL; and memoQ employs Glyph & Cog to convert PDF files into a Word or
text file and offer a translation interface for that.
memoQ offers two different ways of processing PDF files. With the first and
possibly more efficient option it does not even pretend to save the formatting
of the file. Instead, it "only" converts it into a text file without any formatting
(but also without any superfluous hard returns, etc.). While this does not
sound attractive in the first place, in many cases you might end up saving
time (and ugly surprises) even though you will have to spend considerable
time formatting the file once you are done with the translation.
Fluency uses a PDF converter that tries to retain the formatting of the PDF, but
it offers an intermediate step where you can edit the translation file with its
possible (and likely) issues before you start the translation process.
Déjà Vu X2 converts the PDF into a Word file and uses an integrated version of
the CodeZapper tool (see page 386) to eliminate most unnecessary codes.
Trados, Wordfast Pro, and Publisher all try to do everything "behind the
scenes." You open a PDF file and the tool tries to keep the layout while
presenting it in its translation interface for you to work in. Once the
translation is done, the file is exported into a Word or RTF file (which in all
likelihood is not the format the PDF originated from). For really simple PDFs,
this can work really well. And for others?
Figure 256: A PDF file that has been opened directly in Trados Studio. An overabundance of
tags makes translating the file very cumbersome.
You won’t encounter problems like this in the conversion of PDF files with tools
that rely on an OCR process, such as ABBYY FineReader or PDF Transformer.
Another problem that is often encountered with Trados—erroneous hard
returns at the end of lines—is also handled relatively well with the ABBYY and
Nuance tools. You may occasionally find a rogue hard return in converted files
from these solutions but they are few and far between.
Trados Studio’s PDF filter does allow some adjusting of the conversion process
(Tools> Options> File Types> PDF> Settings), but the options are very
limited compared to those of a proper PDF conversion tool.
Trados Studio contains a setting under the Common option (see the
image above) entitled Skip advanced font formatting, which helps
with exaggerated tags within words in some cases.
In summary . . .
• Don’t think that you can translate PDF files just like Word files even if you
can open them in your translation environment tool.
• If you open a PDF file in your translation environment tool, review the
converted text in the editor to see if there are problems with tags or hard
returns. You can try to fix the problems by adjusting the PDF filter settings
of your tool or by first saving the converted source file in Word format,
fixing the problems in Word, and then finally opening the fixed Word file
for translation. Remember that in most TEnTs you can’t edit the source
segments, so the errors need to be fixed before you start translating the
file.
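The hard-return cleanup mentioned in the last point can also be scripted if the converted source has been saved as plain text. The following Python lines are only a minimal sketch: the function name and the heuristic (a line break counts as erroneous when the line does not end in sentence-final punctuation and the next line starts with a lowercase letter) are assumptions made for illustration, not the method used by any of the tools named in this chapter.

```python
import re


def join_broken_lines(text: str) -> str:
    """Merge lines that were split by erroneous hard returns.

    Heuristic (an assumption for illustration): a break is treated as
    erroneous when the previous line does not end in sentence-final
    punctuation and the current line begins with a lowercase letter.
    """
    merged = []
    for line in text.splitlines():
        stripped = line.strip()
        if (
            merged
            and merged[-1]                       # previous line is not blank
            and stripped                         # current line is not blank
            and not re.search(r"[.!?:;]$", merged[-1])
            and stripped[0].islower()
        ):
            # Rejoin the two halves of the broken sentence.
            merged[-1] = merged[-1] + " " + stripped
        else:
            merged.append(stripped)
    return "\n".join(merged)


sample = "This sentence was broken\nby a hard return.\nNew paragraph starts here."
print(join_broken_lines(sample))
```

A heuristic like this will miss some breaks and wrongly join others (for example, list items that start in lowercase), so the result still needs to be reviewed before the file is opened for translation.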
Nowadays, most people are able to create PDF files with the tools they already
own without having to purchase any additional PDF creation tools. This feature
is included, for example, in MS Office 2007 and higher (Save As> Adobe
PDF) and OpenOffice/LibreOffice (File> Export as PDF).
If it’s not included in your version of Office 2007, you can download it for
free from the Microsoft Office support site as the "SaveAsPDFandXPS.exe"
add-in.
Of course, it’s also possible to convert to PDF files with Adobe Acrobat
Standard and Pro as well as with most other PDF conversion tools. Depending
on the tool, the PDF file is created either by saving or printing the original file
as a PDF file.
These kinds of PDF files are called hybrid PDFs, and you can create them by
selecting File> Export as PDF> Create hybrid file (OpenOffice) or Embed
OpenDocument file (LibreOffice). Once you open hybrid PDFs within
LibreOffice/OpenOffice, you can save them as an ODT, ODS, or ODP file and
process them without any formatting loss in a translation environment tool.
Only a very limited number of alignment tools are able to handle
PDF files. Based on my experience, the best is Logiterm AlignFactory (see
page 243).
Figure 258: Selecting one of two possible languages in a German version of Dragon
Which texts are well suited—or better, which texts are not well suited—for
speech recognition? The answer to this depends partly on your particular
translation subject. In mine it is mostly texts with a lot of proper names and/
or loan words. This does not mean that you can’t teach the program to
recognize the proper names and loan words, but it’s one of those judgment
things: If you want to use speech recognition (or anything else for that
matter) to become more effective, you’d better make sure that you truly are.
If you have to spend an hour to train it to recognize a bunch of new terms
before translating for an hour and a half on a job that would otherwise have
taken you only two hours, that seems like wasted time to me. Plus, while I
enjoy translating, I can think of better things to do than training speech
recognition. On the other hand, if I can expect that these proper names and
loan words will also occur in future projects, I may just as well spend the time
to train.
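The arithmetic behind that judgment call can be spelled out. The following Python lines are a minimal sketch that uses the figures from the example above; the function names are hypothetical, and the assumption that every later job saves the same amount of time is a simplification.

```python
import math


def dictation_pays_off(training_hours: float,
                       dictation_hours: float,
                       typing_hours: float) -> bool:
    """Return True if training plus dictating is faster than typing."""
    return training_hours + dictation_hours < typing_hours


def jobs_to_recover_training(training_hours: float,
                             dictation_hours: float,
                             typing_hours: float) -> int:
    """Number of similar jobs needed until the training time is recovered.

    Assumes the training is done once and every later job saves the same
    amount of time (a simplification for illustration).
    """
    saved_per_job = typing_hours - dictation_hours
    if saved_per_job <= 0:
        raise ValueError("Dictation saves no time on this kind of job.")
    return math.ceil(training_hours / saved_per_job)


# One hour of training, 1.5 hours of dictating, 2 hours of typing.
print(dictation_pays_off(1.0, 1.5, 2.0))        # single job: 2.5 h vs. 2 h
print(jobs_to_recover_training(1.0, 1.5, 2.0))  # jobs until training is recovered
```

For a single job the training does not pay off, but if the same terms come up again, the hour of training is recovered after the second job.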
My first rule for success with speech recognition software will probably have
the "purists" shaking their heads in agony. After having used the software for
some time, I know some of the weak spots of my speech engine (or my
pronunciation). Rather than using the "correct" function again and again, I
prefer to type those problem terms even while dictating the rest.
My next rule: Take some time to get used to not "thinking with your fingers."
Instead, try to preformulate longer segments and then speak them coherently
for better results.
This goes right along with the next kind of text that is not well suited for
speech recognition because it’s hard to speak naturally: text with a lot of
formatting. Depending on what kind of translation environment tool you’re
working with and how the tool handles formatting, it may be easier to
use the keyboard shortcuts that you are used to for the formatting. If there
is really a LOT of formatting, it may be easier to just type the whole thing.
When you use a translation environment tool that makes you work in an
interface other than Word, you will have a hard time doing everything with
voice commands unless you have the Professional edition, in which you can
easily write macros with virtually unlimited possibilities.
The problem is that while the Preferred version has a relatively modest price
tag, the Professional version does not. Once you have the Professional
version, you can either stay there and pay premium prices for upgrades
because you are interested in the slightly better recognition that typically
comes with each new version, or you can go the cheap route, downgrading at
some point but then losing all your macros. That’s the dilemma I am
stuck in with Dragon, so I’m not running the latest version.
Windows Vista and later also contain an internal voice recognition program.
This feature has suffered some very public criticism, but I was rather
impressed with its accuracy and user-friendliness in a couple of unscientific
tests that I ran. I dictated the same paragraphs in both programs and had
only slightly worse recognition in Vista than in Dragon (96% vs. 98%).
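To put those percentages in concrete terms, the short Python sketch below estimates how many words would need correcting in a 1,000-word text. It assumes that the accuracy figures are measured per word, which the informal tests above do not actually establish; the function name is hypothetical.

```python
def words_to_correct(total_words: int, accuracy_percent: int) -> int:
    """Estimate misrecognized words, assuming accuracy is measured per word."""
    return total_words * (100 - accuracy_percent) // 100


# Accuracy figures from the informal comparison above.
for engine, accuracy in (("Windows Vista", 96), ("Dragon", 98)):
    print(engine, words_to_correct(1000, accuracy))
```

Under that assumption, the comparison comes to roughly 40 corrections per 1,000 words versus 20.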
Unless you are an awesome typist and refuse to change that geeky habit of
exclusively using your fingers to enter text, speech recognition is a great
alternative way to "type," even before carpal tunnel syndrome hits.
Support
For most (if not all) of the tools discussed in this book, you can make use of
support options of varying quality.
Ironically, the best support available is often for some of the tools discussed
under Utilities on page 117. Many of these tools are created and supported by
a small handful of developers who can be passionate about providing excellent
support. To access this kind of support, go to the appropriate website and
simply send them an email. I have often received detailed answers within a
short amount of time, or even an offer to rewrite the program to fit my
specific needs.
Obviously this won’t work with the larger programs and/or software
companies—you will be hard pressed to have Steve Ballmer respond to your
question about Windows or Word. And speaking of Microsoft, though there is a
plethora of websites and newsgroups that provide paid or unpaid services, I
find my easiest access to information through Microsoft’s Knowledge Base
under support.microsoft.com. The only drawback is that the descriptions in
the vast Knowledge Base sometimes tend to be slightly technical.
get support for any of these and other translation environment tools is
through user groups, almost all of which are located at groups.yahoo.com.
Most of these groups are extremely supportive and are very forgiving of even
the most basic questions. There are also a number of other translation
technology-related Yahoo! Groups that are devoted to more general questions
rather than any one specific product. I have usually found the specific groups
more to-the-point and helpful, though.
Conclusion
In my work as a translator, I derive at least as much joy from finding creative
solutions to translation tasks with my computer as I do from actual
translation. I’m not sure that you have to become quite so extreme, but I do
hope that some of the tips in this book may have given you new ideas or a
new desire to make your working experience with your computer more
efficient and less frustrating.