Most word processors can read and write some versions of RTF.[9] There are
several different revisions of the RTF specification; portability of files depends
on which version of RTF is being used.[7][10]
RTF should not be confused with enriched text[11] or its predecessor Rich Text,[12]
[13] or with IBM's RFT-DCA (Revisable Format Text-Document Content Architecture),
as these are different specifications.
History
Richard Brodie, Charles Simonyi, and David Luebbert, members of the Microsoft Word
development team, developed the original RTF in the middle to late 1980s. The first
RTF reader and writer shipped in 1987 as part of Microsoft Word 3.0 for Macintosh,
which implemented the RTF version 1.0 specification. All subsequent releases of
Microsoft Word for Macintosh, as well as all Windows versions, can read and write
in RTF format.
Microsoft maintains RTF. The final version was 1.9.1 in 2008, which implemented
features of Office 2007. Microsoft has discontinued enhancements to the RTF
specification, so features new to Word 2010 or a later version will not save
properly to RTF.[14] Microsoft anticipates no further updates to RTF, but has
stated willingness to consider editorial and other non-substantive modifications of
the RTF Specification during an associated ISO/IEC 29500 balloting period.[15]
RTF files were used to produce Windows Help files, though these have since been
superseded by Microsoft Compiled HTML Help files.
RTF specifications for Microsoft Word[16][17]
RTF version | Publication date | Microsoft Word version | MS Word release date | Notes
1.0 | 1987 | Microsoft Word 3 | 1987 | The latest revision came in June 1992.[18][19] The 1992 revision defined support for Microsoft Object Linking and Embedding (OLE) objects and Macintosh Edition Manager subscriber objects. It also supported inclusion of the Windows Metafile, PICT, Windows device-dependent bitmap, Windows device-independent bitmap and OS/2 Metafile image types in RTF.
1.1 | | Microsoft Word 4 | 1989 | Allowed for font embedding, which lets font data be located inside the file.
1.2 | 1993 | Microsoft Word 5 | 1991 | [20][21]
1.3 | January 1994 | Microsoft Word 6 | 1993 | 1/94 GC0165; for device-independence and interoperability, encouraged embedding bitmaps within Windows Metafiles,[22][23] instead of using Windows device-independent bitmaps or Windows device-dependent bitmaps.
1.4 | September 1995 | Microsoft Word 95/Word 7 | 1995 | [24]
1.5 | April 1997 | Microsoft Word 97/Word 8 | 1997 | Introduced Unicode RTF, which supports the 16-bit Unicode character encoding scheme; defined inclusion of PNG, JPEG and EMF picture types in hexadecimal (the default) or binary format in an RTF file.[25] Also contained a Japanese local RTF specification called RTF-J for the Japanese version of Word; RTF-J is somewhat different from the standard RTF specification.[25]
1.6 | May 1999 | Microsoft Word 2000/Word 9 | 1999 | Included Pocket Word and Exchange (used in RTF-HTML conversions).[3]
1.7 | August 2001 | Microsoft Word 2002/Word 10 | 2001 | 8/2001 – Word 2002 RTF Specification[26][27]
1.8 | April 2004 | Microsoft Word 2003/Word 11 | 2003 | 10/2003 – Word 2003 RTF Specification[4]
1.9.1 | 19 March 2008 (RTF 1.9 published in January 2007)[28] | Microsoft Word 2007/Word 12 | 2006 | Allowed XML markup – Custom XML Tags, SmartTags, Math elements in an RTF document, password protection, elements corresponding to Office Open XML Ecma-376 Part 4[29]
Code syntax
\b0
indicates that the Bold text is off
\b1
indicates that the Bold text is on
\i0
indicates that the Italic text is off
\i1
indicates that the Italic text is on
\ul0
indicates that the Underline text is off
\ul1
indicates that the Underline text is on
\sub0
indicates that the Subscript text is off
\sub1
indicates that the Subscript text is on
\super0
indicates that the Superscript text is off
\super1
indicates that the Superscript text is on
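A control word ends at a delimiter, which can be one of the following:
A space
A digit or hyphen, which starts a numeric parameter (e.g. -23, 23, 275)
A character other than a digit or letter (e.g. \, /, }) [30]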
{\rtf1\ansi{\fonttbl\f0\fswiss Helvetica;}\f0\pard
This is some {\b bold} text.\par
}
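The delimiter rules above can be illustrated with a small tokenizer. This is a minimal sketch in Python, not a full RTF reader: the names TOKEN and tokenize are hypothetical, and hexadecimal escapes such as \'c8, binary data and destination handling are deliberately left out.

```python
import re

# One alternative per token kind. A control word is a backslash, a run of
# letters, an optional signed integer parameter, and an optional single
# space that serves only as the delimiter.
TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"  # control word, e.g. \pard, \f0, \b0
    r"|\\(.)"                    # control symbol, e.g. \{ or \\
    r"|([{}])"                   # group start or end
    r"|([^\\{}]+)",              # run of plain text
    re.DOTALL,
)


def tokenize(rtf):
    """Split RTF source into (kind, value, parameter) tuples."""
    tokens = []
    for match in TOKEN.finditer(rtf):
        word, param, symbol, brace, text = match.groups()
        if word is not None:
            tokens.append(("control", word, int(param) if param else None))
        elif symbol is not None:
            tokens.append(("symbol", symbol, None))
        elif brace is not None:
            tokens.append(("group", brace, None))
        else:
            # Raw line breaks in the source are not part of the text.
            text = text.replace("\r", "").replace("\n", "")
            if text:
                tokens.append(("text", text, None))
    return tokens


# The sample document shown above, written as a Python string.
sample = (
    "{\\rtf1\\ansi{\\fonttbl\\f0\\fswiss Helvetica;}\\f0\\pard\n"
    "This is some {\\b bold} text.\\par\n"
    "}"
)
tokens = tokenize(sample)
body_text = "".join(value for kind, value, _ in tokens if kind == "text")
```

In this sketch the space after \fswiss and after \b is consumed as a delimiter, while the digit in \f0 and \rtf1 becomes the numeric parameter. A real reader would also track group nesting so that the font table contents ("Helvetica;") are not treated as body text.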
Character encoding
A standard RTF file can only consist of 7-bit ASCII characters, but can use escape
sequences to encode other characters.[31] The two character escapes are code page
escapes and, starting with RTF 1.5, Unicode escapes. In a code page escape, two
hexadecimal digits following a backslash and typewriter apostrophe denote a
character taken from a Windows code page. For example, if the code page is set to
Windows-1256, the sequence \'c8 will encode the Arabic letter bāʼ ب. It is also
possible to specify a "Character Set" in the preamble of the RTF document and
associate it with a header. For example, if the preamble contains the text
\f3\fnil\fcharset134, then in the body of the document the text \f3\'bd\'f0
represents the bytes 0xbd 0xf0 in Character Set 134 (which corresponds to the
GBK code page), which encode "金".
RTF Character Set | Code Page | Description
0 | Windows-1252 | Latin alphabet, Western Europe / Americas
1 | 0 | Default Windows API code page for system locale
2 | 42 | Symbol (PUA-mapped)[32] character set
77 | 2 | Default Macintosh-compatibility code page for system locale
128 | Windows-932 | Japanese, Shift JIS (Windows version)
129 | Windows-949 | Korean, Unified Hangul Code (extended Wansung)
130 | Windows-1361 | Korean, Johab (ASCII-based version)
134 | Windows-936 | Chinese, GBK (extended GB 2312)
136 | Windows-950 | Chinese, Big5
161 | Windows-1253 | Greek
162 | Windows-1254 | Latin alphabet, Turkish
163 | Windows-1258 | Latin alphabet, Vietnamese
177 | Windows-1255 | Hebrew
178 | Windows-1256 | Arabic
186 | Windows-1257 | Baltic
204 | Windows-1251 | Cyrillic
238 | Windows-1250 | Latin alphabet, Eastern Europe
255 | 1 | Default OEM code page for system locale
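As an illustration of code page escapes, the following Python sketch decodes runs of \'xx sequences using a codec chosen from the character set number. The names CHARSET_CODECS and decode_hex_escapes are hypothetical, the mapping covers only part of the table, and the codec names are Python's own rather than anything defined by RTF. The locale-dependent entries (1, 77, 255) and the Symbol set are omitted.

```python
import re

# Partial mapping from RTF character set numbers to Python codec names.
CHARSET_CODECS = {
    0: "cp1252",    # Western Europe / Americas
    128: "cp932",   # Japanese, Shift JIS (Windows version)
    129: "cp949",   # Korean, Unified Hangul Code
    134: "gbk",     # Chinese, GBK
    136: "cp950",   # Chinese, Big5
    161: "cp1253",  # Greek
    162: "cp1254",  # Turkish
    163: "cp1258",  # Vietnamese
    177: "cp1255",  # Hebrew
    178: "cp1256",  # Arabic
    186: "cp1257",  # Baltic
    204: "cp1251",  # Cyrillic
    238: "cp1250",  # Eastern Europe
}

# One or more consecutive \'xx escapes; grouping them lets a multi-byte
# character written as two escapes be decoded in one step.
HEX_RUN = re.compile(r"(?:\\'[0-9a-fA-F]{2})+")


def decode_hex_escapes(text, charset):
    """Replace \\'xx escapes in text with characters from the given charset."""
    codec = CHARSET_CODECS[charset]

    def replace_run(match):
        raw = bytes.fromhex(match.group(0).replace("\\'", ""))
        return raw.decode(codec)

    return HEX_RUN.sub(replace_run, text)


# The Windows-1256 example from the text: \'c8 is the Arabic letter ba'.
arabic = decode_hex_escapes("\\'c8", 178)
mixed = decode_hex_escapes("abc \\'c8 def", 178)
```

A complete reader would pick the character set from the font currently in effect instead of taking it as an argument, and would need to handle double-byte characters whose second byte is written as a literal character.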
For a Unicode escape, the control word \u is used, followed by a 16-bit signed
integer which corresponds to the Unicode UTF-16 code unit number. For the benefit
of programs without Unicode support, this must be followed by the nearest
representation of this character in the specified code page. For example, \u1576?
would give the Arabic letter bāʼ ب, but indicates that older programs which do not
support Unicode should render it as a question mark instead.
The control word \uc0 can be used to indicate that subsequent Unicode escape
sequences within the current group do not specify the substitution character.
Until the release of RTF specification version 1.5 in 1997, RTF only handled 7-bit
characters directly and 8-bit characters encoded as hexadecimal (using \'xx). Since
RTF 1.5, however, RTF control words generally accept signed 16-bit numbers as
arguments. Unicode values greater than 32767 must be expressed as negative numbers.
[25] If a Unicode character is outside the BMP, it is encoded with a surrogate pair.
Unicode support was added because of text handling changes in Microsoft Word:
Microsoft Word 97 is a partially Unicode-enabled application and it handles text
using the 16-bit Unicode character encoding scheme.[25] Microsoft Word 2000 and
later versions are Unicode-enabled applications that handle text using the 16-bit
Unicode character encoding scheme.[3]
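The signed 16-bit and surrogate pair rules can be sketched in Python as follows. The function name rtf_unicode_escape and its fallback parameter are hypothetical. The sketch assumes a single-character substitution after each \u (the usual default) and escapes only the three characters that are special in RTF.

```python
import struct


def rtf_unicode_escape(text, fallback="?"):
    """Encode non-ASCII characters as \\uN escapes with a fallback character."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            # Backslash and braces are special in RTF and must be escaped.
            out.append("\\" + ch if ch in "\\{}" else ch)
            continue
        # UTF-16 yields one code unit for BMP characters and two (a
        # surrogate pair) for characters outside the BMP.
        data = ch.encode("utf-16-le")
        # "h" unpacks each unit as a signed 16-bit integer, so values
        # above 32767 come out negative, as the format requires.
        units = struct.unpack("<%dh" % (len(data) // 2), data)
        for unit in units:
            out.append("\\u%d%s" % (unit, fallback))
    return "".join(out)


ba = rtf_unicode_escape("\u0628")             # Arabic letter ba', U+0628
fullwidth_a = rtf_unicode_escape("\uff21")    # U+FF21 is 65313, above 32767
emoji = rtf_unicode_escape("\U0001F600")      # outside the BMP
```

Here U+FF21 (65313) is written as 65313 - 65536 = -223, and the character outside the BMP becomes two escapes, one per surrogate.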
Because RTF files are usually 7-bit ASCII plain text, they can be easily
transmitted between PC-based operating systems. Converters that communicate with
Microsoft Word for MS Windows or Macintosh generally expect data transfer as 8-bit
characters and binary data which can contain any 8-bit values.[29]
Human readability
RTF is a data format for saving and sharing documents, not a markup language; it is
not intended for intuitive and easy typing.[33][34] Nonetheless, unlike many word
processing formats, RTF code can be human-readable. When an RTF file containing
mostly Latin characters without diacritics is viewed as a plain text file, the
underlying ASCII text is readable, provided that the author has kept formatting
concise.
When RTF was released, most word processors used binary file formats; Microsoft
Word, for example, used the .DOC file format. RTF was unique in its simple
formatting control which allowed non-RTF aware programs like Microsoft Notepad to
open and provide readable files. Today, most word processors have moved to XML-
based file formats (Word has switched to the .docx file format). Regardless, these
files contain large amounts of formatting code, so are often ten or more times
larger than the corresponding plain text.[35][33]
Most word processing software supports either RTF format importing and exporting for
some RTF specification or direct editing, which makes it a "common" format between
otherwise incompatible word processing software and operating systems. Most
applications that read RTF files silently ignore unknown RTF control words.[36]
These factors contribute to its interoperability, though it is still dependent on
the specific RTF version in use.[7] There are several consciously designed or
accidentally born RTF dialects.[36]
RTF is the internal markup language used by Microsoft Word.[33] Since 1987, RTF
files have been able to be transferred back and forth between many old and new
computer systems (and now over the Internet), despite differences between operating
systems and their versions. This makes it a useful format for basic formatted text
documents such as instruction manuals, résumés, letters, and modest information
documents. These documents, at minimum, support bold, italic and underline text
formatting. Also typically supported are left-, center- and right-aligned text,
font specification and document margins.
Font and margin defaults, style presets and other functions vary according to
program defaults. There may also be incompatibilities between different RTF
versions, e.g. between RTF 1.0 (1987) and later specifications, or between RTF
1.0–1.4 and RTF 1.5+ in use of Unicode characters.[37][38][39] And though RTF supports
metadata like title and author, not all implementations support this. Nevertheless,
the RTF format is consistent enough to be considered highly portable and acceptable
for cross-platform use.
Objects
Microsoft Object Linking and Embedding (OLE) objects and Macintosh Edition Manager
subscriber objects allow embedding of other files inside the RTF, such as tables or
charts from a spreadsheet application. However, since these objects are not widely
supported in programs for viewing or editing RTF files, they also limit RTF's
interoperability.[40][41][42][43][44] If software that understands a particular OLE
object is not available, the object is displayed using a picture of the object
which is embedded along with it.[45][46]
Pictures
RTF supports inclusion of JPEG, PNG, Enhanced Metafile (EMF), Windows Metafile
(WMF), Apple PICT, Windows device-dependent bitmap, Windows device-independent
bitmap and OS/2 Metafile picture types in hexadecimal (the default) or binary
format in an RTF file. Not all of these picture types are supported in all RTF
readers, however. When an RTF document is opened in software that does not support
the picture type of an inserted picture, the picture is not displayed. RTF writers
usually either convert an inserted picture in an unsupported picture type to one in
a supported picture type, or do not include the picture at all.
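To show what the hexadecimal picture format looks like, here is a minimal Python sketch that wraps raw image bytes in a \pict group. The function name pict_group_hex is hypothetical. The sketch assumes PNG data marked with the \pngblip control word, and omits the size control words (such as \picw and \pich) that real writers normally emit.

```python
def pict_group_hex(data, blip="pngblip", line_width=64):
    """Return an RTF picture group holding data as hexadecimal text."""
    hexdata = data.hex()
    # Break the hex text into lines; line breaks inside the data are
    # ignored by readers and keep the file viewable as plain text.
    lines = [
        hexdata[i:i + line_width]
        for i in range(0, len(hexdata), line_width)
    ]
    return "{\\pict\\%s\n%s}" % (blip, "\n".join(lines))


# First four bytes of the PNG signature, used as stand-in picture data.
group = pict_group_hex(b"\x89PNG")
```

Each byte becomes two hexadecimal digits, so the hexadecimal form roughly doubles the size of the picture data compared with the binary form.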
For better compatibility with Microsoft products, some RTF writers include the same
picture in two different picture types in one RTF file: one supported picture type
to display, and one uncompressed WMF copy of the original picture to improve
compatibility with some Microsoft applications like Wordpad.[47]
This method increases the RTF file size dramatically. The RTF specification does
not require this method, and several implementations do not include the WMF copy
(e.g. Abiword or Ted).
RTF supports embedding of fonts used in the document, but this feature is not
widely supported in software implementations.[48][49][50]
RTF also supports generic font family names used for font substitution: roman
(serif), Swiss (sans-serif), modern (monospace), script, decorative and technical.
[19] This feature is not widely supported either.
Annotations
Since RTF 1.0, the RTF specification has supported document annotations/comments.
[19] The RTF 1.7 specification defined some new features for annotations, including
the date stamp (there was previously only "time stamp") and parents of annotations.
[27] When an RTF document with annotations is opened in an application that does not
support RTF annotations, the annotations are not shown. Similarly, when a document
with annotations is saved as RTF in an application that does not support RTF
annotations, the annotations are not preserved in the RTF file. Some
implementations, like Abiword (since version 2.8) and IBM Lotus Symphony (up to
version 1.3), may hide annotations by default or require some user action to
display them.
The RTF specification also supports footnotes, which are widely supported in RTF
implementations (e.g. in OpenOffice.org, Abiword, KWord, Ted, but not in Wordpad).
Endnotes are implemented as a variation on footnotes, so applications that support
footnotes but not endnotes will render an endnote as a footnote.
The RTF 1.2 specification defined use of drawing objects, known as shapes, such as
rectangles, ellipses, lines, arrows and polygons. The RTF 1.5 specification
introduced many new control words for drawing objects.[25]
Unlike Microsoft Word's DOC format, as well as the newer Office Open XML and
OpenDocument formats, RTF does not support macros. For this reason, RTF was often
recommended over those formats when the spread of computer viruses through macros
was a concern. However, having the .RTF extension does not guarantee a safe file,
since Microsoft Word will open standard DOC files renamed with an RTF extension and
run any contained macros as usual. Manual examination of a file in a plain text
editor such as Notepad, or use of the file command on a UNIX-like system, is
required to determine whether or not a suspect file is really RTF.[9][56] Enabling
Word's "Confirm file format conversion on open" option can also assist by warning that a
document being opened is in a format that does not match the format implied by the
file's extension, and giving the option to abort opening that file. One exploit
attacking a vulnerability was patched in Microsoft Word in April 2015.[57]
Since 2014 there have been malware RTF files embedding OpenXML exploits.[58]
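The check for whether a file with an .rtf extension is really RTF can be approximated by looking at its first bytes. The Python sketch below uses hypothetical names (sniff, sniff_file) and assumes only three signatures: the {\rtf opening of an RTF document, the OLE compound file header used by binary DOC files, and the ZIP header used by Office Open XML packages.

```python
RTF_MAGIC = b"{\\rtf"
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # binary DOC container
ZIP_MAGIC = b"PK\x03\x04"                       # e.g. a .docx package


def sniff(header):
    """Classify a file by its leading bytes."""
    if header.startswith(RTF_MAGIC):
        return "rtf"
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def sniff_file(path):
    """Read the first bytes of the file at path and classify them."""
    with open(path, "rb") as handle:
        return sniff(handle.read(8))


renamed_doc = sniff(OLE2_MAGIC + b"\x00" * 8)
real_rtf = sniff(b"{\\rtf1\\ansi")
```

A result of "rtf" only confirms the file type. It does not show that the file is safe, since genuine RTF files can still carry embedded objects or exploit code.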