ISO/IEC JTC1/SC22/WG21 N2401 = J16/07-0261

Code Conversion Facets for the Standard C++ Library

P.J. Plauger
Dinkumware, Ltd.
pjp@dinkumware.com
2007-09-03

With the acceptance of N2007 (Proposed Library Additions for Code Conversion), we now have the template classes wbuffer_convert and wstring_convert, which accept code-conversion facets as template parameters, as well as basic_filebuf, which can use them through its imbued locale. Unfortunately, the current draft C++ Standard defines only the default codecvt facet, with weakly specified properties. This paper proposes the addition of several facets that provide the most commonly needed Unicode support.

Add the header <codecvt> with the following definitions:

    namespace std {
    enum codecvt_mode {
        consume_header = 4,
        generate_header = 2,
        little_endian = 1};

    template<class Elem,
        unsigned long Maxcode = 0x10ffff,
        codecvt_mode Mode = (codecvt_mode)0>
        class codecvt_utf8
        : public std::codecvt<Elem, char, mbstate_t> {
        // facet for converting between Elem and UTF-8 byte sequences
        .....
        };

    template<class Elem,
        unsigned long Maxcode = 0x10ffff,
        codecvt_mode Mode = (codecvt_mode)0>
        class codecvt_utf16
        : public std::codecvt<Elem, char, mbstate_t> {
        // facet for converting between Elem and UTF-16 multibyte sequences
        .....
        };

    template<class Elem,
        unsigned long Maxcode = 0x10ffff,
        codecvt_mode Mode = (codecvt_mode)0>
        class codecvt_utf8_utf16
        : public std::codecvt<Elem, char, mbstate_t> {
        // facet for converting between UTF-16 Elem and UTF-8 byte sequences
        .....
        };
    } // namespace std

For each of the three code conversion facets codecvt_utf8, codecvt_utf16, and codecvt_utf8_utf16:

-- Elem is the wide-character type, such as wchar_t, char16_t, or char32_t.
-- Maxcode is the largest wide-character code that the facet will read or write without reporting a conversion error.
-- If (Mode & consume_header), the facet consumes an optional initial header sequence when reading a multibyte sequence to determine the endianness of the subsequent multibyte sequence to be read.
-- If (Mode & generate_header), the facet generates an initial header sequence when writing a multibyte sequence to advertise the endianness of the subsequent multibyte sequence to be written.
-- If (Mode & little_endian), the facet generates a multibyte sequence in little-endian order, as opposed to the default big-endian order.

For the facet codecvt_utf8:

-- The facet converts between UTF-8 multibyte sequences and UCS2 or UCS4 (depending on the size of Elem) within the program.
-- Endianness does not affect how multibyte sequences are read or written.
-- The multibyte sequence can be written as either a text or a binary file.

For the facet codecvt_utf16:

-- The facet converts between UTF-16 multibyte sequences and UCS2 or UCS4 (depending on the size of Elem) within the program.
-- Endianness affects how multibyte sequences are read or written.
-- The multibyte sequence must be written as a binary file.

For the facet codecvt_utf8_utf16:

-- The facet converts between UTF-8 multibyte sequences and UTF-16 (one or two 16-bit codes) within the program.
-- Endianness does not affect how multibyte sequences are read or written.
-- The multibyte sequence can be written as either a text or a binary file.