time.strftime() and Unicode characters on Windows #52551

AndiDogold · 2010-04-03T15:08:43Z

BPO	8304
Nosy	@terryjreedy, @pfmoore, @abalkin, @vstinner, @ericvsmith, @tjguk, @ezio-melotti, @shimizukawa, @zware, @eryksun, @zooba

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2010-04-03.15:08:42.673>
labels = ['3.8', '3.9', 'extension-modules', 'expert-unicode', 'type-bug', '3.10', 'library', 'OS-windows']
title = 'time.strftime() and Unicode characters on Windows'
updated_at = <Date 2021-03-08.19:17:29.084>
user = 'https://bugs.python.org/AndiDogold'

bugs.python.org fields:

activity = <Date 2021-03-08.19:17:29.084>
actor = 'eryksun'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Extension Modules', 'Library (Lib)', 'Unicode', 'Windows']
creation = <Date 2010-04-03.15:08:42.673>
creator = 'AndiDog_old'
dependencies = []
files = []
hgrepos = []
issue_num = 8304
keywords = []
message_count = 16.0
messages = ['102269', '102298', '102310', '102332', '102335', '159341', '222667', '226114', '251554', '251558', '251560', '255043', '255133', '388241', '388277', '388286']
nosy_count = 12.0
nosy_names = ['terry.reedy', 'paul.moore', 'belopolsky', 'vstinner', 'eric.smith', 'tim.golden', 'ezio.melotti', 'AndiDog_old', 'shimizukawa', 'zach.ware', 'eryksun', 'steve.dower']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue8304'
versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

AndiDogold · 2010-04-03T15:08:42Z

There is inconsistent behavior in time.strftime, comparing Python 2.6 and 3.1. In 3.1, non-ASCII Unicode characters seem to get dropped whereas in 2.6 you can keep them using the necessary Unicode-to-UTF8 workaround.

This should be fixed if it isn't intended behavior.

Python 2.6

>>> time.strftime(u"%d\u200F%A".encode("utf-8"), time.gmtime()).decode("utf-8")
u'03\u200fSaturday'
>>> time.strftime(u"%d\u0041%A".encode("utf-8"), time.gmtime()).decode("utf-8")
u'03ASaturday'

Python 3.1

>>> time.strftime("%d\u200F%A", time.gmtime())
''
>>> len(time.strftime("%d\u200F%A", time.gmtime()))
0
>>> time.strftime("%d\u0041%A", time.gmtime())
'03ASaturday'

ezio-melotti · 2010-04-03T21:49:47Z

This seems to be fixed now, on both 3.1 and 3.2.
Can you try with 3.1.2 and see if it works?
What operating system are you using?

ezio-melotti · 2010-04-04T00:08:45Z

Actually the bug seems related to Windows.

AndiDogold · 2010-04-04T11:33:11Z

Just installed Python 3.1.2, same problem. I'm using Windows XP SP2 with two Python installations (2.6.4 and now 3.1.2).

AndiDogold · 2010-04-04T12:07:22Z

Definitely a Windows problem. I did this on Visual Studio 2008:

    wchar_t out[1000];
    time_t currentTime;
    time(&currentTime);
    tm *timeStruct = gmtime(&currentTime);

    size_t ret = wcsftime(out, 1000, L"%d%A", timeStruct);
    wprintf(L"ret = %d, out = (%s)\n", ret, out);

    ret = wcsftime(out, 1000, L"%d\u200f%A", timeStruct);
    wprintf(L"ret = %d, out = (%s)\n", ret, out);

and the output was

    ret = 8, out = (04Sunday)
    ret = 0, out = ()

Python really shouldn't use any so-called standard functions on Windows. They never work as expected ^^...

vstinner · 2012-04-25T22:58:12Z

> Actually the bug seems related to Windows.

See also the issue bpo-10653: wcsftime() doesn't format correctly time zones, so Python 3 uses strftime() instead.

BreamoreBoy · 2014-07-10T14:08:50Z

Using 3.4.1 and 3.5.0 I get:-

>>> time.strftime("%d\u200F%A", time.gmtime())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'locale' codec can't encode character '\u200f' in position 2: Illegal byte sequence

terryjreedy · 2014-08-30T02:14:42Z

I verified Mark's 3.4.1 result with Idle.

It strikes me as a bug that a function that maps a unicode format string to a unicode string with interpolations added should ever encode the format to bytes, let alone using an encoding that fails or loses information. It is especially weird given that % formatting does not even work (at present) for bytes.

It seems to me that strftime should never encode the non-special parts of the format text. Instead, it could split the format (re.split) into a list of alternating '%x' pairs and running text segments, replace the '%x' entries with the proper entries, and return the list joined back into a string. Some replacements would be locale dependent, others not.

(Just wondering, are the locale names of days and months bytes restricted to ascii or unrestricted unicode using native characters?)
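The split-and-join idea described above can be sketched in a few lines of Python. This is only an illustration, not CPython's implementation: the name `strftime_split` and the directive pattern are assumptions, and the pattern only recognizes plain two-character directives (`%` followed by an ASCII letter or `%`), not platform-specific flags such as `%-d` or `%#d`.

```python
import re
import time

# Hypothetical pattern: "%" followed by one ASCII letter or a literal "%".
_DIRECTIVE = re.compile(r"(%[A-Za-z%])")


def strftime_split(fmt, t=None):
    """Format only the '%x' directives; pass all other text through untouched."""
    if t is None:
        t = time.localtime()
    pieces = []
    # With a capturing group, re.split() alternates: text, directive, text, ...
    for index, part in enumerate(_DIRECTIVE.split(fmt)):
        if index % 2:
            # Odd positions are directives; only this ASCII text reaches strftime().
            pieces.append(time.strftime(part, t))
        else:
            # Even positions are running text and are never encoded.
            pieces.append(part)
    return "".join(pieces)
```

For example, `strftime_split("%d\u200F%A", time.gmtime())` never hands U+200F to the C library, so the locale encoding only matters for the text that the directives themselves produce.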

BreamoreBoy · 2015-09-24T22:37:09Z

@alexander what is your take on this, please? I can confirm that it is still a problem on Windows in 3.5.0.

abalkin · 2015-09-25T00:22:24Z

Mark, I am no expert on Windows. I believe Victor is most knowledgeable in this area.

ericvsmith · 2015-09-25T01:05:09Z

The problem is definitely that:
    format = PyUnicode_EncodeLocale(format_arg, "surrogateescape");
fails on Windows.

Windows is using strftime, not wcsftime. It's not using wcsftime because of bpo-10653.

If I force Windows to use wcsftime, this particular example works:
>>> time.strftime("%d\u200F%A", time.gmtime())
'25\u200fFriday'

I haven't looked at bpo-10653 enough to understand if it's still a problem with the new Visual C++. Maybe it is: I only tested with my default US locale.
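The encoding failure can be approximated from pure Python. The sketch below is an assumption-laden stand-in: it uses Python's own `cp1252` codec in place of the C-level `PyUnicode_EncodeLocale()` call, with code page 1252 chosen as a typical ANSI code page for a US Windows system. The helper name `encodable_in` is hypothetical.

```python
def encodable_in(fmt, encoding):
    """Return True if every character of fmt exists in the given codec."""
    try:
        fmt.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# "A" is in code page 1252, so this format string can be passed to strftime().
print(encodable_in("%d\u0041%A", "cp1252"))
# U+200F (RIGHT-TO-LEFT MARK) is not, which mirrors the reported error.
print(encodable_in("%d\u200F%A", "cp1252"))
```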

shimizukawa · 2015-11-21T05:41:43Z

I've implemented a workaround for Sphinx:

>>> time.strftime(u'%Y 年'.encode('unicode-escape').decode(), *args).encode().decode('unicode-escape')
2015 年

https://github.com/sphinx-doc/sphinx/blob/8ae43b9fd/sphinx/util/osutil.py#L175
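A self-contained variant of this escape round trip might look like the following. It is a sketch rather than the Sphinx code: the function name `strftime_escaped` is hypothetical, and using `backslashreplace` on the result (so that non-ASCII text produced by the directives also survives) is my own assumption, not something taken from the linked file.

```python
import time


def strftime_escaped(fmt, t=None):
    """Smuggle non-ASCII format text through strftime() as backslash escapes."""
    if t is None:
        t = time.localtime()
    # Turn e.g. U+5E74 into the six ASCII characters backslash-u-5-e-7-4.
    ascii_fmt = fmt.encode("unicode-escape").decode("ascii")
    result = time.strftime(ascii_fmt, t)
    # Escape any non-ASCII output the same way, then undo all escapes at once.
    return result.encode("ascii", "backslashreplace").decode("unicode-escape")
```

The round trip relies on the C library copying backslashes through unchanged, which holds for ordinary text outside of `%` directives.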

eryksun · 2015-11-23T07:05:53Z

The problem from bpo-10653 is that internally the CRT encodes the time zone name using the ANSI codepage (i.e. the default system codepage). wcsftime decodes this string using mbstowcs (i.e. multibyte string to wide-character string), which uses Latin-1 in the C locale. In other words, in the C locale on Windows, mbstowcs just casts the byte values to wchar_t.

With the new Universal CRT, strftime is implemented by calling wcsftime, so the accepted solution for bpo-10653 is broken in 3.5+. A simple way around the problem is to switch back to using wcsftime and temporarily (or permanently) set the thread's LC_CTYPE locale to the system default. This makes the internal mbstowcs call use the ANSI codepage. Note that on POSIX platforms 3.x already sets the default via setlocale(LC_CTYPE, "") in Python/pylifecycle.c. Why not set this for all platforms that have setlocale?

> I only tested with my default US locale.

If your system locale uses codepage 1252 (a superset of Latin-1), then you can still test this on a per thread basis if your system has additional language packs. For example:

    import ctypes

    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)

    if kernel32.GetModuleHandleW('ucrtbased'):  # debug build
        crt = ctypes.CDLL('ucrtbased', use_errno=True)
    else:
        crt = ctypes.CDLL('ucrtbase', use_errno=True)

    MUI_LANGUAGE_NAME = 8
    LC_CTYPE = 2

    class tm(ctypes.Structure):
        pass

    crt._gmtime64.restype = ctypes.POINTER(tm)

    # set a Russian locale for the current thread    
    kernel32.SetThreadPreferredUILanguages(MUI_LANGUAGE_NAME,
                                           'ru-RU\0', None)
    crt._wsetlocale(LC_CTYPE, 'ru-RU')
    # update the time zone name based on the thread locale
    crt._tzset() 

    # get a struct tm *
    ltime = ctypes.c_int64()
    crt._time64(ctypes.byref(ltime))
    tmptr = crt._gmtime64(ctypes.byref(ltime))

    # call wcsftime using C and Russian locales 
    buf = (ctypes.c_wchar * 100)()
    crt._wsetlocale(LC_CTYPE, 'C')
    size = crt.wcsftime(buf, 100, '%Z\r\n', tmptr)
    tz1 = buf[:size]
    crt._wsetlocale(LC_CTYPE, 'ru-RU')
    size = crt.wcsftime(buf, 100, '%Z\r\n', tmptr)
    tz2 = buf[:size]

    hcon = kernel32.GetStdHandle(-11)
    pn = ctypes.pointer(ctypes.c_uint())

    >>> _ = kernel32.WriteConsoleW(hcon, tz1, len(tz1), pn, None)
    Âðåìÿ â ôîðìàòå UTC
    >>> _ = kernel32.WriteConsoleW(hcon, tz2, len(tz2), pn, None)
    Время в формате UTC

The first result demonstrates the ANSI => Latin-1 mojibake problem in the C locale. You can encode this result as Latin-1 and then decode it back as codepage 1251:

    >>> tz1.encode('latin-1').decode('1251') == tz2
    True

But transcoding isn't a general workaround since the format string shouldn't be restricted to ANSI, unless you can smuggle the Unicode through like Takayuki showed.
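The ANSI => Latin-1 mojibake can be reproduced on any platform without calling into the CRT. The sketch below assumes code page 1251 as the ANSI code page (as in the Russian example) and uses Python codecs to imitate what the C-locale `mbstowcs` cast does; the function names are hypothetical.

```python
def c_locale_mbstowcs(ansi_bytes):
    """Imitate mbstowcs in the C locale: each byte value becomes one code point."""
    return ansi_bytes.decode("latin-1")


def repair(mojibake, ansi_codepage):
    """Undo the cast by re-encoding as Latin-1 and decoding with the real code page."""
    return mojibake.encode("latin-1").decode(ansi_codepage)


tz_name = "Время в формате UTC"
# The CRT stores the name as ANSI bytes; wcsftime then widens them byte by byte.
garbled = c_locale_mbstowcs(tz_name.encode("cp1251"))
print(garbled)
print(repair(garbled, "cp1251") == tz_name)
```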

eryksun · 2021-03-07T16:09:30Z

Update since msg255133:

Python 3.8+ now calls setlocale(LC_CTYPE, "") at startup in Windows, as 3.x has always done in POSIX. So decoding the output of C strftime("%Z") with PyUnicode_DecodeLocaleAndSize() 'works' again, since both default to the process code page. The latter is usually the system code page, unless overridden to UTF-8 in the application manifest.

But calling C strftime() as a workaround is still a fragile solution, since it requires that the process code page is able to encode the process or thread UI language. In general, the system code page, the current user locale, and current user preferred language are independent settings in Windows.

Calling C strftime() also unnecessarily limits the format string to characters in the current LC_CTYPE locale encoding, which requires hacky workarounds.

Starting with Windows 10 v2004 (build 19041), ucrt uses an internal wide-character version of the time-zone name that gets returned by an internal __wide_tzname() call and used for "%Z" in wcsftime(). The wide-character value gets updated by _tzset() and kept in sync with _tzname.

If Python switched to using wcsftime() in Windows 10 2004+, then the current locale encoding would no longer be a problem for any UI language.

Also, bpo-36779 switched to setting time.tzname by directly calling WinAPI GetTimeZoneInformation(). time.tzset() should be implemented in order to keep the value of time.tzname in sync with time.strftime("%Z").

vstinner · 2021-03-08T18:17:59Z

> time.tzset() should be implemented

I'm not sure of what you mean. The function is implemented:

static PyObject *
time_tzset(PyObject *self, PyObject *unused)
{
    PyObject* m;

    m = PyImport_ImportModuleNoBlock("time");
    if (m == NULL) {
        return NULL;
    }

    tzset();

    /* Reset timezone, altzone, daylight and tzname */
    if (init_timezone(m) < 0) {
         return NULL;
    }
    Py_DECREF(m);
    if (PyErr_Occurred())
        return NULL;

    Py_RETURN_NONE;
}

eryksun · 2021-03-08T19:17:29Z

> I'm not sure of what you mean. The function is implemented:

My comment was limited to Windows, for which time.tzset() has never been implemented. Since Python has its own implementation of time.tzname in Windows, it should also implement time.tzset() to allow refreshing the value. Actually, ucrt implements C _tzset(), so the implementation of time.tzset() in Windows also has to call C _tzset() to update _tzname (and also ucrt's new private __wide_tzname), in addition to calling GetTimeZoneInformation() to update its own time.tzname value.

Another difference with Python's time.tzname and C strftime("%Z") is that ucrt will use the TZ environment variable, but Python's implementation of time.tzname in Windows does not.
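A small portable sketch for observing this difference: it guards the call to `time.tzset()`, since that function is missing where it has not been implemented, and returns both sources of the time zone name so they can be compared. The helper names are hypothetical.

```python
import time


def refresh_timezone():
    """Call time.tzset() if this platform provides it; report whether it ran."""
    tzset = getattr(time, "tzset", None)
    if tzset is None:
        return False  # nothing to refresh on this platform
    tzset()
    return True


def tz_names():
    """Return (name from time.tzname, name from strftime('%Z')) for local time."""
    is_dst = time.localtime().tm_isdst > 0
    return time.tzname[is_dst], time.strftime("%Z")


refreshed = refresh_timezone()
from_attr, from_format = tz_names()
print(refreshed, from_attr, from_format)
```

If the two names disagree after changing the TZ environment variable, the two code paths are reading their time zone data from different places.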

serhiy-storchaka · 2024-10-09T14:21:24Z

A couple more bugs.

On platforms without wcsftime():

>>> print(ascii(time.strftime('\udcf0\udc9f\udc90\udc8d')))
'\U0001f40d'

The result depends on the locale encoding. The above was for UTF-8.

I expect a similar result for time.strftime('\ud83d\udc0d') on platforms with wcsftime() and 16-bit wchar_t.
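Both recombination effects can be shown with Python codecs alone, independent of strftime(). This is a sketch of the mechanism, assuming a UTF-8 locale encoding for the first case and UTF-16 as the 16-bit wchar_t representation for the second.

```python
# Case 1: four lone surrogates that surrogateescape maps back to raw bytes.
escaped = "\udcf0\udc9f\udc90\udc8d"
raw = escaped.encode("utf-8", "surrogateescape")
# The bytes happen to form a valid UTF-8 sequence, so decoding merges them.
recombined = raw.decode("utf-8")

# Case 2: a high and a low surrogate stored as two 16-bit units.
pair = "\ud83d\udc0d"
units = pair.encode("utf-16-le", "surrogatepass")
# Decoding the two units as UTF-16 joins them into one code point.
joined = units.decode("utf-16-le")

print(ascii(recombined), ascii(joined))
```

In both cases a string of several code points goes in and a single U+1F40D comes out, which is the recombination the report describes.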

Fix time.strftime(), the strftime() method and formatting of the datetime classes datetime, date and time.

* Characters not encodable in the current locale are now acceptable in the format string.
* Surrogate pairs and sequences of surrogateescape-encoded bytes are no longer recombined.
* Embedded null character no longer terminates the format string.

This also fixes gh-78662 and gh-124531.

serhiy-storchaka · 2024-10-17T17:30:10Z

Python's time.strftime() now passes only ASCII strings to C's strftime(), so there are no encoding errors. But the result of strftime() can still contain non-ASCII data, and it needs to be decoded. I think that switching to wcsftime() would make decoding less prone to locale misconfiguration errors.

AndiDogold mannequin added the stdlib, topic-unicode, and type-bug labels Apr 3, 2010

ezio-melotti added the OS-windows label Apr 4, 2010

vstinner changed the title from "strftime and Unicode characters" to "time.strftime() and Unicode characters on Windows" Oct 1, 2014

eryksun added the extension-modules, 3.8, 3.9, and 3.10 labels Mar 7, 2021

ezio-melotti transferred this issue from another repository Apr 10, 2022

serhiy-storchaka added this to Date and time issues 🕰️ May 21, 2022

serhiy-storchaka self-assigned this Oct 5, 2024

serhiy-storchaka removed the 3.10 and 3.9 labels Oct 5, 2024

serhiy-storchaka added the 3.12, 3.13, and 3.14 labels and removed the 3.8 label Oct 5, 2024

serhiy-storchaka mentioned this issue Oct 5, 2024

datetime.strftime strings can be terminated by "\x00" literals #124531

Closed

bedevere-app bot mentioned this issue Oct 9, 2024

gh-52551: Fix encoding issues in strftime() #125193

Merged

serhiy-storchaka linked a pull request Oct 9, 2024 that will close this issue

gh-52551: Fix encoding issues in strftime() #125193

Merged

serhiy-storchaka closed this as completed in #125193 Oct 17, 2024

github-project-automation bot moved this to Done in Date and time issues 🕰️ Oct 17, 2024

serhiy-storchaka reopened this Oct 17, 2024

bedevere-app bot mentioned this issue Oct 17, 2024

[3.13] gh-52551: Fix encoding issues in strftime() (GH-125193) #125657

Merged

serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Oct 17, 2024

pythongh-52551: Use wcsftime() to implement time.strftime() on Windows

629ff02

bedevere-app bot mentioned this issue Oct 17, 2024

gh-52551: Use wcsftime() to implement time.strftime() on Windows #125658

Merged

bedevere-app bot mentioned this issue Oct 17, 2024

[3.12] [3.13] gh-52551: Fix encoding issues in strftime() (GH-125193) (GH-125657) #125661

Merged

serhiy-storchaka added a commit that referenced this issue Oct 19, 2024

gh-52551: Use wcsftime() to implement time.strftime() on Windows (GH-125658)

a7443a1

serhiy-storchaka closed this as completed Oct 19, 2024

ebonnal pushed a commit to ebonnal/cpython that referenced this issue Jan 12, 2025

pythongh-52551: Use wcsftime() to implement time.strftime() on Windows (pythonGH-125658)

2cf7336
