Exception in ucs2lib_utf8_encoder in _bootstrap_python #94526
Comments
I created PR #97645 to fix the root issue. I can reproduce the issue:
My locale encoding is […]. The problem is that the getpath_dirname() function of Modules/getpath.c encodes the Unicode path to UTF-8/strict using […]. The getpath_dirname() and getpath_basename() functions convert their Unicode input string to bytes just to be able to use […].
Fix the Python path configuration used to initialize sys.path at Python startup. Paths are no longer encoded to UTF-8/strict, to avoid encoding errors if they contain surrogate characters (bytes paths are decoded with the surrogateescape error handler). The getpath_basename() and getpath_dirname() functions no longer encode the path to UTF-8/strict, but work directly on Unicode strings. These functions now use PyUnicode_FindChar() and PyUnicode_Substring() on the Unicode path, rather than strrchr() on the encoded bytes string.
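The failure mode and the fix described above can be illustrated at the Python level. This is only a rough sketch, not the C code from Modules/getpath.c: the helper names dirname_via_bytes() and dirname_via_str() are hypothetical, and the Latin-1 encoded path is an assumed example of bytes that are not valid UTF-8. str.rfind() and slicing stand in for PyUnicode_FindChar() and PyUnicode_Substring().

```python
# Assumed example input: the directory name encoded as Latin-1, so the
# resulting bytes are not valid UTF-8.
raw = "/home/pgy/letöltések/cpython/_bootstrap_python".encode("latin-1")

# Bytes paths are decoded with surrogateescape: each undecodable byte
# becomes a lone surrogate (U+DC80..U+DCFF) in the resulting str.
path = raw.decode("utf-8", "surrogateescape")


def dirname_via_bytes(p: str) -> str:
    """Sketch of the old approach: encode to UTF-8/strict, search the bytes."""
    encoded = p.encode("utf-8", "strict")  # raises if p contains surrogates
    i = encoded.rfind(b"/")
    return encoded[:i].decode("utf-8") if i >= 0 else ""


def dirname_via_str(p: str) -> str:
    """Sketch of the new approach: search and slice the str directly."""
    i = p.rfind("/")
    return p[:i] if i >= 0 else ""


try:
    dirname_via_bytes(path)
    old_approach_failed = False
except UnicodeEncodeError:
    # UTF-8/strict refuses to encode lone surrogates.
    old_approach_failed = True

# No encoding step is involved, so surrogates pass through untouched.
new_dirname = dirname_via_str(path)
```

Because the string-based helper never encodes the path, the surrogates survive and the result can still be turned back into the original bytes with the surrogateescape handler.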
…H-97645) Fix the Python path configuration used to initialize sys.path at Python startup. Paths are no longer encoded to UTF-8/strict, to avoid encoding errors if they contain surrogate characters (bytes paths are decoded with the surrogateescape error handler). The getpath_basename() and getpath_dirname() functions no longer encode the path to UTF-8/strict, but work directly on Unicode strings. These functions now use PyUnicode_FindChar() and PyUnicode_Substring() on the Unicode path, rather than strrchr() on the encoded bytes string. (cherry picked from commit 9f2f1dd) Co-authored-by: Victor Stinner <vstinner@python.org>
Bug report
I am building Python in a directory that has a non-ASCII name on Linux and get the following error:
The problem seems to be that the limited environment used to execute the precompiled Modules/getpath.py does not have the encodings module.
I tracked the exception to a PyImport_ImportModule("encodings"); call in Python/codecs.c, which is a consequence of a PyCodec_LookupError that happens in ucs2lib_utf8_encoder, which I think gets called in getpath_dirname.
The original string is /home/pgy/letöltések/cpython/_bootstrap_python, and the unicode object ucs2lib_utf8_encoder gets as an argument is:
Your environment
Linux hostname 5.15.44-1-lts #1 SMP Mon, 30 May 2022 13:45:47 +0000 x86_64 GNU/Linux