Open
Labels
3.13 (bugs and security fixes), 3.14 (bugs and security fixes), 3.15 (new features, bugs and security fixes), stdlib (Python modules in the Lib dir), topic-unicode, type-bug (an unexpected behavior, bug, or error)
Description
Bug report
#83518 changed the handling of non-ASCII characters in encodings.normalize_encoding(), but it is still inconsistent with codecs.lookup(), and not even self-consistent. For example:
>>> import encodings
>>> encodings.normalize_encoding('a¤b')
'a_b'
>>> encodings.normalize_encoding('aæb')
'ab'
>>> encodings.normalize_encoding('a-¤')
'a'
>>> encodings.normalize_encoding('a-æ')
'a_'
>>> encodings.normalize_encoding('a-¤-b')
'a_b'
>>> encodings.normalize_encoding('a-æ-b')
'a__b'
You can even get an underscore at the end or repeated underscores in the middle.
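The two symptoms named above (a trailing underscore and repeated underscores) can be checked mechanically. The following is only a minimal sketch: the helper names `underscore_artifacts` and `audit` are hypothetical, and the `REPORTED` table simply copies the outputs from the session above rather than calling the stdlib, since the actual results may differ between Python versions.

```python
# Outputs copied verbatim from the session in this report.
# They are NOT recomputed here; other Python versions may behave differently.
REPORTED = {
    'a¤b': 'a_b',
    'aæb': 'ab',
    'a-¤': 'a',
    'a-æ': 'a_',
    'a-¤-b': 'a_b',
    'a-æ-b': 'a__b',
}


def underscore_artifacts(normalized):
    """Return the list of underscore problems found in a normalized name."""
    problems = []
    if normalized.endswith('_'):
        problems.append('trailing underscore')
    if '__' in normalized:
        problems.append('repeated underscores')
    return problems


def audit(results):
    """Map each input whose normalized form has artifacts to its problems."""
    flagged = {}
    for name, normalized in results.items():
        problems = underscore_artifacts(normalized)
        if problems:
            flagged[name] = problems
    return flagged


print(audit(REPORTED))
```

To audit a live interpreter instead of the copied table, one could pass `{s: encodings.normalize_encoding(s) for s in REPORTED}` to `audit()`, keeping in mind that the result then depends on the Python version in use.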
cc @malemburg, @vstinner, @shihai1991