Inconsistent handling of non-ASCII characters in encodings.normalize_encoding() #136736

# Bug report

#83518 changed the handling of non-ASCII characters in `encodings.normalize_encoding()`, but the result is still inconsistent with `codecs.lookup()`, and is not even self-consistent. For example:

```pycon
>>> import encodings
>>> encodings.normalize_encoding('a¤b')
'a_b'
>>> encodings.normalize_encoding('aæb')
'ab'
>>> encodings.normalize_encoding('a-¤')
'a'
>>> encodings.normalize_encoding('a-æ')
'a_'
>>> encodings.normalize_encoding('a-¤-b')
'a_b'
>>> encodings.normalize_encoding('a-æ-b')
'a__b'
```

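A non-alphanumeric character such as `¤` is treated as a separator, while an alphanumeric one such as `æ` is silently dropped. As a result, you can even get an underscore at the end or repeated underscores in the middle.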
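The malformed results can be flagged mechanically. The following is only a minimal sketch: `underscore_problems` is a hypothetical helper written for this report, not part of the `encodings` module, and what the loop prints depends on the Python version it runs on.

```python
import encodings


def underscore_problems(normalized: str) -> list[str]:
    """Return the underscore anomalies found in a normalized encoding name.

    Hypothetical helper for illustration; it only inspects the string
    and makes no assumption about how the name was produced.
    """
    problems = []
    if normalized.startswith("_"):
        problems.append("leading underscore")
    if normalized.endswith("_"):
        problems.append("trailing underscore")
    if "__" in normalized:
        problems.append("repeated underscores")
    return problems


if __name__ == "__main__":
    # Inputs taken from the examples in this report.
    for name in ["a¤b", "aæb", "a-¤", "a-æ", "a-¤-b", "a-æ-b"]:
        try:
            result = encodings.normalize_encoding(name)
        except Exception as exc:
            # Assumption: a version with different behaviour might reject the name.
            print(f"{name!r}: raised {type(exc).__name__}")
            continue
        print(f"{name!r} -> {result!r} {underscore_problems(result)}")
```

Applied to the outputs quoted in this report, the helper flags `'a_'` and `'a__b'` and accepts `'a_b'`.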

cc @malemburg, @vstinner, @shihai1991


### Linked PRs
* gh-136737
