Unicode characters ≥ 0x10000 cannot be entered or behave unusually at the REPL terminal. #136595

@haydenwong7bm

Bug report

Bug description:

When the machine locale is set to UTF-8, entering a Unicode character ≥ 0x10000 at the REPL behaves as follows.
In CPython 3.13.5:
https://github.com/user-attachments/assets/7777b063-76fe-4929-b854-cae7d61807d2
In CPython 3.14.0b4:

>>> Traceback (most recent call last):
  File "*\Python\Python314\Lib\_pyrepl\readline.py", line 394, in multiline_input
    return reader.readline()
           ~~~~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 748, in readline
    self.handle1()
    ~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 731, in handle1
    self.do_cmd(cmd)
    ~~~~~~~~~~~^^^^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 661, in do_cmd
    self.refresh()
    ~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 638, in refresh
    self.screen = self.calc_screen()
                  ~~~~~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\completing_reader.py", line 261, in calc_screen
    screen = super().calc_screen()
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 315, in calc_screen
    colors = list(gen_colors(self.get_unicode()))
  File "*\Python\Python314\Lib\_pyrepl\utils.py", line 108, in gen_colors
    for color in gen_colors_from_token_stream(gen, line_lengths):
                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "*\Python\Python314\Lib\_pyrepl\utils.py", line 168, in gen_colors_from_token_stream
    for prev_token, token, next_token in token_window:
                                         ^^^^^^^^^^^^
  File "*\Python\Python314\Lib\_pyrepl\utils.py", line 363, in prev_next_window
    window = deque((None, next(iterator)), maxlen=3)
                          ~~~~^^^^^^^^^^
  File "*\Python\Python314\Lib\tokenize.py", line 582, in _generate_tokens_from_c_tokenizer
    for info in it:
                ^^
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-1: surrogates not allowed
>>> Traceback (most recent call last):
  File "*\Python\Python314\Lib\_pyrepl\readline.py", line 394, in multiline_input
    return reader.readline()
           ~~~~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 748, in readline
    self.handle1()
    ~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 731, in handle1
    self.do_cmd(cmd)
    ~~~~~~~~~~~^^^^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 661, in do_cmd
    self.refresh()
    ~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 638, in refresh
    self.screen = self.calc_screen()
                  ~~~~~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\completing_reader.py", line 261, in calc_screen
    screen = super().calc_screen()
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 315, in calc_screen
    colors = list(gen_colors(self.get_unicode()))
  File "*\Python\Python314\Lib\_pyrepl\utils.py", line 108, in gen_colors
    for color in gen_colors_from_token_stream(gen, line_lengths):
                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "*\Python\Python314\Lib\_pyrepl\utils.py", line 168, in gen_colors_from_token_stream
    for prev_token, token, next_token in token_window:
                                         ^^^^^^^^^^^^
  File "*\Python\Python314\Lib\_pyrepl\utils.py", line 363, in prev_next_window
    window = deque((None, next(iterator)), maxlen=3)
                          ~~~~^^^^^^^^^^
  File "*\Python\Python314\Lib\tokenize.py", line 582, in _generate_tokens_from_c_tokenizer
    for info in it:
                ^^
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-1: surrogates not allowed

Two surrogates were "inputted" for the single character, so UnicodeEncodeError was raised twice.
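For readers unfamiliar with surrogates, here is a minimal sketch of what the error message describes. It is not the `_pyrepl` code and not a proposed fix. The helper names `to_utf16_surrogates` and `combine_surrogates` are hypothetical, and U+1F600 is just an arbitrary example of a character ≥ 0x10000. It shows how such a character splits into two UTF-16 surrogates, why a `str` holding them as separate characters cannot be encoded to UTF-8, and one way the pair could be merged back.

```python
def to_utf16_surrogates(ch: str) -> tuple[int, int]:
    """Split one code point >= 0x10000 into its UTF-16 (high, low) surrogates."""
    cp = ord(ch)
    if cp < 0x10000:
        raise ValueError("code point fits in a single UTF-16 unit")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def combine_surrogates(text: str) -> str:
    """Merge surrogate pairs stored as separate str characters into real code points.

    Assumes every surrogate is part of a valid pair; a lone surrogate
    would make the decode step raise UnicodeDecodeError.
    """
    return text.encode("utf-16", "surrogatepass").decode("utf-16")


high, low = to_utf16_surrogates("\U0001F600")
broken = chr(high) + chr(low)  # two surrogate characters instead of one code point

try:
    broken.encode("utf-8")
    error_reason = None
except UnicodeEncodeError as exc:
    # Same failure reason as in the traceback above.
    error_reason = exc.reason

repaired = combine_surrogates(broken)  # a single character again
```

Here `broken` has length 2 and fails to encode, while `repaired` has length 1 and encodes normally. Whether the REPL should merge the pair at input time or elsewhere is a design question this sketch does not answer.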

CPython versions tested on:

3.13, 3.14

Operating systems tested on:

Windows

Labels:

OS-windows, stdlib, topic-repl, topic-unicode, type-bug