From d4381aa9bfc80fe0f3d9530bc32aba8df47caa07 Mon Sep 17 00:00:00 2001 From: Petr Viktorin Date: Wed, 4 Jun 2025 18:01:25 +0200 Subject: [PATCH 1/9] WIP: String literals Co-authored-by: Blaise Pabon --- Doc/reference/expressions.rst | 50 +++++- Doc/reference/lexical_analysis.rst | 255 ++++++++++++++++++----------- 2 files changed, 207 insertions(+), 98 deletions(-) diff --git a/Doc/reference/expressions.rst b/Doc/reference/expressions.rst index 17f39aaf5f57cd..743d43b1c9c1b1 100644 --- a/Doc/reference/expressions.rst +++ b/Doc/reference/expressions.rst @@ -133,13 +133,18 @@ Literals Python supports string and bytes literals and various numeric literals: -.. productionlist:: python-grammar - literal: `stringliteral` | `bytesliteral` | `NUMBER` +.. grammar-snippet:: + :group: python-grammar + + literal: `strings` | `NUMBER` Evaluation of a literal yields an object of the given type (string, bytes, integer, floating-point number, complex number) with the given value. The value may be approximated in the case of floating-point and imaginary (complex) -literals. See section :ref:`literals` for details. +literals. +See section :ref:`literals` for details. +See section :ref:`string-concatenation` for details on ``strings``. + .. index:: triple: immutable; data; type @@ -152,6 +157,45 @@ occurrence) may obtain the same object or a different object with the same value. +.. _string-concatenation: + +String literal concatenation +............................ + +Multiple adjacent string or bytes literals (delimited by whitespace), possibly +using different quoting conventions, are allowed, and their meaning is the same +as their concatenation. Thus, ``"hello" 'world'`` is equivalent to +``"helloworld"``. + +Formally: + +.. grammar-snippet:: + :group: python-grammar + + strings: ( `STRING` | `fstring` | `tstring`)+ + +Note that this feature is defined at the syntactical level, so it only works +with literals. 
+To concatenate string expressions at run time, the '+' operator may be used:: + + greeting = "Hello" + space = " " + name = "Blaise" + print(greeting + space + name) # not: print(greeting space name) + +Also note that literal concatenation can freely mix raw strings, +triple-quoted strings, and formatted or template string literals. +However, bytes literals may not be combined with string literals of any kind. + +This feature can be used to reduce the number of backslashes +needed, to split long strings conveniently across long lines, or even to add +comments to parts of strings, for example:: + + re.compile("[A-Za-z_]" # letter or underscore + "[A-Za-z0-9_]*" # letter, digit or underscore + ) + + .. _parenthesized: Parenthesized forms diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst index 567c70111c20ec..58c8b15cfe5499 100644 --- a/Doc/reference/lexical_analysis.rst +++ b/Doc/reference/lexical_analysis.rst @@ -106,6 +106,16 @@ If an encoding is declared, the encoding name must be recognized by Python encoding is used for all lexical analysis, including string literals, comments and identifiers. +All lexical analysis, including string literals, comments +and identifiers, works on Unicode text decoded using the source encoding. +Any Unicode code point, except the NUL control character, can appear in +Python source. + +.. grammar-snippet:: + :group: python-grammar + + source_character: + .. _explicit-joining: @@ -478,66 +488,104 @@ Literals are notations for constant values of some built-in types. .. index:: string literal, bytes literal, ASCII single: ' (single quote); string literal single: " (double quote); string literal - single: u'; string literal - single: u"; string literal .. _strings: String and Bytes literals ------------------------- -String literals are described by the following lexical definitions: +String literals are text enclosed in single quotes (``'``) or double +quotes (``"``). For example: -.. 
productionlist:: python-grammar - stringliteral: [`stringprefix`](`shortstring` | `longstring`) - stringprefix: "r" | "u" | "R" | "U" | "f" | "F" | "t" | "T" - : | "fr" | "Fr" | "fR" | "FR" | "rf" | "rF" | "Rf" | "RF" - : | "tr" | "Tr" | "tR" | "TR" | "rt" | "rT" | "Rt" | "RT" - shortstring: "'" `shortstringitem`* "'" | '"' `shortstringitem`* '"' - longstring: "'''" `longstringitem`* "'''" | '"""' `longstringitem`* '"""' - shortstringitem: `shortstringchar` | `stringescapeseq` - longstringitem: `longstringchar` | `stringescapeseq` - shortstringchar: - longstringchar: - stringescapeseq: "\" +.. code-block:: plain -.. productionlist:: python-grammar - bytesliteral: `bytesprefix`(`shortbytes` | `longbytes`) - bytesprefix: "b" | "B" | "br" | "Br" | "bR" | "BR" | "rb" | "rB" | "Rb" | "RB" - shortbytes: "'" `shortbytesitem`* "'" | '"' `shortbytesitem`* '"' - longbytes: "'''" `longbytesitem`* "'''" | '"""' `longbytesitem`* '"""' - shortbytesitem: `shortbyteschar` | `bytesescapeseq` - longbytesitem: `longbyteschar` | `bytesescapeseq` - shortbyteschar: - longbyteschar: - bytesescapeseq: "\" + "spam" + 'eggs' + +The quote used to start the literal also terminates it, so a string literal +can only contain the other quote (except with escape sequences, see below). +For example: + +.. code-block:: plain -One syntactic restriction not indicated by these productions is that whitespace -is not allowed between the :token:`~python-grammar:stringprefix` or -:token:`~python-grammar:bytesprefix` and the rest of the literal. The source -character set is defined by the encoding declaration; it is UTF-8 if no encoding -declaration is given in the source file; see section :ref:`encodings`. + 'Say "Hello", please.' + "Don't do that!" -.. index:: triple-quoted string, Unicode Consortium, raw string +Except for this limitation, the choice of quote character (``'`` or ``"``) +does not affect how the literal is parsed. + +.. 
index:: triple-quoted string single: """; string literal single: '''; string literal -In plain English: Both types of literals can be enclosed in matching single quotes -(``'``) or double quotes (``"``). They can also be enclosed in matching groups -of three single or double quotes (these are generally referred to as -*triple-quoted strings*). The backslash (``\``) character is used to give special -meaning to otherwise ordinary characters like ``n``, which means 'newline' when -escaped (``\n``). It can also be used to escape characters that otherwise have a -special meaning, such as newline, backslash itself, or the quote character. -See :ref:`escape sequences ` below for examples. +Triple-quoted strings +--------------------- + +Strings can also be enclosed in matching groups of three single or double +quotes. +These are generally referred to as :dfn:`triple-quoted strings`. + +In triple-quoted literals, unescaped newlines and quotes are allowed (and are +retained), except that three unescaped quotes in a row terminate the literal. +(Here, a *quote* is the character used to open the literal, that is, +either ``'`` or ``"``.) + +For example: + +.. code-block:: plain + + """This is a triple-quoted string with "quotes" inside.""" + + '''Another triple-quoted string. This one continues + on the next line.''' + +Escape sequences +---------------- + +Inside a string literal, the backslash (``\``) character introduces an +:dfn:`escape sequence`, which has special meaning depending on the character +after the backslash. +For example, ``\n`` denotes the 'newline' character, rather the two characters +``\`` and ``n``. +See :ref:`escape sequences ` below for a full list of such +sequences, and more details. + + +.. index:: + single: u'; string literal + single: u"; string literal + +String prefixes +--------------- + +String literals can have an optional :dfn:`prefix` that influences how the literal +is parsed, for example: + +.. 
code-block:: plain + + b"data" + f'{result=}' + +* ``r``: Raw string +* ``f``: "F-string" +* ``t``: "T-string" +* ``b``: Byte literal +* ``u``: No effect (allowed for backwards compatibility) + +Prefixes are case-insensitive (for example, ``B`` works the same as ``b``). +The ``r`` prefix can be combined with ``f``, ``t`` or ``b``, so ``fr``, +``rf``, ``tr``, ``rt``, ``br`` and ``rb`` are also valid prefixes. + .. index:: single: b'; bytes literal single: b"; bytes literal -Bytes literals are always prefixed with ``'b'`` or ``'B'``; they produce an -instance of the :class:`bytes` type instead of the :class:`str` type. They -may only contain ASCII characters; bytes with a numeric value of 128 or greater -must be expressed with escapes. +:dfn:`Bytes literals` are always prefixed with ``'b'`` or ``'B'``; they produce an +instance of the :class:`bytes` type instead of the :class:`str` type. +They may only contain ASCII characters; bytes with a numeric value of 128 +or greater must be expressed with escape sequences. +Similarly, a zero byte must be expressed using an escape sequence. + .. index:: single: r'; raw string literal @@ -546,9 +594,33 @@ must be expressed with escapes. Both string and bytes literals may optionally be prefixed with a letter ``'r'`` or ``'R'``; such constructs are called :dfn:`raw string literals` and :dfn:`raw bytes literals` respectively and treat backslashes as -literal characters. As a result, in raw string literals, ``'\U'`` and ``'\u'`` +literal characters. +As a result, in raw string literals, :ref:`escape sequences ` escapes are not treated specially. +Even in a raw literal, quotes can be escaped with a backslash, but the +backslash remains in the result; for example, ``r"\""`` is a valid string +literal consisting of two characters: a backslash and a double quote; ``r"\"`` +is not a valid string literal (even a raw string cannot end in an odd number of +backslashes). 
Specifically, *a raw literal cannot end in a single backslash* +(since the backslash would escape the following quote character). Note also +that a single backslash followed by a newline is interpreted as those two +characters as part of the literal, *not* as a line continuation. + + +.. index:: + single: f'; formatted string literal + single: f"; formatted string literal + +A string literal with ``'f'`` or ``'F'`` in its prefix is a +:dfn:`formatted string literal`; see :ref:`f-strings`. +Similarly, string literal with ``'t'`` or ``'T'`` in its prefix is a +:dfn:`template string literal`; see :ref:`t-strings`. + +The ``'f'`` or ``t`` may be combined with ``'r'`` to create a +:dfn:`raw formatted string` or :dfn:`raw template string`. +They may not be combined with ``'b'``, ``'u'``, or each other. + .. versionadded:: 3.3 The ``'rb'`` prefix of raw bytes literals has been added as a synonym of ``'br'``. @@ -557,18 +629,46 @@ escapes are not treated specially. to simplify the maintenance of dual Python 2.x and 3.x codebases. See :pep:`414` for more information. -.. index:: - single: f'; formatted string literal - single: f"; formatted string literal -A string literal with ``'f'`` or ``'F'`` in its prefix is a -:dfn:`formatted string literal`; see :ref:`f-strings`. The ``'f'`` may be -combined with ``'r'``, but not with ``'b'`` or ``'u'``, therefore raw -formatted strings are possible, but formatted bytes literals are not. +String literals, except "F-strings" and "T-strings", are described by the +following lexical definitions: + +.. 
grammar-snippet:: + :group: python-grammar + + STRING: stringliteral | bytesliteral | fstring | tstring + + stringliteral: [`stringprefix`](`stringcontent`) + stringprefix: <("r" | "u"), case-insensitive> + stringcontent: `quote` `stringitem`* + quote: "'" | '"' | "'''" | '"""' + stringitem: `stringchar` | `stringescapeseq` + stringchar: + stringescapeseq: "\" + +``stringchar`` can not include: + +- the backslash, ``\``; +- in triple-quoted strings (quoted by ``'''`` or ``"""``), the newline; +- the quote character. + + +.. grammar-snippet:: + :group: python-grammar + + bytesliteral: `bytesprefix`(`shortbytes` | `longbytes`) + bytesprefix: <("b" | "br" | "rb" ), case-insensitive> + shortbytes: "'" `shortbytesitem`* "'" | '"' `shortbytesitem`* '"' + longbytes: "'''" `longbytesitem`* "'''" | '"""' `longbytesitem`* '"""' + shortbytesitem: `shortbyteschar` | `bytesescapeseq` + longbytesitem: `longbyteschar` | `bytesescapeseq` + shortbyteschar: + longbyteschar: + bytesescapeseq: "\" + +Note that as in all lexical definitions, whitespace is significant. +The prefix, if any, must be followed immediately by the quoted string content. -In triple-quoted literals, unescaped newlines and quotes are allowed (and are -retained), except that three unescaped quotes in a row terminate the literal. (A -"quote" is the character used to open the literal, i.e. either ``'`` or ``"``.) .. index:: physical line, escape sequence, Standard C, C single: \ (backslash); escape sequence @@ -587,7 +687,6 @@ retained), except that three unescaped quotes in a row terminate the literal. ( .. _escape-sequences: - Escape sequences ^^^^^^^^^^^^^^^^ @@ -655,14 +754,14 @@ Notes: (2) - As in Standard C, up to three octal digits are accepted. + As in Standard C, up to three octal digits (0 through 7) are accepted. .. versionchanged:: 3.11 - Octal escapes with value larger than ``0o377`` produce a + Octal escapes with value larger than ``0o377`` (255) produce a :exc:`DeprecationWarning`. .. 
versionchanged:: 3.12 - Octal escapes with value larger than ``0o377`` produce a + Octal escapes with value larger than ``0o377`` (255) produce a :exc:`SyntaxWarning`. In a future Python version they will be eventually a :exc:`SyntaxError`. @@ -689,11 +788,9 @@ Notes: .. index:: unrecognized escape sequence Unlike Standard C, all unrecognized escape sequences are left in the string -unchanged, i.e., *the backslash is left in the result*. (This behavior is -useful when debugging: if an escape sequence is mistyped, the resulting output -is more easily recognized as broken.) It is also important to note that the -escape sequences only recognized in string literals fall into the category of -unrecognized escapes for bytes literals. +unchanged, i.e., *the backslash is left in the result*. +Note that for bytes literals, the escape sequences only recognized in string +literals fall into the category of unrecognized escapes. .. versionchanged:: 3.6 Unrecognized escape sequences produce a :exc:`DeprecationWarning`. @@ -702,38 +799,6 @@ unrecognized escapes for bytes literals. Unrecognized escape sequences produce a :exc:`SyntaxWarning`. In a future Python version they will be eventually a :exc:`SyntaxError`. -Even in a raw literal, quotes can be escaped with a backslash, but the -backslash remains in the result; for example, ``r"\""`` is a valid string -literal consisting of two characters: a backslash and a double quote; ``r"\"`` -is not a valid string literal (even a raw string cannot end in an odd number of -backslashes). Specifically, *a raw literal cannot end in a single backslash* -(since the backslash would escape the following quote character). Note also -that a single backslash followed by a newline is interpreted as those two -characters as part of the literal, *not* as a line continuation. - - -.. 
_string-concatenation: - -String literal concatenation ----------------------------- - -Multiple adjacent string or bytes literals (delimited by whitespace), possibly -using different quoting conventions, are allowed, and their meaning is the same -as their concatenation. Thus, ``"hello" 'world'`` is equivalent to -``"helloworld"``. This feature can be used to reduce the number of backslashes -needed, to split long strings conveniently across long lines, or even to add -comments to parts of strings, for example:: - - re.compile("[A-Za-z_]" # letter or underscore - "[A-Za-z0-9_]*" # letter, digit or underscore - ) - -Note that this feature is defined at the syntactical level, but implemented at -compile time. The '+' operator must be used to concatenate string expressions -at run time. Also note that literal concatenation can use different quoting -styles for each component (even mixing raw strings and triple quoted strings), -and formatted string literals may be concatenated with plain string literals. - .. index:: single: formatted string literal From 80ad85cc286f04a4ac19d03c5f99a9158d15231b Mon Sep 17 00:00:00 2001 From: Petr Viktorin Date: Wed, 11 Jun 2025 16:22:08 +0200 Subject: [PATCH 2/9] Use correct Pygments lexer for plain text --- Doc/reference/lexical_analysis.rst | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst index 58c8b15cfe5499..6f3d90f89b98d3 100644 --- a/Doc/reference/lexical_analysis.rst +++ b/Doc/reference/lexical_analysis.rst @@ -496,7 +496,11 @@ String and Bytes literals String literals are text enclosed in single quotes (``'``) or double quotes (``"``). For example: -.. code-block:: plain +.. This is Python code, but we turn off highlighting because as of this + writing, highlighted strings don't look good when there's no code + surrounding them. + +.. 
code-block:: text "spam" 'eggs' @@ -505,7 +509,7 @@ The quote used to start the literal also terminates it, so a string literal can only contain the other quote (except with escape sequences, see below). For example: -.. code-block:: plain +.. code-block:: text 'Say "Hello", please.' "Don't do that!" @@ -531,7 +535,7 @@ either ``'`` or ``"``.) For example: -.. code-block:: plain +.. code-block:: text """This is a triple-quoted string with "quotes" inside.""" @@ -560,7 +564,7 @@ String prefixes String literals can have an optional :dfn:`prefix` that influences how the literal is parsed, for example: -.. code-block:: plain +.. code-block:: python b"data" f'{result=}' From e44fa66cf2da63763a3ed37f7d59da28e95c785c Mon Sep 17 00:00:00 2001 From: Petr Viktorin Date: Wed, 11 Jun 2025 17:59:01 +0200 Subject: [PATCH 3/9] WIP --- Doc/reference/grammar.rst | 5 +- Doc/reference/introduction.rst | 16 +++-- Doc/reference/lexical_analysis.rst | 110 +++++++++++++++++------------ 3 files changed, 76 insertions(+), 55 deletions(-) diff --git a/Doc/reference/grammar.rst b/Doc/reference/grammar.rst index 55c148801d8559..1037feb691f6bc 100644 --- a/Doc/reference/grammar.rst +++ b/Doc/reference/grammar.rst @@ -10,11 +10,8 @@ error recovery. 
The notation used here is the same as in the preceding docs, and is described in the :ref:`notation ` section, -except for a few extra complications: +except for an extra complication: -* ``&e``: a positive lookahead (that is, ``e`` is required to match but - not consumed) -* ``!e``: a negative lookahead (that is, ``e`` is required *not* to match) * ``~`` ("cut"): commit to the current alternative and fail the rule even if this fails to parse diff --git a/Doc/reference/introduction.rst b/Doc/reference/introduction.rst index 444acac374a690..c62240b18cfe55 100644 --- a/Doc/reference/introduction.rst +++ b/Doc/reference/introduction.rst @@ -145,15 +145,23 @@ The definition to the right of the colon uses the following syntax elements: * ``e?``: A question mark has exactly the same meaning as square brackets: the preceding item is optional. * ``(e)``: Parentheses are used for grouping. + +The following notation is only used in +:ref:`lexical definitions `. + * ``"a"..."z"``: Two literal characters separated by three dots mean a choice of any single character in the given (inclusive) range of ASCII characters. - This notation is only used in - :ref:`lexical definitions `. * ``<...>``: A phrase between angular brackets gives an informal description of the matched symbol (for example, ````), or an abbreviation that is defined in nearby text (for example, ````). - This notation is only used in - :ref:`lexical definitions `. + +.. _lexical-lookaheads: + +Some definitions also use *lookaheads*, which indicate that an element +must (or must not) match at a given position, but without consuming any input: + +* ``&e``: a positive lookahead (that is, ``e`` is required to match) +* ``!e``: a negative lookahead (that is, ``e`` is required *not* to match) The unary operators (``*``, ``+``, ``?``) bind as tightly as possible; the vertical bar (``|``) binds most loosely. 
diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst index 6f3d90f89b98d3..67cc9bd8fc7bac 100644 --- a/Doc/reference/lexical_analysis.rst +++ b/Doc/reference/lexical_analysis.rst @@ -39,7 +39,8 @@ The end of a logical line is represented by the token :data:`~token.NEWLINE`. Statements cannot cross logical line boundaries except where :data:`!NEWLINE` is allowed by the syntax (e.g., between statements in compound statements). A logical line is constructed from one or more *physical lines* by following -the explicit or implicit *line joining* rules. +the :ref:`explicit ` or :ref:`implicit ` +*line joining* rules. .. _physical-lines: @@ -47,17 +48,28 @@ the explicit or implicit *line joining* rules. Physical lines -------------- -A physical line is a sequence of characters terminated by an end-of-line -sequence. In source files and strings, any of the standard platform line -termination sequences can be used - the Unix form using ASCII LF (linefeed), -the Windows form using the ASCII sequence CR LF (return followed by linefeed), -or the old Macintosh form using the ASCII CR (return) character. All of these -forms can be used equally, regardless of platform. The end of input also serves -as an implicit terminator for the final physical line. +A physical line is a sequence of characters terminated by one of the following +end-of-line sequences: -When embedding Python, source code strings should be passed to Python APIs using -the standard C conventions for newline characters (the ``\n`` character, -representing ASCII LF, is the line terminator). +* the Unix form using ASCII LF (linefeed), +* the Windows form using the ASCII sequence CR LF (return followed by linefeed), +* the old Macintosh form using the ASCII CR (return) character. + +Regardless of platform, each of these sequences is replaced by a single +ASCII LF (linefeed) character. +(This is done even inside :ref:`string literals `.) 
+Each line can use any of the sequences; they do not need to be consistent +within a file. + +The end of input also serves as an implicit terminator for the final +physical line. + +Formally: + +.. grammar-snippet:: + :group: python-grammar + + newline: | | .. _comments: @@ -484,6 +496,13 @@ Literals Literals are notations for constant values of some built-in types. +In terms of lexical analysis, Python has :ref:`string, bytes ` +and :ref:`numeric ` literals. + +Other “literals” are lexically denoted using :ref:`keywords ` +(``None``, ``True``, ``False``) and the special +:ref:`ellipsis token ` (``...``): + .. index:: string literal, bytes literal, ASCII single: ' (single quote); string literal @@ -491,7 +510,7 @@ Literals are notations for constant values of some built-in types. .. _strings: String and Bytes literals -------------------------- +========================= String literals are text enclosed in single quotes (``'``) or double quotes (``"``). For example: @@ -635,41 +654,26 @@ They may not be combined with ``'b'``, ``'u'``, or each other. String literals, except "F-strings" and "T-strings", are described by the -following lexical definitions: +following lexical definitions. + +These definitions use :ref:`negative lookaheads ` (``!``) +to indicate that an ending quote ends the literal. .. 
grammar-snippet:: :group: python-grammar - STRING: stringliteral | bytesliteral | fstring | tstring - - stringliteral: [`stringprefix`](`stringcontent`) - stringprefix: <("r" | "u"), case-insensitive> - stringcontent: `quote` `stringitem`* - quote: "'" | '"' | "'''" | '"""' + STRING: [`stringprefix`] (`stringcontent`) + stringprefix: <("r" | "u" | "b" | "br" | "rb"), case-insensitive> + stringcontent: + | "'" ( !"'" `stringitem`)* "'" + | '"' ( !'"' `stringitem`)* '"' + | "'''" ( !"'''" `longstringitem`)* "'''" + | '"""' ( !'"""' `longstringitem`)* '"""' stringitem: `stringchar` | `stringescapeseq` - stringchar: + stringchar: + longstringitem: `stringitem` | newline stringescapeseq: "\" -``stringchar`` can not include: - -- the backslash, ``\``; -- in triple-quoted strings (quoted by ``'''`` or ``"""``), the newline; -- the quote character. - - -.. grammar-snippet:: - :group: python-grammar - - bytesliteral: `bytesprefix`(`shortbytes` | `longbytes`) - bytesprefix: <("b" | "br" | "rb" ), case-insensitive> - shortbytes: "'" `shortbytesitem`* "'" | '"' `shortbytesitem`* '"' - longbytes: "'''" `longbytesitem`* "'''" | '"""' `longbytesitem`* '"""' - shortbytesitem: `shortbyteschar` | `bytesescapeseq` - longbytesitem: `longbyteschar` | `bytesescapeseq` - shortbyteschar: - longbyteschar: - bytesescapeseq: "\" - Note that as in all lexical definitions, whitespace is significant. The prefix, if any, must be followed immediately by the quoted string content. @@ -692,7 +696,7 @@ The prefix, if any, must be followed immediately by the quoted string content. .. _escape-sequences: Escape sequences -^^^^^^^^^^^^^^^^ +---------------- Unless an ``'r'`` or ``'R'`` prefix is present, escape sequences in string and bytes literals are interpreted according to rules similar to those used by @@ -985,7 +989,7 @@ and :meth:`str.format`, which uses a related format string mechanism. .. _numbers: Numeric literals ----------------- +================ .. 
index:: number, numeric literal, integer literal floating-point literal, hexadecimal literal @@ -1241,14 +1245,26 @@ The following tokens serve as delimiters in the grammar: ( ) [ ] { } , : ! . ; @ = + +The period can also occur in floating-point and imaginary literals. + +.. _lexical-ellipsis: + +A sequence of three periods has a special meaning as an +:py:data:`Ellipsis` literal: + +.. code-block:: none + + ... + +The following *augmented assignment operators* serve +lexically as delimiters, but also perform an operation: + +.. code-block:: none + -> += -= *= /= //= %= @= &= |= ^= >>= <<= **= -The period can also occur in floating-point and imaginary literals. A sequence -of three periods has a special meaning as an ellipsis literal. The second half -of the list, the augmented assignment operators, serve lexically as delimiters, -but also perform an operation. - The following printing ASCII characters have special meaning as part of other tokens or are otherwise significant to the lexical analyzer: From 86bf94b0f4cc9f9eaa63728610d7bb71fc4f3107 Mon Sep 17 00:00:00 2001 From: Petr Viktorin Date: Wed, 18 Jun 2025 18:05:31 +0200 Subject: [PATCH 4/9] More WIP --- Doc/reference/lexical_analysis.rst | 424 +++++++++++++++++------------ 1 file changed, 251 insertions(+), 173 deletions(-) diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst index 67cc9bd8fc7bac..36abfa31c093c9 100644 --- a/Doc/reference/lexical_analysis.rst +++ b/Doc/reference/lexical_analysis.rst @@ -501,7 +501,7 @@ and :ref:`numeric ` literals. Other “literals” are lexically denoted using :ref:`keywords ` (``None``, ``True``, ``False``) and the special -:ref:`ellipsis token ` (``...``): +:ref:`ellipsis token ` (``...``). .. index:: string literal, bytes literal, ASCII @@ -519,7 +519,7 @@ quotes (``"``). For example: writing, highlighted strings don't look good when there's no code surrounding them. -.. code-block:: text +.. 
code-block:: python "spam" 'eggs' @@ -528,7 +528,7 @@ The quote used to start the literal also terminates it, so a string literal can only contain the other quote (except with escape sequences, see below). For example: -.. code-block:: text +.. code-block:: python 'Say "Hello", please.' "Don't do that!" @@ -536,6 +536,21 @@ For example: Except for this limitation, the choice of quote character (``'`` or ``"``) does not affect how the literal is parsed. +Inside a string literal, the backslash (``\``) character introduces an +:dfn:`escape sequence`, which has special meaning depending on the character +after the backslash. +For example, ``\"`` denotes the double quote character, and does *not* end +the string: + +.. code-block:: python + + >>> print("Say \"Hello\" to everyone!") + Say "Hello" to everyone! + +See :ref:`escape sequences ` below for a full list of such +sequences, and more details. + + .. index:: triple-quoted string single: """; string literal single: '''; string literal @@ -545,32 +560,20 @@ Triple-quoted strings Strings can also be enclosed in matching groups of three single or double quotes. -These are generally referred to as :dfn:`triple-quoted strings`. +These are generally referred to as :dfn:`triple-quoted strings`:: -In triple-quoted literals, unescaped newlines and quotes are allowed (and are -retained), except that three unescaped quotes in a row terminate the literal. -(Here, a *quote* is the character used to open the literal, that is, -either ``'`` or ``"``.) + """This is a triple-quoted string.""" -For example: +In triple-quoted literals, unescaped quotes are allowed (and are +retained), except that three unescaped quotes in a row terminate the literal, +if they are of the same kind (``'`` or ``"``) used at the start:: -.. code-block:: text + """This string has "quotes" inside.""" - """This is a triple-quoted string with "quotes" inside.""" +Unescaped newlines are also allowed and retained:: - '''Another triple-quoted string. 
This one continues - on the next line.''' - -Escape sequences ----------------- - -Inside a string literal, the backslash (``\``) character introduces an -:dfn:`escape sequence`, which has special meaning depending on the character -after the backslash. -For example, ``\n`` denotes the 'newline' character, rather the two characters -``\`` and ``n``. -See :ref:`escape sequences ` below for a full list of such -sequences, and more details. + '''This triple-quoted string + continues on the next line.''' .. index:: @@ -580,70 +583,28 @@ sequences, and more details. String prefixes --------------- -String literals can have an optional :dfn:`prefix` that influences how the literal -is parsed, for example: +String literals can have an optional :dfn:`prefix` that influences how the +content of the literal is parsed, for example: .. code-block:: python b"data" f'{result=}' -* ``r``: Raw string -* ``f``: "F-string" -* ``t``: "T-string" -* ``b``: Byte literal +The allowed prefixes are: + +* ``b``: :ref:`Bytes literal ` +* ``r``: :ref:`Raw string ` +* ``f``: :ref:`Formatted string literal ` ("f-string") +* ``t``: :ref:`Template string literal ` ("t-string") * ``u``: No effect (allowed for backwards compatibility) +See the linked sections for details on each type. + Prefixes are case-insensitive (for example, ``B`` works the same as ``b``). The ``r`` prefix can be combined with ``f``, ``t`` or ``b``, so ``fr``, ``rf``, ``tr``, ``rt``, ``br`` and ``rb`` are also valid prefixes. - -.. index:: - single: b'; bytes literal - single: b"; bytes literal - -:dfn:`Bytes literals` are always prefixed with ``'b'`` or ``'B'``; they produce an -instance of the :class:`bytes` type instead of the :class:`str` type. -They may only contain ASCII characters; bytes with a numeric value of 128 -or greater must be expressed with escape sequences. -Similarly, a zero byte must be expressed using an escape sequence. - - -.. 
index:: - single: r'; raw string literal - single: r"; raw string literal - -Both string and bytes literals may optionally be prefixed with a letter ``'r'`` -or ``'R'``; such constructs are called :dfn:`raw string literals` -and :dfn:`raw bytes literals` respectively and treat backslashes as -literal characters. -As a result, in raw string literals, :ref:`escape sequences ` -escapes are not treated specially. - -Even in a raw literal, quotes can be escaped with a backslash, but the -backslash remains in the result; for example, ``r"\""`` is a valid string -literal consisting of two characters: a backslash and a double quote; ``r"\"`` -is not a valid string literal (even a raw string cannot end in an odd number of -backslashes). Specifically, *a raw literal cannot end in a single backslash* -(since the backslash would escape the following quote character). Note also -that a single backslash followed by a newline is interpreted as those two -characters as part of the literal, *not* as a line continuation. - - -.. index:: - single: f'; formatted string literal - single: f"; formatted string literal - -A string literal with ``'f'`` or ``'F'`` in its prefix is a -:dfn:`formatted string literal`; see :ref:`f-strings`. -Similarly, string literal with ``'t'`` or ``'T'`` in its prefix is a -:dfn:`template string literal`; see :ref:`t-strings`. - -The ``'f'`` or ``t`` may be combined with ``'r'`` to create a -:dfn:`raw formatted string` or :dfn:`raw template string`. -They may not be combined with ``'b'``, ``'u'``, or each other. - .. versionadded:: 3.3 The ``'rb'`` prefix of raw bytes literals has been added as a synonym of ``'br'``. @@ -653,7 +614,11 @@ They may not be combined with ``'b'``, ``'u'``, or each other. See :pep:`414` for more information. 
-String literals, except "F-strings" and "T-strings", are described by the +Formal grammar +-------------- + +String literals, except :ref:`"F-strings" ` and +:ref:`"T-strings" `, are described by the following lexical definitions. These definitions use :ref:`negative lookaheads ` (``!``) @@ -675,23 +640,8 @@ to indicate that an ending quote ends the literal. stringescapeseq: "\" Note that as in all lexical definitions, whitespace is significant. -The prefix, if any, must be followed immediately by the quoted string content. - - -.. index:: physical line, escape sequence, Standard C, C - single: \ (backslash); escape sequence - single: \\; escape sequence - single: \a; escape sequence - single: \b; escape sequence - single: \f; escape sequence - single: \n; escape sequence - single: \r; escape sequence - single: \t; escape sequence - single: \v; escape sequence - single: \x; escape sequence - single: \N; escape sequence - single: \u; escape sequence - single: \U; escape sequence +In particular, the prefix (if any) must be immediately followed by the starting +quote. .. _escape-sequences: @@ -702,55 +652,50 @@ Unless an ``'r'`` or ``'R'`` prefix is present, escape sequences in string and bytes literals are interpreted according to rules similar to those used by Standard C. 
The recognized escape sequences are: -+-------------------------+---------------------------------+-------+ -| Escape Sequence | Meaning | Notes | -+=========================+=================================+=======+ -| ``\``\ | Backslash and newline ignored | \(1) | -+-------------------------+---------------------------------+-------+ -| ``\\`` | Backslash (``\``) | | -+-------------------------+---------------------------------+-------+ -| ``\'`` | Single quote (``'``) | | -+-------------------------+---------------------------------+-------+ -| ``\"`` | Double quote (``"``) | | -+-------------------------+---------------------------------+-------+ -| ``\a`` | ASCII Bell (BEL) | | -+-------------------------+---------------------------------+-------+ -| ``\b`` | ASCII Backspace (BS) | | -+-------------------------+---------------------------------+-------+ -| ``\f`` | ASCII Formfeed (FF) | | -+-------------------------+---------------------------------+-------+ -| ``\n`` | ASCII Linefeed (LF) | | -+-------------------------+---------------------------------+-------+ -| ``\r`` | ASCII Carriage Return (CR) | | -+-------------------------+---------------------------------+-------+ -| ``\t`` | ASCII Horizontal Tab (TAB) | | -+-------------------------+---------------------------------+-------+ -| ``\v`` | ASCII Vertical Tab (VT) | | -+-------------------------+---------------------------------+-------+ -| :samp:`\\\\{ooo}` | Character with octal value | (2,4) | -| | *ooo* | | -+-------------------------+---------------------------------+-------+ -| :samp:`\\x{hh}` | Character with hex value *hh* | (3,4) | -+-------------------------+---------------------------------+-------+ - -Escape sequences only recognized in string literals are: - -+-------------------------+---------------------------------+-------+ -| Escape Sequence | Meaning | Notes | -+=========================+=================================+=======+ -| :samp:`\\N\\{{name}\\}` | Character named *name* 
in the | \(5) | -| | Unicode database | | -+-------------------------+---------------------------------+-------+ -| :samp:`\\u{xxxx}` | Character with 16-bit hex value | \(6) | -| | *xxxx* | | -+-------------------------+---------------------------------+-------+ -| :samp:`\\U{xxxxxxxx}` | Character with 32-bit hex value | \(7) | -| | *xxxxxxxx* | | -+-------------------------+---------------------------------+-------+ - -Notes: - -(1) +.. list-table:: + :widths: auto + :header-rows: 1 + + * * Escape Sequence + * Meaning + * * ``\``\ + * :ref:`string-escape-ignore` + * * ``\\`` + * :ref:`Backslash ` + * * ``\'`` + * :ref:`Single quote ` + * * ``\"`` + * :ref:`Double quote ` + * * ``\a`` + * ASCII Bell (BEL) + * * ``\b`` + * ASCII Backspace (BS) + * * ``\f`` + * ASCII Formfeed (FF) + * * ``\n`` + * ASCII Linefeed (LF) + * * ``\r`` + * ASCII Carriage Return (CR) + * * ``\t`` + * ASCII Horizontal Tab (TAB) + * * ``\v`` + * ASCII Vertical Tab (VT) + * * :samp:`\\\\{ooo}` + * :ref:`string-escape-oct` + * * :samp:`\\x{hh}` + * :ref:`string-escape-hex` + * * :samp:`\\N\\{{name}\\}` + * :ref:`string-escape-named` + * * :samp:`\\u{xxxx}` + * :ref:`Hexadecimal Unicode character ` + * * :samp:`\\U{xxxxxxxx}` + * :ref:`Hexadecimal Unicode character ` + +.. _string-escape-ignore: + +Ignored end of line +^^^^^^^^^^^^^^^^^^^ + A backslash can be added at the end of a line to ignore the newline:: >>> 'This string will not include \ @@ -760,9 +705,39 @@ Notes: The same result can be achieved using :ref:`triple-quoted strings `, or parentheses and :ref:`string literal concatenation `. +.. _string-escape-escaped-char: + +Escaped characters +^^^^^^^^^^^^^^^^^^ -(2) - As in Standard C, up to three octal digits (0 through 7) are accepted. + To include a backslash in a non-:ref:`raw ` Python string + literal, it must be doubled. 
The ``\\`` escape sequence denotes a single + backslash character:: + + >>> print('C:\\Program Files') + C:\Program Files + + Similarly, the ``\'`` and ``\"`` sequences denote the single and double + quote character, respectively:: + + >>> print('\' and \"') + ' and " + +.. _string-escape-oct: + +Octal character +^^^^^^^^^^^^^^^ + + The sequence :samp:`\\\\{ooo}` denotes a *character* with the octal (base 8) + value *ooo*:: + + >>> '\120' + 'P' + + Up to three octal digits (0 through 7) are accepted. + + In a bytes literal, *character* means a *byte* with the given value. + In a string literal, it means a Unicode character with the given value. .. versionchanged:: 3.11 Octal escapes with value larger than ``0o377`` (255) produce a @@ -770,42 +745,147 @@ Notes: .. versionchanged:: 3.12 Octal escapes with value larger than ``0o377`` (255) produce a - :exc:`SyntaxWarning`. In a future Python version they will be eventually - a :exc:`SyntaxError`. + :exc:`SyntaxWarning`. + In a future Python version they will raise a :exc:`SyntaxError`. + +.. _string-escape-hex: + +Hexadecimal character +^^^^^^^^^^^^^^^^^^^^^ + + The sequence :samp:`\\x{hh}` denotes a *character* with the hex (base 16) + value *hh*:: + + >>> '\x50' + 'P' + + Unlike in Standard C, exactly two hex digits are required. + + In a bytes literal, *character* means a *byte* with the given value. + In a string literal, it means a Unicode character with the given value. + +.. _string-escape-named: + +Named Unicode character +^^^^^^^^^^^^^^^^^^^^^^^ + + The sequence :samp:`\\N\\{{name}\\}` denotes a Unicode character + with the given *name*:: + + >>> '\N{LATIN CAPITAL LETTER P}' + 'P' + >>> '\N{SNAKE}' + '🐍' + + This sequence cannot appear in :ref:`bytes literals `. + + .. versionchanged:: 3.3 + Support for `name aliases `__ + has been added. -(3) - Unlike in Standard C, exactly two hex digits are required. +.. 
_string-escape-long-hex: -(4) - In a bytes literal, hexadecimal and octal escapes denote the byte with the - given value. In a string literal, these escapes denote a Unicode character - with the given value. +Hexadecimal Unicode characters +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -(5) - .. versionchanged:: 3.3 - Support for name aliases [#]_ has been added. + These sequences :samp:`\\u{xxxx}` and :samp:`\\U{xxxxxxxx}` denote the + Unicode character with the given hex (base 16) value. + Exactly four digits are required for ``\u``; exactly eight digits are + required for ``\U``. + The latter can encode any Unicode character. -(6) - Exactly four hex digits are required. + .. code-block:: python -(7) - Any Unicode character can be encoded this way. Exactly eight hex digits - are required. + >>> '\u1234' + 'ሴ' + >>> '\U0001f40d' + '🐍' + + These sequences cannot appear in :ref:`bytes literals `. .. index:: unrecognized escape sequence -Unlike Standard C, all unrecognized escape sequences are left in the string -unchanged, i.e., *the backslash is left in the result*. +Unrecognized escape sequences +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Unlike in Standard C, all unrecognized escape sequences are left in the string +unchanged, that is, *the backslash is left in the result*:: + + >>> print('\q') + \q + >>> list('\q') + ['\\', 'q'] + Note that for bytes literals, the escape sequences only recognized in string -literals fall into the category of unrecognized escapes. +literals (``\N...``, ``\u...``, ``\U...``) fall into the category of +unrecognized escapes. .. versionchanged:: 3.6 Unrecognized escape sequences produce a :exc:`DeprecationWarning`. .. versionchanged:: 3.12 - Unrecognized escape sequences produce a :exc:`SyntaxWarning`. In a future - Python version they will be eventually a :exc:`SyntaxError`. + Unrecognized escape sequences produce a :exc:`SyntaxWarning`. + In a future Python version they will raise a :exc:`SyntaxError`. + + +.. 
index:: + single: b'; bytes literal + single: b"; bytes literal + + +.. _bytes-literal: + +Bytes literals +-------------- + +:dfn:`Bytes literals` are always prefixed with ``'b'`` or ``'B'``; they produce an +instance of the :class:`bytes` type instead of the :class:`str` type. +They may only contain ASCII characters; bytes with a numeric value of 128 +or greater must be expressed with escape sequences. +Similarly, a zero byte must be expressed using an escape sequence. + + +.. index:: + single: r'; raw string literal + single: r"; raw string literal + +.. _raw-strings: + +Raw string literals +------------------- + +Both string and bytes literals may optionally be prefixed with a letter ``'r'`` +or ``'R'``; such constructs are called :dfn:`raw string literals` +and :dfn:`raw bytes literals` respectively and treat backslashes as +literal characters. +As a result, in raw string literals, :ref:`escape sequences ` +escapes are not treated specially. + +Even in a raw literal, quotes can be escaped with a backslash, but the +backslash remains in the result; for example, ``r"\""`` is a valid string +literal consisting of two characters: a backslash and a double quote; ``r"\"`` +is not a valid string literal (even a raw string cannot end in an odd number of +backslashes). Specifically, *a raw literal cannot end in a single backslash* +(since the backslash would escape the following quote character). Note also +that a single backslash followed by a newline is interpreted as those two +characters as part of the literal, *not* as a line continuation. + + +.. 
index:: physical line, escape sequence, Standard C, C + single: \ (backslash); escape sequence + single: \\; escape sequence + single: \a; escape sequence + single: \b; escape sequence + single: \f; escape sequence + single: \n; escape sequence + single: \r; escape sequence + single: \t; escape sequence + single: \v; escape sequence + single: \x; escape sequence + single: \N; escape sequence + single: \u; escape sequence + single: \U; escape sequence .. index:: @@ -815,6 +895,8 @@ literals fall into the category of unrecognized escapes. single: string; interpolated literal single: f-string single: fstring + single: f'; formatted string literal + single: f"; formatted string literal single: {} (curly brackets); in formatted string literal single: ! (exclamation); in formatted string literal single: : (colon); in formatted string literal @@ -1022,7 +1104,7 @@ actually an expression composed of the unary operator '``-``' and the literal .. _integers: Integer literals -^^^^^^^^^^^^^^^^ +---------------- Integer literals denote whole numbers. For example:: @@ -1095,7 +1177,7 @@ Formally, integer literals are described by the following lexical definitions: .. _floating: Floating-point literals -^^^^^^^^^^^^^^^^^^^^^^^ +----------------------- Floating-point (float) literals, such as ``3.14`` or ``1.5``, denote :ref:`approximations of real numbers `. @@ -1157,7 +1239,7 @@ lexical definitions: .. _imaginary: Imaginary literals -^^^^^^^^^^^^^^^^^^ +------------------ Python has :ref:`complex number ` objects, but no complex literals. @@ -1279,7 +1361,3 @@ occurrence outside string literals and comments is an unconditional error: $ ? ` - -.. rubric:: Footnotes - -.. 
[#] https://www.unicode.org/Public/16.0.0/ucd/NameAliases.txt From faf05a192ed7ec80ab26e803544ce9585b59d583 Mon Sep 17 00:00:00 2001 From: Petr Viktorin Date: Wed, 25 Jun 2025 16:26:37 +0200 Subject: [PATCH 5/9] Byte strings, raw strings; f-string stub --- Doc/reference/lexical_analysis.rst | 65 +++++++++++++++++++++--------- 1 file changed, 46 insertions(+), 19 deletions(-) diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst index 36abfa31c093c9..2c6ae9a16d0d08 100644 --- a/Doc/reference/lexical_analysis.rst +++ b/Doc/reference/lexical_analysis.rst @@ -643,6 +643,21 @@ Note that as in all lexical definitions, whitespace is significant. In particular, the prefix (if any) must be immediately followed by the starting quote. +.. index:: physical line, escape sequence, Standard C, C + single: \ (backslash); escape sequence + single: \\; escape sequence + single: \a; escape sequence + single: \b; escape sequence + single: \f; escape sequence + single: \n; escape sequence + single: \r; escape sequence + single: \t; escape sequence + single: \v; escape sequence + single: \x; escape sequence + single: \N; escape sequence + single: \u; escape sequence + single: \U; escape sequence + .. _escape-sequences: Escape sequences @@ -842,8 +857,18 @@ Bytes literals :dfn:`Bytes literals` are always prefixed with ``'b'`` or ``'B'``; they produce an instance of the :class:`bytes` type instead of the :class:`str` type. They may only contain ASCII characters; bytes with a numeric value of 128 -or greater must be expressed with escape sequences. -Similarly, a zero byte must be expressed using an escape sequence. +or greater must be expressed with escape sequences (typically +:ref:`string-escape-hex` or :ref:`string-escape-oct`): + +.. 
code-block:: python + + >>> b'\x89PNG\r\n\x1a\n' + b'\x89PNG\r\n\x1a\n' + >>> list(b'\x89PNG\r\n\x1a\n') + [137, 80, 78, 71, 13, 10, 26, 10] + +Similarly, a zero byte must be expressed using an escape sequence (typically +``\0`` or ``\x00``). .. index:: @@ -860,7 +885,12 @@ or ``'R'``; such constructs are called :dfn:`raw string literals` and :dfn:`raw bytes literals` respectively and treat backslashes as literal characters. As a result, in raw string literals, :ref:`escape sequences ` -escapes are not treated specially. +are not treated specially: + +.. code-block:: python + + >>> r'\d{4}-\d{2}-\d{2}' + '\\d{4}-\\d{2}-\\d{2}' Even in a raw literal, quotes can be escaped with a backslash, but the backslash remains in the result; for example, ``r"\""`` is a valid string @@ -872,22 +902,6 @@ that a single backslash followed by a newline is interpreted as those two characters as part of the literal, *not* as a line continuation. -.. index:: physical line, escape sequence, Standard C, C - single: \ (backslash); escape sequence - single: \\; escape sequence - single: \a; escape sequence - single: \b; escape sequence - single: \f; escape sequence - single: \n; escape sequence - single: \r; escape sequence - single: \t; escape sequence - single: \v; escape sequence - single: \x; escape sequence - single: \N; escape sequence - single: \u; escape sequence - single: \U; escape sequence - - .. index:: single: formatted string literal single: interpolated string literal @@ -1067,6 +1081,19 @@ include expressions. See also :pep:`498` for the proposal that added formatted string literals, and :meth:`str.format`, which uses a related format string mechanism. +.. _t-strings: +.. _template-string-literals: + +t-strings +--------- + +A :dfn:`template string literal` or :dfn:`t-string` is a string literal that +is prefixed with ``'t'`` or ``'T'``. +These strings have internal structure similar to :ref:`f-strings`, +but are evaluated as Template objects instead of strings. + +.. 
versionadded:: 3.14 + .. _numbers: From 687fe5830318ca89a5541703bae3e62b3c8a7b5e Mon Sep 17 00:00:00 2001 From: Petr Viktorin Date: Wed, 25 Jun 2025 16:38:09 +0200 Subject: [PATCH 6/9] Remove outdated comment --- Doc/reference/lexical_analysis.rst | 4 ---- 1 file changed, 4 deletions(-) diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst index 2c6ae9a16d0d08..e3d0bab8942ced 100644 --- a/Doc/reference/lexical_analysis.rst +++ b/Doc/reference/lexical_analysis.rst @@ -515,10 +515,6 @@ String and Bytes literals String literals are text enclosed in single quotes (``'``) or double quotes (``"``). For example: -.. This is Python code, but we turn off highlighting because as of this - writing, highlighted strings don't look good when there's no code - surrounding them. - .. code-block:: python "spam" From 9f9d29ccab8a5c25aa9433a90bd03d2a5521c36b Mon Sep 17 00:00:00 2001 From: Petr Viktorin Date: Wed, 25 Jun 2025 16:50:18 +0200 Subject: [PATCH 7/9] Fix ReST errors --- Doc/reference/expressions.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Doc/reference/expressions.rst b/Doc/reference/expressions.rst index 743d43b1c9c1b1..c1f046388c3d1b 100644 --- a/Doc/reference/expressions.rst +++ b/Doc/reference/expressions.rst @@ -160,7 +160,7 @@ value. .. _string-concatenation: String literal concatenation -............................ +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Multiple adjacent string or bytes literals (delimited by whitespace), possibly using different quoting conventions, are allowed, and their meaning is the same @@ -172,7 +172,7 @@ Formally: .. grammar-snippet:: :group: python-grammar - strings: ( `STRING` | `fstring` | `tstring`)+ + strings: ( `STRING` | fstring | tstring)+ Note that this feature is defined at the syntactical level, so it only works with literals. 
From 11e37317c24523f187630e137537caab218af2d4 Mon Sep 17 00:00:00 2001 From: Petr Viktorin Date: Wed, 25 Jun 2025 16:51:31 +0200 Subject: [PATCH 8/9] Update Doc/reference/expressions.rst Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com> --- Doc/reference/expressions.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Doc/reference/expressions.rst b/Doc/reference/expressions.rst index c1f046388c3d1b..0803fadd0eeb37 100644 --- a/Doc/reference/expressions.rst +++ b/Doc/reference/expressions.rst @@ -143,7 +143,7 @@ integer, floating-point number, complex number) with the given value. The value may be approximated in the case of floating-point and imaginary (complex) literals. See section :ref:`literals` for details. -Seee section :ref:`string-concatenation` for details on ``strings``. +See section :ref:`string-concatenation` for details on ``strings``. .. index:: From 2cd9bf44b46d0e7857087f0c9760f832ab309050 Mon Sep 17 00:00:00 2001 From: Petr Viktorin Date: Wed, 2 Jul 2025 16:21:14 +0200 Subject: [PATCH 9/9] f-strings can only be concatenated with themselves --- Doc/reference/expressions.rst | 33 +++++++++++++++++++++++---------- 1 file changed, 23 insertions(+), 10 deletions(-) diff --git a/Doc/reference/expressions.rst b/Doc/reference/expressions.rst index 0803fadd0eeb37..5bc83bc3d4313a 100644 --- a/Doc/reference/expressions.rst +++ b/Doc/reference/expressions.rst @@ -164,32 +164,45 @@ String literal concatenation Multiple adjacent string or bytes literals (delimited by whitespace), possibly using different quoting conventions, are allowed, and their meaning is the same -as their concatenation. Thus, ``"hello" 'world'`` is equivalent to -``"helloworld"``. +as their concatenation:: + + >>> "hello" 'world' + "helloworld" Formally: .. 
grammar-snippet:: :group: python-grammar - strings: ( `STRING` | fstring | tstring)+ + strings: ( `STRING` | fstring)+ | tstring+ Note that this feature is defined at the syntactical level, so it only works with literals. To concatenate string expressions at run time, the '+' operator may be used:: - greeting = "Hello" - space = " " - name = "Blaise" - print(greeting + space + name) # not: print(greeting space name) + >>> greeting = "Hello" + >>> space = " " + >>> name = "Blaise" + >>> print(greeting + space + name) # not: print(greeting space name) + Hello Blaise Also note that literal concatenation can freely mix raw strings, -triple-quoted strings, and formatted or template string literals. -However, bytes literals may not be combined with string literals of any kind. +triple-quoted strings, and formatted string literals. For example:: + + >>> "Hello" r', ' f"{name}!" + "Hello, Blaise!" + +However, bytes literals may only be combined with other byte literals; +not with string literals of any kind. +Also, template string literals may only be combined with other template +string literals:: + + >>> t"Hello" t"{name}!" + Template(strings=('Hello', '!'), interpolations=(...)) This feature can be used to reduce the number of backslashes needed, to split long strings conveniently across long lines, or even to add -comments to parts of strings, for example:: +comments to parts of strings. For example:: re.compile("[A-Za-z_]" # letter or underscore "[A-Za-z0-9_]*" # letter, digit or underscore
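The concatenation rules described in the last patch can be checked with a short script. This is a minimal sketch, not part of the patch series: the variable names (``node``, ``mixed``, ``mixing_error``) and the ``"<example>"`` filename are illustrative assumptions, and template string literals are left out because they require Python 3.14.

```python
import ast

# Adjacent literals are merged at compile time: the parsed expression
# is a single Constant node, not a BinOp that uses "+".
node = ast.parse('"hello" \'world\'', mode="eval").body
assert isinstance(node, ast.Constant)
assert node.value == "helloworld"

# Raw strings and f-strings may be mixed with plain string literals.
name = "Blaise"  # illustrative value, matching the example in the patch text
mixed = "Hello" r", " f"{name}!"
assert mixed == "Hello, Blaise!"

# Bytes literals combine only with other bytes literals.
assert b"\x89" b"PNG" == b"\x89PNG"

# Mixing bytes and str literals is rejected when the source is compiled.
try:
    compile('b"data" "text"', "<example>", "eval")
except SyntaxError as exc:
    mixing_error = exc
else:
    mixing_error = None
assert mixing_error is not None
```

Because the merge happens in the parser, the same check fails for names: ``greeting space name`` is a syntax error, which is why the patch text recommends ``+`` for run-time concatenation.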
