[syntax-errors] PEP 701 f-strings before Python 3.12 #16543

Draft · ntBre wants to merge 24 commits into astral-sh/ruff:main

Conversation


@ntBre (Contributor) commented Mar 6, 2025

Summary

This PR detects the use of PEP 701 f-strings before 3.12. This one sounded difficult and ended up being pretty easy, so I think there's a good chance I've over-simplified things. However, from experimenting in the Python REPL and checking with pyright, I think this is correct. pyright actually doesn't even flag the comment case, but Python does.

I also checked pyright's implementation for quotes and escapes and think I've approximated how they do it.

Python's error messages also point to the simple approach, with these characters simply not being allowed:

Python 3.11.11 (main, Feb 12 2025, 14:51:05) [Clang 19.1.6 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> f'''multiline {
... expression # comment
... }'''
  File "<stdin>", line 3
    }'''
        ^
SyntaxError: f-string expression part cannot include '#'
>>> f'''{not a line \
... continuation}'''
  File "<stdin>", line 2
    continuation}'''
                    ^
SyntaxError: f-string expression part cannot include a backslash
>>> f'hello {'world'}'
  File "<stdin>", line 1
    f'hello {'world'}'
              ^^^^^
SyntaxError: f-string: expecting '}'

And since escapes aren't allowed, I don't think there are any tricky cases where nested quotes or comments can sneak in.
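Not part of the PR itself (the real check lives in Ruff's Rust parser), but the simple approach described above can be sketched in Python. The function name and inputs are hypothetical stand-ins: `expr_src` for an expression part's source text and `quote_str` for the outer f-string's quote sequence. As the reviews below point out, this is over-simplified for some cases:

```python
def naive_pep701_check(expr_src: str, quote_str: str) -> list[str]:
    """Flag the characters CPython < 3.12 rejects inside an f-string
    expression part: '#', backslashes, and the outer quote sequence."""
    errors = []
    if "#" in expr_src:
        errors.append("comment")
    if "\\" in expr_src:
        errors.append("backslash")
    if quote_str in expr_src:
        errors.append("nested outer quote")
    return errors

# f'''multiline { expression  # comment }''' -> '#' in the expression part
assert naive_pep701_check("expression  # comment", "'''") == ["comment"]
# f'hello {'world'}' -> the expression reuses the outer single quote
assert naive_pep701_check("'world'", "'") == ["nested outer quote"]
```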

It's also slightly annoying that the error is repeated for every nested quote character, but that also mirrors pyright, although they highlight the whole nested string, which is a little nicer. However, their check is in the analysis phase, so I don't think we have such easy access to the quoted range, at least without adding another mini visitor.

Test Plan

New inline tests

ntBre added the parser (Related to the parser) and preview (Related to preview mode features) labels on Mar 6, 2025

github-actions bot commented Mar 6, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.


ntBre commented Mar 7, 2025

I thought of some additional test cases tonight:

Python 3.11.11 (main, Feb 12 2025, 14:51:05) [Clang 19.1.6 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> f"{"""x"""}"
  File "<stdin>", line 1
    f"{"""x"""}"
          ^
SyntaxError: f-string: expecting '}'
>>> f'{'''x'''}'
  File "<stdin>", line 1
    f'{'''x'''}'
          ^
SyntaxError: f-string: expecting '}'
>>> f"""{"x"}"""
'x'
>>> f'''{'x'}'''
'x'

I'm pretty sure the code here handles these but it might be nice to add them as tests. I was especially concerned about the first two but checking for the outer quote_str should capture the right behavior.
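Searching for the whole outer quote sequence, rather than a single quote character, is what separates these four cases. A quick sketch of that comparison, using the expression source texts from the REPL session above:

```python
# f"{"""x"""}"  -> outer quote str is '"'; the expression source contains it -> invalid
assert '"""x"""'.find('"') != -1
# f"""{"x"}"""  -> outer quote str is '"""'; the expression source lacks it -> valid
assert '"x"'.find('"""') == -1
# Same pattern with single quotes:
assert "'''x'''".find("'") != -1   # f'{'''x'''}' is invalid
assert "'x'".find("'''") == -1     # f'''{'x'}''' is valid
```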

);
};

if let Some(comment_index) = self.source[expr.range].find('#') {
Member

I think this will give you a false positive when a valid string nested inside an f-string contains a # character:

uv run --python=3.9 --no-project python -q                
>>> f"outer {'# not a comment'}"
'outer # not a comment'

@dhruvmanila (Member) commented Mar 7, 2025

And the same applies in the format specifier. The following is valid per the Python 3.8 parser; it's the format specifier that is invalid (unrelated to the parser), and format specifiers are part of the f-string expression node:

f'outer {x:{"# not a comment"} }'
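Both false positives can be reproduced with the plain substring search used here. The strings below are hand-copied expression sources from the two review examples, both of which are valid before Python 3.12:

```python
nested_string = "'# not a comment'"     # from f"outer {'# not a comment'}"
format_spec = 'x:{"# not a comment"} '  # from f'outer {x:{"# not a comment"} }'

# A naive `'#' in source` check flags both, even though neither is a comment.
assert "#" in nested_string
assert "#" in format_spec
```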

);
};

if let Some(quote_index) = self.source[expr.range].find(quote_str) {
Member

This gives you false positives for triple-quoted strings:

f"{f'''{"nested"} inner'''} outer"

Contributor Author

I think this one actually is invalid:

Python 3.11.11 (main, Feb 12 2025, 14:51:05) [Clang 19.1.6 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> f"{f'''{"nested"} inner'''} outer"
  File "<stdin>", line 1
    f"{f'''{"nested"} inner'''} outer"
             ^^^^^^
SyntaxError: f-string: unterminated string

{
let quote_str = flags.quote_str();
for expr in elements.expressions() {
if let Some(slash_index) = self.source[expr.range].find('\\') {
Member

I think this also gives you false positives for

>>> f"test \
... more"
'test more'

Member

Ah no, I think this should be fine because you only iterate over expressions. It might still be worth adding a test for it, as well as for

f"test {a \
    } more"
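In other words, because only expression ranges are searched, the backslash in the literal part of the first f-string is never examined, while the one inside the braces of the second is. Roughly, with hand-split, hypothetical part texts:

```python
def flagged(expression_parts: list[str]) -> bool:
    """Only expression parts are searched for '\\'; literal parts are skipped."""
    return any("\\" in p for p in expression_parts)

# f"test \<newline>more": the '\' sits in a literal part,
# so there are no expression parts to search -> not flagged
assert flagged([]) is False
# f"test {a \<newline>    } more": the '\' is inside the braces -> flagged
assert flagged(["a \\\n    "]) is True
```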

@MichaReiser (Member) left a comment

Maybe take a look at https://github.com/astral-sh/ruff/blob/1ecb7ce6455a4a9a134fe8536625e89f74e3ec5b/crates/ruff_python_formatter/resources/test/fixtures/ruff/expression/fstring.py

and

https://github.com/astral-sh/ruff/blob/82faa9bb62e66a562f8a7ad81a645162ca558a08/crates/ruff_python_formatter/resources/test/fixtures/ruff/expression/fstring_preview.py

they contain a good enumeration of the tricky cases.

It does make me slightly nervous that the current approach does a lot of operations on the source text directly instead of analyzing the tokens, but accessing the tokens might require making this an analyzer (linter) check.

@dhruvmanila (Member)

It does make me slightly nervous that the current approach does a lot of operations on the source text directly instead of analyzing the tokens, but accessing the tokens might require making this an analyzer (linter) check.

Yeah, and f-strings are tricky because there's a lot more involved here.

Another approach would be to use the tokens either in the parser or in the analyzer (which you've mentioned), with a preference for the parser, mainly because it already has the surrounding context, i.e., are we in a nested f-string or in an f-string expression?

Maybe we could do this in the lexer itself and utilize FStringErrorType to emit the errors, which the parser would then convert into UnsupportedSyntaxError, but I haven't explored this option. In the lexer, it would be easier to just check for Comment and Newline tokens when in f-string expression mode and emit the errors. My main worry regarding the lexer would be any performance implications.


ntBre commented Mar 7, 2025

Oof, thanks for the reviews. I had a feeling I over-simplified things, but these false positives look quite obvious in hindsight. I'll mark this as a draft for now and take a deeper look at this today.

ntBre marked this pull request as draft March 7, 2025 15:01

ntBre commented Mar 7, 2025

I still need to look for more tricky cases in the formatter fixtures, but I checked on the suggested escape and quote test cases, and I believe those are true positives (I also added them as tests). So the main issues here are around comments, which might be quite tricky (maybe this is why pyright doesn't flag them?) and around inspecting the source text directly.

@MichaReiser (Member)

I think it would be helpful to summarize the invalid patterns that we need to detect. It will help us decide:

  • How best to detect those (tokens, AST pass, parser, lexer, all of it?)
  • Which patterns are easy/hard to detect

Based on this we can decide on the approach but also the prioritisation of what the check should detect and we can even split it up into multiple PRs.


ntBre commented Mar 10, 2025

I think it would be helpful to summarize the invalid patterns that we need to detect. It will help us decide:

  • How best to detect those (tokens, AST pass, parser, lexer, all of it?)
  • Which patterns are easy/hard to detect

Based on this we can decide on the approach but also the prioritisation of what the check should detect and we can even split it up into multiple PRs.

That's a good idea, thanks. The three main cases I took away from the PEP were:

  1. Nested quotes
  2. Escape sequences
  3. Comments

Escape sequences seem to be the easiest because, as far as I can tell, CPython throws an error for any \ in an f-string expression part, whether it's part of an escape sequence (\n) or a line-continuation character.

I think quotes are also easy because any nested quote_str (in our parlance) ends the string. That still feels oversimplified but I haven't seen any cases to the contrary. The PEP also includes this example:

In fact, this is the most nested f-string that can be written:

>>> f"""{f'''{f'{f"{1+1}"}'}'''}"""
'2'

Comments are the hardest because you can't just check for #, as Dhruv pointed out, since # is a valid character inside strings nested within the f-string.

Those are the three cases I attempted to fix here.

I see now in PEP 498 that "Expressions cannot contain ':' or '!' outside of strings or parentheses, brackets, or braces. The exception is that the '!=' operator is allowed as a special case." So that might be a fourth case we'd want to consider. At least initially it sounds roughly as complex as detecting comments.

@MichaReiser (Member)

We discussed a possible approach in our 1:1. @ntBre let me know if that doesn't work and I can take another look.

@@ -14,7 +14,7 @@ pub(crate) struct TokenSource<'src> {
     /// A vector containing all the tokens emitted by the lexer. This is returned when the parser
     /// is finished consuming all the tokens. Note that unlike the emitted tokens, this vector
     /// holds both the trivia and non-trivia tokens.
-    tokens: Vec<Token>,
+    pub(crate) tokens: Vec<Token>,
Member

I think I'd prefer an `in_range` method or similar instead of making `tokens` `pub(crate)`.


ntBre commented Mar 14, 2025

Thanks for the in_range suggestion! I factored out part of Tokens::in_range to reuse in the new TokenSource::in_range method, which made things much simpler.

I tried applying a similar strategy to quotes, but FStringStart, FStringEnd, and FStringMiddle all carry their own string flags, so it's not easy to differentiate between the inner and outer f-strings. Maybe I could bring back the stack from the previous implementation to track that, though.

I still think comparing the quote_str gets the correct answer because it includes triple quotes, but I'm still open to reworking that if you prefer. I could at least use memmem and memchr for the searches.

Similarly, I don't think \ is a token, so we pretty much have to do a text search for that, as far as I can tell.

I also looked into the : and ! mention from PEP 498 again, but I can't come up with anything that is valid syntax after 3.12 either. So I think it's okay not to check for those specially.

ntBre marked this pull request as ready for review March 14, 2025 22:37
ntBre marked this pull request as draft March 15, 2025 04:54