[syntax-errors] PEP 701 f-strings before Python 3.12 #16543
Conversation
I thought of some additional test cases tonight:

Python 3.11.11 (main, Feb 12 2025, 14:51:05) [Clang 19.1.6 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> f"{"""x"""}"
File "<stdin>", line 1
f"{"""x"""}"
^
SyntaxError: f-string: expecting '}'
>>> f'{'''x'''}'
File "<stdin>", line 1
f'{'''x'''}'
^
SyntaxError: f-string: expecting '}'
>>> f"""{"x"}"""
'x'
>>> f'''{'x'}'''
'x'

I'm pretty sure the code here handles these, but it might be nice to add them as tests. I was especially concerned about the first two, but checking for the outer quote character should cover them.
if let Some(comment_index) = self.source[expr.range].find('#') {
I think this will give you false positives when a valid nested string inside an f-string contains a '#' character:
❯ uv run --python=3.9 --no-project python -q
>>> f"outer {'# not a comment'}"
'outer # not a comment'
And the same applies in the format specifier. The following is valid per the Python 3.8 parser; it's the format specifier itself that's invalid (unrelated to the parser), and format specifiers are part of the f-string expression node:
f'outer {x:{"# not a comment"} }'
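For what it's worth, that nesting is visible with CPython's ast module (a quick illustration, assuming Python 3.9+ for ast.dump's indent argument; not part of this PR):

import ast

# The format spec is a JoinedStr attached to the FormattedValue node,
# so the quoted string inside it is part of the f-string expression node.
tree = ast.parse("f'outer {x:{\"# not a comment\"} }'")
print(ast.dump(tree.body[0].value, indent=2))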
if let Some(quote_index) = self.source[expr.range].find(quote_str) {
This gives you false positives for triple-quoted strings:
f"{f'''{"nested"} inner'''} outer"
I think this one actually is invalid:
Python 3.11.11 (main, Feb 12 2025, 14:51:05) [Clang 19.1.6 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> f"{f'''{"nested"} inner'''} outer"
File "<stdin>", line 1
f"{f'''{"nested"} inner'''} outer"
^^^^^^
SyntaxError: f-string: unterminated string
{
    let quote_str = flags.quote_str();
    for expr in elements.expressions() {
        if let Some(slash_index) = self.source[expr.range].find('\\') {
I think this also gives you false positives for:
>>> f"test \
... more"
'test more'
Ah no, I think this should be fine because you only iterate over expressions. It might still be worth adding a test for it, as well as for:
f"test {a \
} more"
Maybe take a look at https://github.com/astral-sh/ruff/blob/1ecb7ce6455a4a9a134fe8536625e89f74e3ec5b/crates/ruff_python_formatter/resources/test/fixtures/ruff/expression/fstring.py and the related formatter fixtures; they contain a good enumeration of the tricky cases.
It does make me slightly nervous that the current approach does a lot of operations on the source text directly instead of analyzing the tokens, but accessing the tokens might require making this an analyzer (linter) check.
Yeah, and f-strings are tricky because there's a lot more involved here. Another approach would be to use the tokens either in the parser or in the analyzer (which you've mentioned), with more of a preference towards the parser, mainly because it already has the surrounding context, i.e., are we in a nested f-string, or are we in an f-string expression? Maybe we could do this in the lexer itself and utilize the f-string context it already tracks.
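As a rough illustration of why tokens help here (a sketch using CPython's tokenize module rather than ruff's lexer): on 3.12+ an f-string is lexed into FSTRING_START/FSTRING_MIDDLE/FSTRING_END tokens, so nested strings and '#' characters inside expressions are visible at the token level, whereas older versions emit one opaque STRING token.

import io
import tokenize

# On Python >= 3.12 this prints FSTRING_START, OP, STRING, ... tokens;
# on older versions the whole f-string is a single STRING token.
src = "f\"outer {'# not a comment'} rest\"\n"
for tok in tokenize.generate_tokens(io.StringIO(src).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))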
Oof, thanks for the reviews. I had a feeling I over-simplified things, but these false positives look quite obvious in hindsight. I'll mark this as a draft for now and take a deeper look at this today.
I still need to look for more tricky cases in the formatter fixtures, but I checked on the suggested escape and quote test cases, and I believe those are true positives (I also added them as tests). So the main issues here are around comments, which might be quite tricky (maybe this is why pyright doesn't flag them?), and around inspecting the source text directly.
I think it would be helpful to summarize the invalid patterns that we need to detect. It will help us decide:
Based on this, we can decide on the approach, but also the prioritisation of what the check should detect, and we can even split it up into multiple PRs.
That's a good idea, thanks. The three main cases I took away from the PEP were: quote reuse, escape sequences, and comments.
Escape sequences seem to be the easiest because, as far as I can tell, CPython throws an error for any backslash in the expression part. I think quotes are also easy because any nested quote that matches an enclosing quote is an error, while alternating quote styles still parse:
>>> f"""{f'''{f'{f"{1+1}"}'}'''}"""
'2'

Comments are the hardest because you can't just check for a '#' character, since a nested string can legitimately contain one. Those are the three cases I attempted to fix here. I see now in PEP 498 that "Expressions cannot contain ':' or '!' outside of strings or parentheses, brackets, or braces."
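To make those three cases concrete, here's a small sketch (not part of the PR diff) that can be run under different interpreters; each case raises a SyntaxError before 3.12 and parses on 3.12+:

import ast

cases = [
    'f"{"nested"}"',            # quote reuse inside the expression
    'f"{\'\\n\'}"',             # escape sequence inside the expression
    'f"""{1  # comment\n}"""',  # comment inside the expression part
]
for src in cases:
    try:
        ast.parse(src)
        print("ok:   ", src)
    except SyntaxError as err:
        print("error:", src, "->", err.msg)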
We discussed a possible approach in our 1:1. @ntBre, let me know if that doesn't work and I can take another look.
I thought the f-string was guaranteed to be valid at this point, but I guess the parser can recover from that and still call this function.
@@ -14,7 +14,7 @@ pub(crate) struct TokenSource<'src> {
     /// A vector containing all the tokens emitted by the lexer. This is returned when the parser
     /// is finished consuming all the tokens. Note that unlike the emitted tokens, this vector
     /// holds both the trivia and non-trivia tokens.
-    tokens: Vec<Token>,
+    pub(crate) tokens: Vec<Token>,
I think I'd prefer an in_range method or similar instead of making tokens pub(crate).
Thanks for the in_range suggestion. I tried applying a similar strategy to quotes, but I still think comparing the source text is necessary there. Similarly, I don't think tokens alone solve the comment case. I also looked into the lexer approach mentioned above.
Summary
This PR detects the use of PEP 701 f-strings before 3.12. This one sounded difficult and ended up being pretty easy, so I think there's a good chance I've over-simplified things. However, from experimenting in the Python REPL and checking with pyright, I think this is correct. pyright actually doesn't even flag the comment case, but Python does.
I also checked pyright's implementation for quotes and escapes and think I've approximated how they do it.
Python's error messages also point to the simple approach of these characters not being allowed:
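For example, a 3.11 REPL reports messages like these (a reconstruction for reference; exact wording varies slightly by version):

>>> f"{'\n'}"
  File "<stdin>", line 1
SyntaxError: f-string expression part cannot include a backslash
>>> f"""{1 # comment}"""
  File "<stdin>", line 1
SyntaxError: f-string expression part cannot include '#'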
And since escapes aren't allowed, I don't think there are any tricky cases where nested quotes or comments can sneak in.
It's also slightly annoying that the error is repeated for every nested quote character, but that also mirrors pyright, although they highlight the whole nested string, which is a little nicer. However, their check is in the analysis phase, so I don't think we have such easy access to the quoted range, at least without adding another mini visitor.
Test Plan
New inline tests