[syntax-errors] PEP 701 f-strings before Python 3.12 #16543

Draft · ntBre wants to merge 24 commits into astral-sh/ruff:main

Conversation


@ntBre (Contributor) commented Mar 6, 2025

Summary

This PR detects the use of PEP 701 f-strings before 3.12. This one sounded difficult and ended up being pretty easy, so I think there's a good chance I've over-simplified things. However, from experimenting in the Python REPL and checking with pyright, I think this is correct. pyright actually doesn't even flag the comment case, but Python does.

I also checked pyright's implementation for quotes and escapes and think I've approximated how they do it.

Python's error messages also point to the simple approach, with these characters simply not being allowed:

Python 3.11.11 (main, Feb 12 2025, 14:51:05) [Clang 19.1.6 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> f'''multiline {
... expression # comment
... }'''
  File "<stdin>", line 3
    }'''
        ^
SyntaxError: f-string expression part cannot include '#'
>>> f'''{not a line \
... continuation}'''
  File "<stdin>", line 2
    continuation}'''
                    ^
SyntaxError: f-string expression part cannot include a backslash
>>> f'hello {'world'}'
  File "<stdin>", line 1
    f'hello {'world'}'
              ^^^^^
SyntaxError: f-string: expecting '}'

And since escapes aren't allowed, I don't think there are any tricky cases where nested quotes or comments can sneak in.
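Not part of the PR itself (the real check lives in Ruff's Rust parser), but the simple approach described above can be sketched in Python. The function name and inputs are hypothetical stand-ins: `expr_src` for an expression part's source text and `quote_str` for the outer f-string's quote sequence. As the reviews below point out, this is over-simplified for some cases:

```python
def naive_pep701_check(expr_src: str, quote_str: str) -> list[str]:
    """Flag the characters CPython < 3.12 rejects inside an f-string
    expression part: '#', backslashes, and the outer quote sequence."""
    errors = []
    if "#" in expr_src:
        errors.append("comment")
    if "\\" in expr_src:
        errors.append("backslash")
    if quote_str in expr_src:
        errors.append("nested outer quote")
    return errors

# f'''multiline { expression  # comment }''' -> '#' in the expression part
assert naive_pep701_check("expression  # comment", "'''") == ["comment"]
# f'hello {'world'}' -> the expression reuses the outer single quote
assert naive_pep701_check("'world'", "'") == ["nested outer quote"]
```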

It's also slightly annoying that the error is repeated for every nested quote character, but that also mirrors pyright, although they highlight the whole nested string, which is a little nicer. However, their check is in the analysis phase, so I don't think we have such easy access to the quoted range, at least without adding another mini visitor.

Test Plan

New inline tests

ntBre added the parser (Related to the parser) and preview (Related to preview mode features) labels on Mar 6, 2025

github-actions bot commented Mar 6, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.


ntBre commented Mar 7, 2025

I thought of some additional test cases tonight:

Python 3.11.11 (main, Feb 12 2025, 14:51:05) [Clang 19.1.6 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> f"{"""x"""}"
  File "<stdin>", line 1
    f"{"""x"""}"
          ^
SyntaxError: f-string: expecting '}'
>>> f'{'''x'''}'
  File "<stdin>", line 1
    f'{'''x'''}'
          ^
SyntaxError: f-string: expecting '}'
>>> f"""{"x"}"""
'x'
>>> f'''{'x'}'''
'x'

I'm pretty sure the code here handles these but it might be nice to add them as tests. I was especially concerned about the first two but checking for the outer quote_str should capture the right behavior.
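Searching for the whole outer quote sequence, rather than a single quote character, is what separates these four cases. A quick sketch of that comparison, using the expression source texts from the REPL session above:

```python
# f"{"""x"""}"  -> outer quote str is '"'; the expression source contains it -> invalid
assert '"""x"""'.find('"') != -1
# f"""{"x"}"""  -> outer quote str is '"""'; the expression source lacks it -> valid
assert '"x"'.find('"""') == -1
# Same pattern with single quotes:
assert "'''x'''".find("'") != -1   # f'{'''x'''}' is invalid
assert "'x'".find("'''") == -1     # f'''{'x'}''' is valid
```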

);
};

if let Some(comment_index) = self.source[expr.range].find('#') {
Member

I think this will give you a false positive when a valid string nested inside an f-string contains a # character:

uv run --python=3.9 --no-project python -q                
>>> f"outer {'# not a comment'}"
'outer # not a comment'

@dhruvmanila (Member) commented Mar 7, 2025

And the same applies in the format specifier. The following is valid per the Python 3.8 parser; it's the format specifier that is invalid (unrelated to the parser), and format specifiers are part of the f-string expression node:

f'outer {x:{"# not a comment"} }'
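Both false positives can be reproduced with the plain substring search used here. The strings below are hand-copied expression sources from the two review examples, both of which are valid before Python 3.12:

```python
nested_string = "'# not a comment'"     # from f"outer {'# not a comment'}"
format_spec = 'x:{"# not a comment"} '  # from f'outer {x:{"# not a comment"} }'

# A naive `'#' in source` check flags both, even though neither is a comment.
assert "#" in nested_string
assert "#" in format_spec
```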

);
};

if let Some(quote_index) = self.source[expr.range].find(quote_str) {
Member

This gives you false positives for triple-quoted strings:

f"{f'''{"nested"} inner'''} outer"

Contributor Author

I think this one actually is invalid:

Python 3.11.11 (main, Feb 12 2025, 14:51:05) [Clang 19.1.6 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> f"{f'''{"nested"} inner'''} outer"
  File "<stdin>", line 1
    f"{f'''{"nested"} inner'''} outer"
             ^^^^^^
SyntaxError: f-string: unterminated string

{
let quote_str = flags.quote_str();
for expr in elements.expressions() {
if let Some(slash_index) = self.source[expr.range].find('\\') {
Member

I think this also gives you false positives for

>>> f"test \
... more"
'test more'

Member

Ah no, I think this should be fine because you only iterate over expressions. It might still be worth adding a test for it, as well as for

f"test {a \
    } more"
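In other words, because only expression ranges are searched, the backslash in the literal part of the first f-string is never examined, while the one inside the braces of the second is. Roughly, with hand-split, hypothetical part texts:

```python
def flagged(expression_parts: list[str]) -> bool:
    """Only expression parts are searched for '\\'; literal parts are skipped."""
    return any("\\" in p for p in expression_parts)

# f"test \<newline>more": the '\' sits in a literal part,
# so there are no expression parts to search -> not flagged
assert flagged([]) is False
# f"test {a \<newline>    } more": the '\' is inside the braces -> flagged
assert flagged(["a \\\n    "]) is True
```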

@MichaReiser (Member) left a comment

Maybe take a look at https://github.com/astral-sh/ruff/blob/1ecb7ce6455a4a9a134fe8536625e89f74e3ec5b/crates/ruff_python_formatter/resources/test/fixtures/ruff/expression/fstring.py

and

https://github.com/astral-sh/ruff/blob/82faa9bb62e66a562f8a7ad81a645162ca558a08/crates/ruff_python_formatter/resources/test/fixtures/ruff/expression/fstring_preview.py

they contain a good enumeration of the tricky cases.

It does make me slightly nervous that the current approach does a lot of operations on the source text directly instead of analyzing the tokens, but accessing the tokens might require making this an analyzer (linter) check.

@dhruvmanila (Member)

It does make me slightly nervous that the current approach does a lot of operations on the source text directly instead of analyzing the tokens, but accessing the tokens might require making this an analyzer (linter) check.

Yeah, and f-strings are tricky because there's a lot more involved here.

Another approach would be to use the tokens either in the parser or in the analyzer (which you've mentioned), with a preference for the parser, mainly because it already has the surrounding context, i.e., are we in a nested f-string or in an f-string expression?

Maybe we could do this in the lexer itself and utilize FStringErrorType to emit the errors, which the parser would then convert into UnsupportedSyntaxError, but I haven't explored this option. In the lexer, it would be easier to just check for Comment and Newline tokens when in f-string expression mode and emit the errors. My main worry regarding the lexer would be any performance implications.


ntBre commented Mar 7, 2025

Oof, thanks for the reviews. I had a feeling I over-simplified things, but these false positives look quite obvious in hindsight. I'll mark this as a draft for now and take a deeper look at this today.

ntBre marked this pull request as draft March 7, 2025 15:01

ntBre commented Mar 7, 2025

I still need to look for more tricky cases in the formatter fixtures, but I checked on the suggested escape and quote test cases, and I believe those are true positives (I also added them as tests). So the main issues here are around comments, which might be quite tricky (maybe this is why pyright doesn't flag them?) and around inspecting the source text directly.

@MichaReiser (Member)

I think it would be helpful to summarize the invalid patterns that we need to detect. It will help us decide:

  • How best to detect those (tokens, AST pass, parser, lexer, all of it?)
  • Which patterns are easy/hard to detect

Based on this we can decide on the approach but also the prioritisation of what the check should detect and we can even split it up into multiple PRs.


ntBre commented Mar 10, 2025

I think it would be helpful to summarize the invalid patterns that we need to detect. It will help us decide:

  • How best to detect those (tokens, AST pass, parser, lexer, all of it?)
  • Which patterns are easy/hard to detect

Based on this we can decide on the approach but also the prioritisation of what the check should detect and we can even split it up into multiple PRs.

That's a good idea, thanks. The three main cases I took away from the PEP were:

  1. Nested quotes
  2. Escape sequences
  3. Comments

Escape sequences seem to be the easiest because, as far as I can tell, CPython throws an error for any \ in an f-string expression part, whether it's part of an escape sequence (\n) or a line-continuation character.

I think quotes are also easy because any nested quote_str (in our parlance) ends the string. That still feels oversimplified but I haven't seen any cases to the contrary. The PEP also includes this example:

In fact, this is the most nested f-string that can be written:

>>> f"""{f'''{f'{f"{1+1}"}'}'''}"""
'2'

Comments are the hardest because you can't just check for #, as Dhruv pointed out, since # is a valid character inside strings nested within the f-string.

Those are the three cases I attempted to fix here.

I see now in PEP 498 that "Expressions cannot contain ':' or '!' outside of strings or parentheses, brackets, or braces. The exception is that the '!=' operator is allowed as a special case." So that might be a fourth case we'd want to consider. At least initially it sounds roughly as complex as detecting comments.

@MichaReiser (Member)

We discussed a possible approach in our 1:1. @ntBre let me know if that doesn't work and I can take another look.

@@ -14,7 +14,7 @@ pub(crate) struct TokenSource<'src> {
     /// A vector containing all the tokens emitted by the lexer. This is returned when the parser
     /// is finished consuming all the tokens. Note that unlike the emitted tokens, this vector
     /// holds both the trivia and non-trivia tokens.
-    tokens: Vec<Token>,
+    pub(crate) tokens: Vec<Token>,
Member

I think I'd prefer an `in_range` method or similar instead of making `tokens` `pub(crate)`.


ntBre commented Mar 14, 2025

Thanks for the in_range suggestion! I factored out part of Tokens::in_range to reuse in the new TokenSource::in_range method, which made things much simpler.

I tried applying a similar strategy to quotes, but FStringStart, FStringEnd, and FStringMiddle all carry their own string flags, so it's not easy to differentiate between the inner and outer f-strings. Maybe I could bring back the stack from the previous implementation to track that, though.

I still think comparing the quote_str gets the correct answer because it includes triple quotes, but I'm still open to reworking that if you prefer. I could at least use memmem and memchr for the searches.

Similarly, I don't think \ is a token, so we pretty much have to do a text search for that, as far as I can tell.

I also looked into the : and ! mention from PEP 498 again, but I can't come up with anything that is valid syntax after 3.12 either. So I think it's okay not to check for those specially.

ntBre marked this pull request as ready for review March 14, 2025 22:37
ntBre marked this pull request as draft March 15, 2025 04:54