-
Notifications
You must be signed in to change notification settings - Fork 3.1k
Use robust shebang code for all shebang modification #13390
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
firstline = script.readline() | ||
if not firstline.startswith(b"#!python"): | ||
prelude = script.readline() | ||
if (m := re.match(rb"^#!python[^\s]*(\s.*)?$", prelude)) is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't conform to the spec, which only allows rewriting lines starting with precisely b'#!python
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you misread: that’s exactly what that regex matches.
The only reason I switched from startswith
to a regex is to capture the stuff behind the #!python
and pass it on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I did misread. But I'd still argue that the regex is harder to read. Why not simply split firstline
on whitespace?
It's worth noting that there are some edge cases in the spec - #!python3.12
should be rewritten, but it's not clear what "the correct interpreter" is if pip is running under Python 3.13 🙁
The current code doesn't get this right, either, but if we're going to claim the new code "is more robust" or "will no longer break", we should at least try to follow the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not simply split firstline on whitespace?
and strip the trailing newline from the second part. I find the regex to be more readable than that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, we disagree.
(BTW, without documentation on _get_shebang
, I don't know whether stripping the newline is necessary).
return False | ||
exename = sys.executable.encode(sys.getfilesystemencoding()) | ||
firstline = b"#!" + exename + os.linesep.encode("ascii") | ||
prelude = ScriptMaker(None, None)._get_shebang("utf-8", m.group(1) or b"") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, _get_shebang
does the wrong thing here. It doesn't correctly rewrite #!pythonw
as the GUI version of the interpreter, if pip is being run under the CLI version. Entry points use the ScriptMaker.make(..., {'gui': True})
mechanism to do that.
I'd suggest that we need a test here that ensures that #!pythonw
gets rewritten as Path(sys.executable).parent / (Path(sys.executable).stem + "w" + Path(sys.executable).suffix)
, as that's the correct behaviour according to the spec. The code as given would fail that test.
@@ -388,7 +406,12 @@ def test_wheel_record_lines_have_updated_hash_for_scripts( | |||
|
|||
script_path = script.bin_path / "dostuff" | |||
script_contents = script_path.read_bytes() | |||
assert not script_contents.startswith(b"#!python\n") | |||
expected_prefix = ( | |||
b"#!/bin/sh\n'''exec'" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strong -1 on having the test rely on the precise mechanism used in the generated shebang..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how should I do it? without knowledge of the mechanism, checks like exe_path in script_contents
aren’t reliable either, because maybe the space is escaped with \
or so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. It's a hard problem. Distlib has a bunch of tests around shebangs, so if we were using documented and supported distlib APIs, I'd say we could rely on them (and the existing check, that just confirms we did rewrite, would be enough). But because we're using the internal _get_shebang
function, we can't do that, as we have no way to ensure that we're using the function in a way that is covered by the distlib tests.
The fact that tests are failing on Windows demonstrates that this test isn't sufficient - the shebang you're checking for isn't what gets used on Windows (because Windows shells don't support it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments inline.
Fixes #13389