BUG: Include python-including headers first #29281

Open
wants to merge 14 commits into
base: main
from

Conversation

DWesl
Contributor

@DWesl DWesl commented Jun 27, 2025

Ran into this trying to install scikit-image

@mattip
Member

mattip commented Jun 27, 2025

Please add comments so we don’t undo this in a code refactor.

@charris
Member

charris commented Jun 27, 2025

Might just put Python.h up top, that way it will stay there after running clang-format.

@charris
Member

charris commented Jun 27, 2025

Could also just run clang-format on the file after, it needs a cleanup.

@DWesl
Contributor Author

DWesl commented Jun 27, 2025

I added an explicit Python.h include to those two files, with comments to explain why.

I then went a bit overboard and added those comments to every other Python.h include in the repository, moving it higher in the file if necessary.
EDIT: Those changes are in separate commits, to make it easier to pull those into separate PRs if needed.

@mattip
Member

mattip commented Jun 29, 2025

I guess this all makes sense. What do others think?

mattip
mattip previously approved these changes Jun 29, 2025
Comment on lines 4 to 6
/* Any file that includes Python.h must include it before any other files */
/* https://docs.python.org/3/extending/extending.html#a-simple-example */
/* npy_common.h includes Python.h so it also counts in this list */
Member

@ngoldbaum ngoldbaum Jun 30, 2025

If we're going to touch every single header, I'd prefer it if this comment could be formatted like other comments in the numpy repo

Suggested change
- /* Any file that includes Python.h must include it before any other files */
- /* https://docs.python.org/3/extending/extending.html#a-simple-example */
- /* npy_common.h includes Python.h so it also counts in this list */
+ /*
+ Any file that includes Python.h must include it before any other files
+ https://docs.python.org/3/extending/extending.html#a-simple-example
+ */

I also think the addendum about npy_common.h is only relevant for files that include it, and even then is maybe more confusing than helpful, since the most important thing is that Python.h comes first and the comment makes that clear.

Member

@seberg seberg Jul 1, 2025

To be honest, I might be happiest with just adding it as a postfix comment like #include <Python.h> /* Python.h include must be first. */ and leaving it to the user to google if they care enough, rather than this amount of comment everywhere.

(Or yeah, just don't add it and hope for a future style check instead.)

Contributor Author

> I also think the addendum about npy_common.h is only relevant for files that include it,

The intent was that npy_common.h is a file that includes Python.h, and so either it or Python.h needs to be the first include in any file that includes it, but that's not relevant for this batch of comments and I can see how that wasn't clear.

@ngoldbaum
Member

Just a drive-by thought: maybe we could do this using a lint check instead of adding identical comments all over the codebase?

Unfortunately I let #28634 sit, otherwise we'd already have a C linter you could add to....

@DWesl
Contributor Author

DWesl commented Jul 1, 2025

> Just a drive-by thought: maybe we could do this using a lint check instead of adding identical comments all over the codebase?

There's the Python script to check this I wrote for SciPy. Should I revert the last two comments, add that as a check, and adjust the files it flags? If so, should I move existing headers to the top of the files or add an explicit Python.h include at the top?

@mattip
Member

mattip commented Jul 1, 2025

> Should I revert the last two comments, add that as a check, and adjust the files it flags?

Yes, I think that would be better. We have a linter check on Azure.

> should I move existing headers to the top of the files

I think that would be a less noisy change.

@mattip
Member

mattip commented Jul 2, 2025

I am not sure this works. I reverted the change to numpy/_core/src/common/blas_utils.h and ran the script, it did not complain about the bad include order. Did I do something wrong?

@DWesl
Contributor Author

DWesl commented Jul 2, 2025

> I am not sure this works. I reverted the change to numpy/_core/src/common/blas_utils.h and ran the script, it did not complain about the bad include order. Did I do something wrong?

It looks like blas_utils.h doesn't include Python.h, so that's fine. Running the same test with blas_utils.c produces the same results, and that is a problem, since numpy/npy_math.h includes Python.h. It looks like the #include regex has problems with trailing comments. I will try to fix that.
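
For illustration only, a pattern that tolerates trailing comments might look like the sketch below. The names INCLUDE_RE and included_header are hypothetical; this is not the regex from tools/check_python_h_first.py, just one way to avoid anchoring the match at the end of the line.

```python
import re

# Hypothetical pattern, not the one used in the actual script.
# It allows whitespace before '#', whitespace between '#' and 'include',
# either quote style, and it ignores whatever follows the header name
# (for example a trailing /* ... */ or // comment).
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*(?:"([^"]+)"|<([^>]+)>)')


def included_header(line):
    """Return the header named by an #include line, or None otherwise."""
    match = INCLUDE_RE.match(line)
    if match is None:
        return None
    # Exactly one of the two groups is populated, depending on quote style.
    return match.group(1) or match.group(2)
```

Because the pattern is not anchored with `$`, a line such as `#include "numpy/npy_math.h"  /* needs Python.h */` still yields the header name.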

@mattip mattip dismissed their stale review July 2, 2025 13:16

relates to an earlier version of the PR

@mattip
Member

mattip commented Jul 2, 2025

CI is failing with

File "/home/vsts/work/1/s/tools/check_python_h_first.py", line 119, in check_python_h_included_first
    LEAF_HEADERS.append(this_header)
                        ^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'this_header' where it is not associated with a value
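
The traceback alone does not show why this_header was unbound. One common way to get this error, sketched here under that assumption, is a name that is assigned only inside a conditional branch and then used after the loop. The function names are hypothetical and the code is not taken from the script.

```python
def last_header_buggy(lines):
    """Illustrative bug: this_header is bound only if some line matches."""
    for line in lines:
        if line.startswith("#include"):
            this_header = line.split()[1]
    # Raises UnboundLocalError when no line started with "#include".
    return this_header


def last_header_fixed(lines):
    """Same logic, with the name bound before the loop."""
    this_header = None
    for line in lines:
        if line.startswith("#include"):
            this_header = line.split()[1]
    return this_header
```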

@charris
Member

charris commented Jul 2, 2025

Note that if Python.h is present, clang-format will put it first.

submodule_paths = [os.path.join(root_directory, path) for path in
                   submodule_paths]
# vendored with a script rather than via gitmodules
submodule_paths.append(os.path.join(root_directory, 'scipy/_lib/pyprima'))
Contributor

Just a minor observation as I scan through my GitHub notifications--I recall suggesting/adding this line specifically for SciPy in some work with Lucas I think, so it can probably be removed here.

I'm a bit rusty on the NumPy vendoring situation--I suppose if NumPy vendors things without using git submodules there could be other relevant entries to add here?

Member

@seberg seberg Jul 3, 2025

If we want to, we could try using the linguist-vendored files from .gitattributes. Not that the list is probably complete, but it is a list that could be expanded.

(Not saying I think it's a blocker here)
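
A minimal sketch of that idea, assuming the simple case where each .gitattributes line is a whitespace-free pattern followed by space-separated attributes. The function name vendored_patterns is hypothetical, and quoted patterns or macro attributes are not handled.

```python
def vendored_patterns(gitattributes_text):
    """Collect path patterns that are marked linguist-vendored.

    Only the plain forms 'linguist-vendored' and 'linguist-vendored=true'
    are recognised; '-linguist-vendored' (unset) is deliberately skipped.
    """
    patterns = []
    for raw_line in gitattributes_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        pattern, *attributes = line.split()
        if ("linguist-vendored" in attributes
                or "linguist-vendored=true" in attributes):
            patterns.append(pattern)
    return patterns
```

The returned patterns would still need to be matched against file paths (for example with fnmatch) before they could extend the exclusion list.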

@DWesl
Contributor Author

DWesl commented Jul 4, 2025

I wrote the initial version of check_python_h_first.py, but there are additions from other people in SciPy; the get_submodule_paths.py module is from someone else. Should I add comments to that effect to both files, or revert to my initial version from there and add my changes from here?

@seberg
Member

seberg commented Jul 9, 2025

I dunno if it matters much, since I guess you mostly wrote it either way. It seems to me we should just get it in, or do any of the original reviewers want to have another look?

Member

@ngoldbaum ngoldbaum left a comment

Left a couple comments. The changes to the headers look much more minimal than the version of this PR I last looked at, so that's good.

I didn't try to understand the logic in the script. It seems pretty complicated? But if SciPy is using it, it's probably reasonably battle-tested, so I think I'm ok with including it.

not included_python
and not warned_python_construct
and ".h" not in basename_to_check
) and ("py::" in line or "PYBIND11_" in line):
Member

numpy doesn't use pybind11 so this is unnecessary but probably harmless?

@@ -56,6 +56,9 @@ stages:
python tools/linter.py
displayName: 'Run Lint Checks'
failOnStderr: true
- script: |
python tools/check_python_h_first.py
displayName: 'Check Python.h is first file included'
Member

Did you check whether introducing an intentional mistake causes this to fail as expected? I see the linter needs failOnStderr: true - that's not needed here because the script exits with a nonzero status code if it fails, right?

Contributor Author

I ran it and saw failures in places that made sense until I fixed them; that's how I found the changes beyond the two in the first commit. I probably won't write a test for that until I separate it out as its own package, which I should probably do soon.

Contributor Author

I did this up into a proper package with tests over at https://github.com/DWesl/check-python-h-first. I did have to make changes to get all the tests to pass; the most relevant change was telling the script that a function starting with Py probably implies Python.h should have been included.
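
As a rough illustration of that heuristic (not the code from the package), a check for calls to Py-prefixed functions could be sketched as follows. PY_CALL_RE and uses_python_api are hypothetical names, and the pattern is an assumption about what "a function starting with Py" might mean in practice.

```python
import re

# Hypothetical heuristic: an identifier that starts with "Py", continues with
# an uppercase letter or underscore (PyLong_FromLong, Py_INCREF, ...), and is
# followed by an opening parenthesis.
PY_CALL_RE = re.compile(r"\bPy[A-Z_]\w*\s*\(")


def uses_python_api(line):
    """Return True if the line appears to call a Python C API function."""
    return PY_CALL_RE.search(line) is not None
```

A line-based check like this can also fire inside comments or string literals, so it is a hint that Python.h should have been included rather than proof.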

@DWesl
Contributor Author

DWesl commented Jul 9, 2025

> I didn't try to understand the logic in the script. It seems pretty complicated?

A short version of the core function: for each file in the list, it checks each line in that file for #include and its variants (space before #, space between # and include, double quotes or angle brackets). If it finds an #include:

  1. If the include is of a file known not to include any other files, continue to the next line.
  2. If the include is not of Python.h, note that the file is including a non-Python header.
  3. If the include is of Python.h, or of a file known to include Python.h, mark this file as one known to include Python.h and return a list of the lines on which it included non-Python headers earlier in the file.

If a line is not an #include, it looks for things that imply Python.h should have been included (mostly the py:: C++ namespace; the npy_ C namespace isn't as useful here, though the Py C namespace might be) and returns the line where that happens.

If it reaches the end of the file without including any files, that file is added to the list of files known not to include any other file.
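
The include-handling part of that description can be sketched roughly as below. This is a simplified illustration, not the script itself: check_include_order, python_headers, and leaf_headers are hypothetical names, and the py:: construct check and the bookkeeping that updates the two sets are left out.

```python
import re

# Matches '#include "x.h"' and '#include <x.h>' with optional whitespace
# before '#' and between '#' and 'include'; trailing text is ignored.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*["<]([^">]+)[">]')


def check_include_order(lines, python_headers, leaf_headers):
    """Return 1-based line numbers of non-Python includes seen before the
    first include of Python.h (or of a header known to include it).

    An empty list means either the order is fine or the file never
    includes a Python header at all.
    """
    earlier_includes = []
    for lineno, line in enumerate(lines, start=1):
        match = INCLUDE_RE.match(line)
        if match is None:
            continue
        header = match.group(1)
        if header in leaf_headers:
            continue  # known not to include anything else
        if header in python_headers:
            return earlier_includes  # everything noted so far came too early
        earlier_includes.append(lineno)  # a non-Python header
    return []
```

For example, with python_headers = {"Python.h", "numpy/npy_math.h"}, a file that includes <stdio.h> on line 1 and "numpy/npy_math.h" on line 2 would be reported with line 1.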

The rest of it is basically a wrapper to exclude files in submodules or vendored projects from the list of files it should check, and to try to sort the list so headers appear before the files that include them, then to count the number of files that first include a non-Python.h file and exit with that status. I think someone over in SciPy added a fancy bit so you can tell it to check only the files changed since a specific commit, which I might have dropped here.
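
The counting step at the end of that wrapper might look like the sketch below. It assumes the per-file results have already been gathered into a dictionary mapping each path to its list of offending line numbers; count_failures and problems_by_file are hypothetical names.

```python
def count_failures(problems_by_file):
    """Print one message per offending file and return how many there were."""
    n_failures = 0
    for path, linenos in sorted(problems_by_file.items()):
        if linenos:
            n_failures += 1
            print(f"{path}: non-Python include before Python.h "
                  f"on line(s) {linenos}")
    return n_failures


# In a script, the count would become the process exit status, so that any
# offending file makes the CI step fail:
#     sys.exit(count_failures(problems_by_file))
```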

> But if SciPy is using it, it's probably reasonably battle-tested, so I think I'm ok with including it.

I think I added it to SciPy a year or so back (after some searching, as scipy/scipy#20536), as a cheaper alternative to a Cygwin CI run to catch the most common failures: I think SciPy takes an hour to build, and likely longer to run the tests.
