gh-136541: Fix several problems of perf trampolines in x86_64 and aarch64 #136500

Merged
merged 8 commits into from
Jul 11, 2025

Conversation

pablogsal
Member

@pablogsal pablogsal commented Jul 10, 2025

Fix the following problems:

  • The x86_64 trampolines are not preserving frame pointers
  • The hardcoded offsets to the code segment from the FDE only worked properly for x86_64
  • The CIE data was not following conventions of aarch64
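
As background for the last item: the description does not say which CIE fields were wrong, so the following is only a sketch of the kind of per-architecture values a CIE carries. It encodes the code alignment factor, data alignment factor, and return-address register that compilers typically emit (return-address column 16 on x86_64 and 30 on aarch64; code alignment 1 versus 4). The names `CIE_PARAMS`, `uleb128`, `sleb128`, and `cie_fields` are hypothetical and are not taken from the patch.

```python
# Typical CIE parameters per architecture (assumed from common compiler
# output, not read from the patch under review).
CIE_PARAMS = {
    "x86_64": {"code_align": 1, "data_align": -8, "return_reg": 16},
    "aarch64": {"code_align": 4, "data_align": -8, "return_reg": 30},
}


def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("uleb128 needs a non-negative value")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def sleb128(value: int) -> bytes:
    """Encode a signed integer as signed LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7  # arithmetic shift, so negatives converge to -1
        sign_bit = byte & 0x40
        if (value == 0 and not sign_bit) or (value == -1 and sign_bit):
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)


def cie_fields(arch: str) -> bytes:
    """Return the encoded alignment factors and return-address register."""
    params = CIE_PARAMS[arch]
    return (
        uleb128(params["code_align"])
        + sleb128(params["data_align"])
        + uleb128(params["return_reg"])
    )
```

Under these assumptions the two architectures differ in two of the three bytes, which is why a CIE written with x86_64 values cannot simply be reused on aarch64.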

@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @pablogsal for commit 8b228c0 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F136500%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots label Jul 10, 2025
@pablogsal
Member Author

@diegorusso @brandtbucher @savannahostrowski

@pablogsal
Member Author

!buildbot perf

@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @pablogsal for commit b174fe8 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F136500%2Fmerge

The command will test the builders whose names match the following regular expression: perf

The builders matched are:

  • AMD64 Arch Linux Perf PR

@canova canova left a comment

Thanks!

Do you think it's possible to remove this skipUnless now?

@unittest.skipUnless(
    is_unwinding_reliable_with_frame_pointers(),
    "Unwinding is unreliable with frame pointers",
)

(also I think there is a typo in that string. Probably was meant to say "Unwinding is unreliable without frame pointers".)
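
For context, here is a minimal, self-contained sketch of how a `skipUnless` guard of this kind behaves. The helper `flags_keep_frame_pointers` and the `EXAMPLE_CFLAGS` string are hypothetical stand-ins; the real `is_unwinding_reliable_with_frame_pointers()` in the test suite may check something different.

```python
import unittest


def flags_keep_frame_pointers(cflags):
    """Hypothetical check: True if the compiler flags request frame pointers."""
    return bool(cflags) and "no-omit-frame-pointer" in cflags


# Hypothetical flags; a real check would read them from the build configuration.
EXAMPLE_CFLAGS = "-O2 -fno-omit-frame-pointer"


class TrampolineStackTest(unittest.TestCase):
    # The test body only runs when the condition is true; otherwise the
    # runner reports it as skipped, with the given reason.
    @unittest.skipUnless(
        flags_keep_frame_pointers(EXAMPLE_CFLAGS),
        "frame pointers are required for reliable unwinding",
    )
    def test_runs_only_with_frame_pointers(self):
        self.assertTrue(flags_keep_frame_pointers(EXAMPLE_CFLAGS))
```

The second argument is the reason shown when the test is skipped, so it reads most naturally when it states what is missing.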

@pablogsal
Member Author

pablogsal commented Jul 10, 2025

I don't think we can. I am pretty sure Perf will still choke without frame pointers in the outer functions and fail the test.

The sentence is indeed confusing, but it is trying to say that unwinding using frame pointers is unreliable, in the sense of "not working".

@pablogsal
Member Author

@canova I am AFK; do you mind checking, in case we are lucky?

@canova

canova commented Jul 10, 2025

Ah, that's right. If we add a test with samply, we might be able to run that test both with and without frame pointers.

@canova

canova commented Jul 10, 2025

> @canova I am AFK do you mind checking in case we are lucky?

Just did, and unfortunately, we are not lucky :') still fails.

@canova

canova commented Jul 10, 2025

Added the same comment to the other PR, but adding here for posterity:

Tested this PR using samply. I can verify that it fixes the stack walking! Here are before and after profiles:
Before your patch / After your patch

@pablogsal
Member Author

!buildbot Perf

@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @pablogsal for commit b174fe8 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F136500%2Fmerge

The command will test the builders whose names match the following regular expression: Perf

The builders matched are:

  • AMD64 Arch Linux Perf PR

@mstange

mstange commented Jul 10, 2025

Thank you for making this change!

If the Python C code is compiled with framepointers, I believe this change will also improve unwinding when you use Linux perf with framepointer unwinding (perf record -g); it should now give complete stacks, whereas in the past, I think the immediate caller of the trampoline (py_trampoline_evaluator) would have been missing from the stack.
If you use perf with DWARF unwinding, what does it do for JIT code without unwind info? Do you know if it falls back to using framepointers, the way samply does? If that were the case, I think you'd be able to simplify things drastically because you'd no longer have to emit unwind records to the jitdump file.

Oh, and one other thing I want to mention: samply supports both macOS and Linux, and this patch helps with both, so it's not strictly related to just #136459.
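
To make the missing-caller effect described above concrete, here is a simplified simulation of frame-pointer unwinding. It is a toy model under stated assumptions, not how perf or samply are implemented: the `Machine` class, `walk_frame_pointers`, `sample_stack`, and the function name `eval_frame` are hypothetical, and return addresses are represented directly by the name of the function they return into.

```python
WORD = 8  # size of one stack slot on a 64-bit target


class Machine:
    """Toy stack that grows downwards, with a frame-pointer register."""

    def __init__(self):
        self.memory = {}
        self.sp = 0x1000
        self.fp = 0  # 0 marks the end of the frame-pointer chain

    def push(self, value):
        self.sp -= WORD
        self.memory[self.sp] = value

    def call(self, return_into, keeps_frame_pointer):
        """Simulate a call instruction followed by the callee's prologue."""
        self.push(return_into)      # return address pushed by the call
        if keeps_frame_pointer:
            self.push(self.fp)      # push rbp
            self.fp = self.sp       # mov rbp, rsp


def walk_frame_pointers(machine, current_function):
    """Collect function names by following the saved-frame-pointer chain."""
    stack = [current_function]
    fp = machine.fp
    while fp != 0:
        stack.append(machine.memory[fp + WORD])  # return address slot
        fp = machine.memory[fp]                  # caller's saved frame pointer
    return stack


def sample_stack(trampoline_keeps_frame_pointer):
    """Unwind from inside eval_frame, called through the trampoline."""
    m = Machine()
    m.call("main", True)  # main -> py_trampoline_evaluator
    m.call("py_trampoline_evaluator", trampoline_keeps_frame_pointer)
    m.call("trampoline", True)  # trampoline -> eval_frame
    return walk_frame_pointers(m, "eval_frame")
```

In this model, `sample_stack(True)` yields the full chain, while `sample_stack(False)` drops `py_trampoline_evaluator`: the trampoline never saved a frame pointer, so the return address that its caller pushed is never visited.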

@canova

canova commented Jul 10, 2025

> Oh, and one other thing I want to mention: samply supports both macOS and Linux, and this patch helps with both, so it's not strictly related to just #136459.

The profiles that I captured were actually from Linux x86_64. I should have mentioned that too :)

@pablogsal
Member Author

> If the Python C code is compiled with framepointers, I believe this change will also improve unwinding when you use Linux perf with framepointer unwinding (perf record -g); it should now give complete stacks, whereas in the past, I think the immediate caller of the trampoline (py_trampoline_evaluator) would have been missing from the stack.

It does not, unfortunately. See #136500 (comment) and #136500 (review)

@pablogsal
Member Author

> If you use perf with DWARF unwinding, what does it do for JIT code without unwind info? Do you know if it falls back to using framepointers, the way samply does? If that were the case, I think you'd be able to simplify things drastically because you'd no longer have to emit unwind records to the jitdump file.

It chokes on it and stops at that frame (like any other unwinder out there) :S We are having that problem with the big JIT and it is a real pain: #126910

@pablogsal pablogsal changed the title from "gh-136459: Use frame pointers in the x86_64 perf trampolines" to "gh-136459: Fix several problems of perf trampolines in x86_64 and aarch64" Jul 11, 2025
@pablogsal pablogsal changed the title from "gh-136459: Fix several problems of perf trampolines in x86_64 and aarch64" to "gh-136541: Fix several problems of perf trampolines in x86_64 and aarch64" Jul 11, 2025
@pablogsal pablogsal added the "needs backport to 3.13" and "needs backport to 3.14" labels Jul 11, 2025
@python python deleted a comment from bedevere-bot Jul 11, 2025
@pablogsal pablogsal added the 🔨 test-with-buildbots label Jul 11, 2025
@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots label Jul 11, 2025
@python python deleted a comment from bedevere-bot Jul 11, 2025
@python python deleted a comment from bedevere-bot Jul 11, 2025
@pablogsal
Member Author

!buildbot Fedora Stable

@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @pablogsal for commit 8a81457 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F136500%2Fmerge

The command will test the builders whose names match the following regular expression: Fedora Stable

The builders matched are:

  • s390x Fedora Stable Clang PR
  • PPC64LE Fedora Stable Clang PR
  • aarch64 Fedora Stable LTO PR
  • PPC64LE Fedora Stable LTO + PGO PR
  • PPC64LE Fedora Stable PR
  • PPC64LE Fedora Stable LTO PR
  • aarch64 Fedora Stable Clang PR
  • AMD64 Fedora Stable LTO + PGO PR
  • aarch64 Fedora Stable PR
  • PPC64LE Fedora Stable Refleaks PR
  • s390x Fedora Stable PR
  • AMD64 Fedora Stable LTO PR
  • AMD64 Fedora Stable Clang PR
  • aarch64 Fedora Stable Clang Installed PR
  • aarch64 Fedora Stable LTO + PGO PR
  • AMD64 Fedora Stable PR
  • PPC64LE Fedora Stable Clang Installed PR
  • s390x Fedora Stable Clang Installed PR
  • AMD64 Fedora Stable Refleaks PR
  • AMD64 Fedora Stable Clang Installed PR
  • aarch64 Fedora Stable Refleaks PR
  • s390x Fedora Stable LTO PR
  • s390x Fedora Stable LTO + PGO PR
  • s390x Fedora Stable Refleaks PR

@pablogsal pablogsal merged commit 236f733 into python:main Jul 11, 2025
68 of 74 checks passed
@pablogsal pablogsal deleted the gh-136459 branch July 11, 2025 13:32
@miss-islington-app

Thanks @pablogsal for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13, 3.14.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jul 11, 2025
…nd aarch64 (pythonGH-136500)

This commit fixes the following problems:

* The x86_64 trampolines are not preserving frame pointers
* The hardcoded offsets to the code segment from the FDE only worked properly for x86_64
* The CIE data was not following conventions of aarch64
* The eh_frame for aarch64 was not fully correct
(cherry picked from commit 236f733)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
@miss-islington-app

Sorry, @pablogsal, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 236f733d8ffb3d587e1167fa0a0248c24512e7fd 3.13

@bedevere-app

bedevere-app bot commented Jul 11, 2025

GH-136544 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app bot removed the "needs backport to 3.14" label Jul 11, 2025
pablogsal added a commit to pablogsal/cpython that referenced this pull request Jul 11, 2025
…86_64 and aarch64 (pythonGH-136500)

This commit fixes the following problems:

* The x86_64 trampolines are not preserving frame pointers
* The hardcoded offsets to the code segment from the FDE only worked properly for x86_64
* The CIE data was not following conventions of aarch64
* The eh_frame for aarch64 was not fully correct
(cherry picked from commit 236f733)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
@bedevere-app

bedevere-app bot commented Jul 11, 2025

GH-136545 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the "needs backport to 3.13" label Jul 11, 2025
pablogsal added a commit that referenced this pull request Jul 11, 2025
…and aarch64 (GH-136500) (#136545)

This commit fixes the following problems:

* The x86_64 trampolines are not preserving frame pointers
* The hardcoded offsets to the code segment from the FDE only worked properly for x86_64
* The CIE data was not following conventions of aarch64
* The eh_frame for aarch64 was not fully correct
(cherry picked from commit 236f733)
pablogsal added a commit that referenced this pull request Jul 11, 2025
…and aarch64 (GH-136500) (#136544)

gh-136541: Fix several problems of perf trampolines in x86_64 and aarch64 (GH-136500)

This commit fixes the following problems:

* The x86_64 trampolines are not preserving frame pointers
* The hardcoded offsets to the code segment from the FDE only worked properly for x86_64
* The CIE data was not following conventions of aarch64
* The eh_frame for aarch64 was not fully correct
(cherry picked from commit 236f733)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
Pranjal095 pushed a commit to Pranjal095/cpython that referenced this pull request Jul 12, 2025
…nd aarch64 (python#136500)

This commit fixes the following problems:

* The x86_64 trampolines are not preserving frame pointers
* The hardcoded offsets to the code segment from the FDE only worked properly for x86_64
* The CIE data was not following conventions of aarch64
* The eh_frame for aarch64 was not fully correct
5 participants