Stack usage is awful #640
Very interesting. Is your patch available somewhere? |
Yep, pushed in 914bcf1 now |
So, why does this happen? Three big reasons:
Of these, 3 looks like low-hanging fruit for experimentation, and I submit this ticket essentially as grounds to tweak 1. Point 2 should be the last one to try, because it will affect performance. |
It appears to be similar for the STM port as well: Stack: 744 bytes or 416 bytes per level of function call. My patch was different (since stm doesn't have a mem_info, I added pyb.stack())
|
This really should be micropython.stack() (returning an int). mem_info() is a dirty practical hack, and we should avoid proliferating those (unix just already has it, until it's removed). |
I'm working on (initial steps of) point 1. |
While trying -fstack-usage and then looking into the assembly, I saw completely nonsensical stack usage and allocation. Ah, gotta remember that Intel sells new CPUs by forcing completely ludicrous stack alignments to push more data out of caches and then putting in bigger cache sizes to increase something in the specs. Using -mpreferred-stack-boundary=2 (the value is the exponent of 2, so this means 4-byte alignment) cut the stack size of one function from 80 to 60 bytes (even though the default alignment should be 16). Then, one Py function recursion equals 292 bytes (with some changes to cut allocation already applied). |
Good point. I wasn't really sure exactly how/where we should add such a function. It seems quite useful for embedded work. I'd like to have a set of functions related to stack use:
1 - Have a function which can write the unused portion of the stack with some well-known fixed value
3 - Have a function similar to the one coded which reports on the current stack usage.
Under Linux, doing this is a bit tricky, but still doable. The stack grows dynamically, getting new pages added as needed, so you need to probe to find out if pages have been allocated or not. Usually you can query /proc/self/maps and look for a line with [stack] to find the set of pages which are mapped to the stack, and use that as the starting point. You may also, for debugging purposes, just force the preallocation of a bunch of stack pages.
Perhaps we should create a stack module? Or maybe we should have a mem module that includes the stack and the gc? Or maybe just keep them separate? Once embedded projects get more complicated, I can definitely see the need to walk the heap and get information about all of the allocated objects in the heap, both broad (how many objects are allocated for each class/type) and detailed (get information about each allocated object, especially the larger ones). |
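A minimal sketch of the /proc/self/maps lookup mentioned above, written in Python for brevity. The function names and the sample mapping text are made up for illustration, and none of this is part of MicroPython. It only finds the address range currently mapped for the main thread's stack on Linux; it does not measure how much of that range is in use.

```python
def find_stack_mapping(maps_text):
    """Return (start, end) of the [stack] mapping in maps-style text, or None."""
    for line in maps_text.splitlines():
        fields = line.split()
        # The last column is an optional pathname; "[stack]" marks the
        # main thread's stack mapping.
        if fields and fields[-1] == "[stack]":
            start_hex, end_hex = fields[0].split("-")
            return int(start_hex, 16), int(end_hex, 16)
    return None


def read_own_stack_mapping():
    """Linux only: look up the stack mapping of the current process."""
    with open("/proc/self/maps") as f:
        return find_stack_mapping(f.read())


# Hypothetical sample in the /proc/<pid>/maps layout, so the parser can be
# exercised without a Linux host.
SAMPLE = (
    "00400000-0040b000 r-xp 00000000 08:01 1234 /bin/cat\n"
    "7ffc1a2b3000-7ffc1a2d4000 rw-p 00000000 00:00 0          [stack]\n"
)

start, end = find_stack_mapping(SAMPLE)
mapped_bytes = end - start  # 0x21000 bytes in this sample
```

On a real Linux system, read_own_stack_mapping() would give the starting point described above; the mapped range can grow as the stack grows.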
Re more sophisticated stack profiling, see also #264. |
See c60a261 for some small savings of stack space. |
This reduces stack usage by 16 words (64 bytes) for stmhal/ port. See issue #640.
See aabd83e for moderate savings of stack. This patch made the following improvements:
So far we have gone from 480 to 272 on x86, and 416 to 224 on ARM Thumb. That's about a 45% decrease. |
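The "about 45%" figure can be checked with a few lines of arithmetic. This is just a sketch using the byte counts quoted above; the helper name is made up.

```python
def percent_decrease(before, after):
    """Percentage reduction going from `before` to `after`."""
    return 100.0 * (before - after) / before


# Bytes of C stack per level of Python function recursion, from the thread.
x86 = percent_decrease(480, 272)        # roughly 43.3
arm_thumb = percent_decrease(416, 224)  # roughly 46.2
average = (x86 + arm_thumb) / 2         # a little under 45
```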
Sweet. I saw the blurb about switching over to using the heap if more state is needed. What attributes of a function would cause this to happen? I'd like to understand the impact on interrupt handlers. |
A function that has a lot of local variables, and/or lots of arguments, and/or a complicated expression, e.g. a+b*c(d, c+f). |
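As a rough analogy only: CPython exposes similar per-function quantities on code objects (co_nlocals for the number of locals including arguments, co_stacksize for the value stack needed by expressions). The sketch below uses those CPython attributes to show how the three factors named above make a function's state grow. MicroPython's own accounting, and the size at which it switches to the heap, are not specified in this thread and are not modelled here; the function names are made up.

```python
def simple(a):
    # One argument, no extra locals, trivial expression.
    return a


def busy(a, b, c, d, f):
    # Five arguments, three extra locals, and a nested call expression
    # like the one in the comment above. Never called here; we only
    # inspect its code object.
    x = a + b
    y = c * d
    z = x - y
    return a + b * c(d, c + f)


simple_locals = simple.__code__.co_nlocals  # arguments count as locals
busy_locals = busy.__code__.co_nlocals

# The nested expression needs more temporary value-stack slots.
needs_more_stack = busy.__code__.co_stacksize > simple.__code__.co_stacksize
```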
With a few fixes (described above) stack usage is no longer "awful", so the title of this issue is fixed. If someone considers stack usage to need further improvements, please provide a sensible metric to measure such improvements, and a goal, otherwise this issue will remain forever open. |
I said long ago that I wanted to add basic stack usage info (nothing fancy, just comparing addresses of stack-allocated vars). I didn't, because I suspected that the only outcome would be a desire to drop everything and work on stack usage.
Anyway, I did that now (printed by mem_info() in the unix port) and indeed it's huge:
So, one level of Python function recursion costs 480 bytes of C stack (on x86).