Replies: 5 comments 1 reply
-
Consider this function:

```python
x = 0

def cb(_):
    global x, y
    y = (x, 2, 3)
    x += 1
```

This does allocate, because each time the function is called a new tuple has to be created. I think this applies to a number of your non-allocating examples: the compiler creates constants where it can.
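The constant-folding point can be checked with a small identity test of my own (run under CPython, whose compiler folds constants much as described here): an all-constant tuple is baked into the code object and reused, while a tuple containing a variable is built fresh on every call.

```python
def const_tuple():
    return (1, 2, 3)   # all constants: folded at compile time

def fresh_tuple(x):
    return (x, 2, 3)   # depends on x: a new tuple per call

assert const_tuple() is const_tuple()        # same pre-built object
assert fresh_tuple(0) is not fresh_tuple(0)  # a fresh allocation each time
```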
-
Yes, I agree that's what's going on there, and similarly with the other examples.
-
Hmm, I think the runtime figures might confirm this idea, having done a bit of experimenting. foo() defined as def foo():
def bar():
return 0
return bar() runs about 1000 cycles slower than def bar():
return 0
def foo():
return bar() But foo() defined as def foo():
def bar():
x = 0
x = x + 1
[... 1000s more increment lines ...]
x = x + 1
return x
return bar() is also only about 1000 cycles slower than def bar():
x = 0
x = x + 1
[... 1000s more increment lines ...]
x = x + 1
return x
def foo():
return bar() Thus it seems to be a constant-time difference not a difference proportional to code length, which implies some kind of environment/closure allocation, not a runtime code generation. That's a relief - python without nested functions would be pretty clunky. |
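A small CPython check of my own supports the constant-time theory: the inner function's bytecode is compiled once and shared, and only a lightweight function/closure object is created per call to the outer function.

```python
def outer():
    i = 0
    def inner():
        return i
    return inner

a, b = outer(), outer()
assert a is not b                 # a new closure object per outer() call...
assert a.__code__ is b.__code__   # ...but one shared, pre-compiled code object
```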
-
The following allocates (fails with a memory error):

```python
from pyb import Timer

x = 0

def cb(_):
    global x
    def foo(y):
        return y + 1
    x = foo(x)
    print(x)

cb(0)  # Initial call runs OK
t = Timer(1, freq=1, callback=cb)  # but fails the first time it's called in a hard ISR context
```

Much as I like nested functions and closures, I've never tried creating one in a hard ISR context. Very educational. Obviously the callback runs if I move the definition of `foo()` out of `cb()`.

One trick you may be unaware of: you can get a bytecode listing of your code with

```
$ micropython -v -v my_code.py
```

where `micropython` is the Unix port of MicroPython. Testing for allocation in a repeated hard ISR callback ensures that you can't be led astray by the compiler's useful habit of pre-compiling objects.
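For anyone without the Unix port to hand, CPython's `dis` module gives a rough analogue of that bytecode listing (my own sketch, not part of the original post): the `MAKE_FUNCTION` opcode inside `cb` shows that a function object is constructed, i.e. allocated, on every call.

```python
import dis

def cb(_):
    def foo(y):
        return y + 1
    return foo(0)

# Collect the opcode names in cb's bytecode
ops = [ins.opname for ins in dis.get_instructions(cb)]
assert "MAKE_FUNCTION" in ops  # a function object is built per call
```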
-
I guess one slightly circuitous way to get the closure scoping advantages of nested functions without allocation at run time is to break the body of the outer function into an extra inner function and then return that. For example, to write

```python
def foo():
    i = 1
    def bar():
        return i
    def foo():
        return bar()
    return foo

foo = foo()
```

instead of

```python
def foo():
    i = 1
    def bar():
        return i
    return bar()
```

This does the closure allocation once, when `foo = foo()` runs at set-up time, rather than on every call.

I'd seen the verbose output from the micropython interpreter when I was playing with code emitters (earlier thread!) but hadn't thought to use it here - thanks. (Don't suppose you know of a way to coax it into dumping native assembler (or just machine code to disassemble by hand) when using the viper or native code emitters, do you? I tried for a while when I first saw it, but drew a blank.)
-
(With apologies to Blendtec for the plagiarism in my topic title!)
I've been writing some code which tries to avoid heap allocation at runtime, and realised that I should double-check my assumptions about what allocates and what does not. It's easy to check:
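One shape such a check might take, as a minimal sketch assuming MicroPython's `micropython.heap_lock()` / `heap_unlock()` API (the helper name `allocates` is my own, and this only runs on a MicroPython port, since CPython has no `micropython` module):

```python
def allocates(fn):
    """Return True if calling fn() tries to allocate on the heap."""
    import micropython
    micropython.heap_lock()   # while locked, any heap allocation raises MemoryError
    try:
        fn()
        return False
    except MemoryError:
        return True
    finally:
        micropython.heap_unlock()

# On a board: allocates(lambda: [1, 2, 3]) should report True
```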
Here are a handful of things that don't allocate at runtime:

- `x, y = 1 + 2, 3 + 4`
- `t = 3.4`
- `print(x, y)`
- `z = foo((4, 3))` (where `def foo(x): return x`)
- `a = (1, 2, 3)`
- `for i in range(4): print(i)`
and here are a handful of things that do allocate at runtime:

- `1.2 + 3.4`
- `f'{1}'`
- `z = bar(4, 3)` (where `def bar(*x): return x`)
- `a = [1, 2, 3]`
- `x = range(4)`
- `sum(i for i in range(4))`
- `sum(i for i in (1, 2, 3))`
At least a couple of those in each list surprised me.
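Two of the surprises above can be probed with identity checks of my own under CPython, whose compiler folds constants in a broadly similar way: a constant tuple argument is reused, while `*args` packing and list literals allocate every time.

```python
def foo(x):
    return x

def bar(*x):
    return x

# The constant tuple (4, 3) is folded and reused across calls...
assert foo((4, 3)) is foo((4, 3))
# ...but *args packs a fresh tuple on every call,
assert bar(4, 3) is not bar(4, 3)
# and a list literal builds a fresh list on every evaluation.
a, b = [1, 2, 3], [1, 2, 3]
assert a is not b
```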
Finally, another case that was interesting for me:

```python
def foo():
    def bar():
        return 0
    return bar()
```

allocates, whereas

```python
def bar():
    return 0

def foo():
    return bar()
```

doesn't - and also runs at twice the speed, whether the functions are wrapped with `@micropython.native` or not. Apparently the former genuinely recreates `bar()` afresh every time `foo()` is run, so although nested functions seem handy for closure scoping, they are apparently a bad idea when performance or allocation-safety matters.

(Or is the nested function compiled/byte-compiled just once, and the measured allocation and performance difference just the creation of a lightweight closure structure to hold the captured variables?)
Does anyone have other surprising or interesting examples?