Replies: 5 comments 1 reply
-
Consider this function:

```python
x = 0

def cb(_):
    global x, y
    y = (x, 2, 3)
    x += 1
```

This does allocate, because each time the function is called a new tuple has to be created. I think this applies to a number of your non-allocating examples: the compiler creates constants where it can.
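The constant-folding point can be checked with a small identity test of my own (run under CPython, whose compiler folds constants much as described here): an all-constant tuple is baked into the code object and reused, while a tuple containing a variable is built fresh on every call.

```python
def const_tuple():
    return (1, 2, 3)   # all constants: folded at compile time

def fresh_tuple(x):
    return (x, 2, 3)   # depends on x: a new tuple per call

assert const_tuple() is const_tuple()        # same pre-built object
assert fresh_tuple(0) is not fresh_tuple(0)  # a fresh allocation each time
```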
-
Yes, I agree that's what's going on there, and similarly with the other examples.
-
Hmm, I think the runtime figures might confirm this idea, having done a bit of experimenting. foo() defined as def foo():
def bar():
return 0
return bar() runs about 1000 cycles slower than def bar():
return 0
def foo():
return bar() But foo() defined as def foo():
def bar():
x = 0
x = x + 1
[... 1000s more increment lines ...]
x = x + 1
return x
return bar() is also only about 1000 cycles slower than def bar():
x = 0
x = x + 1
[... 1000s more increment lines ...]
x = x + 1
return x
def foo():
return bar() Thus it seems to be a constant-time difference not a difference proportional to code length, which implies some kind of environment/closure allocation, not a runtime code generation. That's a relief - python without nested functions would be pretty clunky. |
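A small CPython check of my own supports the constant-time theory: the inner function's bytecode is compiled once and shared, and only a lightweight function/closure object is created per call to the outer function.

```python
def outer():
    i = 0
    def inner():
        return i
    return inner

a, b = outer(), outer()
assert a is not b                 # a new closure object per outer() call...
assert a.__code__ is b.__code__   # ...but one shared, pre-compiled code object
```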
-
The following allocates (fails with a memory error):

```python
from pyb import Timer

x = 0

def cb(_):
    global x
    def foo(y):
        return y + 1
    x = foo(x)
    print(x)

cb(0)  # Initial call runs OK
t = Timer(1, freq=1, callback=cb)  # but fails the first time it's called in a hard ISR context
```

Much as I like nested functions and closures, I've never tried creating one in a hard ISR context. Very educational. Obviously the callback runs if I move the definition of `foo()` out of `cb()`.

One trick you may be unaware of: you can get a bytecode listing of your code with

```
$ micropython -v -v my_code.py
```

where `micropython` is the Unix port of MicroPython. Testing for allocation in a repeated hard ISR callback ensures that you can't be led astray by the compiler's useful habit of pre-compiling objects.
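For anyone without the Unix port to hand, CPython's `dis` module gives a rough analogue of that bytecode listing (my own sketch, not part of the original post): the `MAKE_FUNCTION` opcode inside `cb` shows that a function object is constructed, i.e. allocated, on every call.

```python
import dis

def cb(_):
    def foo(y):
        return y + 1
    return foo(0)

# Collect the opcode names in cb's bytecode
ops = [ins.opname for ins in dis.get_instructions(cb)]
assert "MAKE_FUNCTION" in ops  # a function object is built per call
```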
-
I guess one slightly circuitous way to get the closure scoping advantages of nested functions without allocation at run time is to break the body of the outer function into an extra inner function and then return that. For example, to write

```python
def foo():
    i = 1
    def bar():
        return i
    def foo():
        return bar()
    return foo

foo = foo()
```

instead of

```python
def foo():
    i = 1
    def bar():
        return i
    return bar()
```

This does the closure allocation once, when `foo = foo()` runs at set-up time, rather than on every call.

I'd seen the verbose output from the micropython interpreter when I was playing with code emitters (earlier thread!) but hadn't thought to use it here - thanks. (Don't suppose you know of a way to coax it into dumping native assembler (or just machine code to disassemble by hand) when using the viper or native code emitters, do you? I tried for a while when I first saw it, but drew a blank.)
-
(With apologies to Blendtec for the plagiarism in my topic title!)
I've been writing some code which tries to avoid heap allocation at runtime, and realised that I should double-check my assumptions about what allocates and what does not. It's easy to check:
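One shape such a check might take, as a minimal sketch assuming MicroPython's `micropython.heap_lock()` / `heap_unlock()` API (the helper name `allocates` is my own, and this only runs on a MicroPython port, since CPython has no `micropython` module):

```python
def allocates(fn):
    """Return True if calling fn() tries to allocate on the heap."""
    import micropython
    micropython.heap_lock()   # while locked, any heap allocation raises MemoryError
    try:
        fn()
        return False
    except MemoryError:
        return True
    finally:
        micropython.heap_unlock()

# On a board: allocates(lambda: [1, 2, 3]) should report True
```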
Here are a handful of things that don't allocate at runtime:

- `x, y = 1 + 2, 3 + 4`
- `t = 3.4`
- `print(x, y)`
- `z = foo((4, 3))` (where `def foo(x): return x`)
- `a = (1, 2, 3)`
- `for i in range(4): print(i)`
and here are a handful of things that do allocate at runtime:

- `1.2 + 3.4`
- `f'{1}'`
- `z = bar(4, 3)` (where `def bar(*x): return x`)
- `a = [1, 2, 3]`
- `x = range(4)`
- `sum(i for i in range(4))`
- `sum(i for i in (1, 2, 3))`
At least a couple of those in each list surprised me.
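Two of the surprises above can be probed with identity checks of my own under CPython, whose compiler folds constants in a broadly similar way: a constant tuple argument is reused, while `*args` packing and list literals allocate every time.

```python
def foo(x):
    return x

def bar(*x):
    return x

# The constant tuple (4, 3) is folded and reused across calls...
assert foo((4, 3)) is foo((4, 3))
# ...but *args packs a fresh tuple on every call,
assert bar(4, 3) is not bar(4, 3)
# and a list literal builds a fresh list on every evaluation.
a, b = [1, 2, 3], [1, 2, 3]
assert a is not b
```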
Finally, another case that was interesting for me:

```python
def foo():
    def bar():
        return 0
    return bar()
```

allocates, whereas

```python
def bar():
    return 0

def foo():
    return bar()
```

doesn't - and also runs at twice the speed, whether the functions are wrapped with `@micropython.native` or not. Apparently the former genuinely recreates `bar()` afresh every time `foo()` is run, so although nested functions seem handy for closure scoping, they are apparently a bad idea when performance or allocation-safety matters.

(Or is the nested function compiled/byte-compiled just once, and the measured allocation and performance difference just the creation of a lightweight closure structure to hold the captured variables?)
Does anyone have other surprising or interesting examples?