-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
_multiprocessing leaks references when initialised multiple times #94382
Comments
It seems the leak is with |
Repeating #94336 (comment): AFAICS, the multiprocessing types do not access global module state, so there should be no reason to convert them to heap types. |
Where is that declared static? I don't see how it is considered a static integer. I am happy to be wrong though. |
I changed the issue title. It seems to me that the issue is a ref leak. The proposed solution is converting types to heap types. IMO, the GitHub Issue title should reflect the problem, not necessary the solution. |
Do you have a fix that does not use heap types or private APIs but fixes the issue? |
Ah, terminology. It's not C
I can send one later today, if nothing comes up. |
Sure, please add me as a reviewer, I'll learn something new. |
OK, I see now that the leak is not caused by SEM_VALUE_MAX, but (mostly?) by the method/getter/setter wrappers set by @vstinner, are you aware of this issue with |
In any case, the module state is unnecessary here. |
Does that mean that module state in Footnotes |
Yes, unless I missed something subtle, it is |
Note that some modules were too eagerly converted; though parts of the changes were unneeded, so one can argue that the conversion was churn for little or no benefit. On the other hand, reverting such a change will also be churn for little or no benefit. PEP-687 tries to avoid the unneeded churn. Though, one could argue that removing the state would reduce unneeded memory allocations. Such value might be worth the churn. |
The same change can be done in many modules, so I'd be much happier if it was discussed more broadly. For _json:
+1, removing unneeded module state from a single module sounds like a good change. |
Agreed. Let's move the discussion to Discourse. |
Petr:
Well, it may fall under the unspecified "other reason" part of this sentence; quoting from the PEP:
|
Many static types have been converted to heap types to fix https://bugs.python.org/issue1635741 which is a generalization of this "use case" of "not leaking memory at Python exit". |
There were many similar bug reports: |
I wrote a summary when this issue was closed: https://mail.python.org/archives/list/python-dev@python.org/thread/E4C6TDNVDPDNNP73HTGHN5W42LGAE22F/ |
It may, but since this reason applies to all types and it wasn't mentioned in the PEP, using it feels disingenuous. |
Fair enough. |
This issue (which started as focused for _multiprocessing module) and PEP 687 are only about extension modules so all types only means types in the c extension modules excluding the builtin types and types used internally by the interpreter. |
This is a very important issue that should be solved for users who uses embedding interpreter API. |
Right, builtin types get finalized in
Really? As far as I know, there is no ongoing memory leak if youinitialize the module many times, even in a loop with Py_Initialize/Py_Finalize. The extra memory gets allocated only once per process. (Is that right?) |
The problem is way more complicated than that. It's also about correctness. Between Python 3.8 and Python 3.11, I modified Python finalization to clear more and more Python objects at exit. For example, Python now clears the empty tuple, the empty string singleton, etc. Problem: if an object contains a reference to one of these singletons and this object survive between a Py_Finalize + Py_Initialize cycle, you get... two singletons: old from from the first Py_Initialize, a new one from the second Py_Initialize. But a singleton is supposed to be unique by design: "obj1 is singleton" is not longer true if there are two singletons! This bug happened in practice: see https://bugs.python.org/issue45691 This complex problem motivated me to add _PyStaticType_Dealloc(). See: https://bugs.python.org/issue46417#msg410808 |
In older Python versions, it wasn't an issue since singletons survived between Py_Finalize and Py_Initialize(). But as I wrote, it changed in latest Python versions. See Python/pylifecycle.c:
Well, the code changed multiple times. I made changes, then some got reverted. And for example, Eric Snow recently added Things changed multiple times, I failed to keep track of these issues. But well, there are complex issues ;-) |
Thanks @vstinner for the explanation. Since Python 3.11 more and more built-in singletons and cached objects are made static to avoid allocation during interpreter startup and that approach allows both subinterpreter support as the objects are static and immortal and provides performance improvement. Victor made some objects per-interpreter and Eric and I made them static. See #31616 As Victor explained, things are not as simple and extension module types should ideally be heap types so that each interpreter has its own copy and does not leak memory at exit. I would like to emphasize that these changes are only for extension modules and those which currently contains static types. All I can say is that static types should be avoided in c extension modules for which PEP 687 was written and approved but then here we are discussing the same thing again, heap types vs static types. |
Again, please re-read the PEP. It has a specific scope, and that scope does not immediately apply here, hence the discussion. |
Either we as a group come to an agreement that conversion to heap types is the solution here or I would like to see a counter proposal in form of PR which fixes this issue without using private APIs & heap types as I commented above #94382 (comment). Otherwise because of lack of consensus, this issue would have to be escalated to the steering council for decision/ advice. cc to PEP 687 discussion participants: @gvanrossum @ncoghlan |
I have no technical insight to offer (I don't know this part of the interpreter) but it looks to me like tempers are rising. I agree that the participants here ought to come to an agreement but threatening to involve the SC sound premature. |
I am not threatening to involve SC, but just saying that if there is no general agreement, we can ask the SC to provide advice/decision. Erlend's comment seems to indicate to me that this issue does not falls under what PEP 687 is trying to solve. To me this this issue seems like a perfect issue which can be solved under PEP 687 as the conversion to heap types seems the only solution. Also note as the PEP indicates much of the work is already done and most of static types of extensions modules are already ported to heap types. |
Don’t waste th SC’s time. |
For heap type conversion, PEP-687 explicitly applies to the following modules, as they contain types that access module state:
Maybe I missed a module. For other types of issues (that is: issues that is not «this module contains static types that access global module state»), PEP-687 may apply (ref. Petr and my comments earlier). In order to find out, a discussion on Discourse (and/or python-dev) is appropriate. Personally, I am fine with this change. Perhaps Petr also is fine with this. But our approvals are not relevant; it does not change the status quo: we must reach a general agreement that PEP-687 applies to these (this issue) types of problems as well. Remember a the core concept (IMO) of PEP-687 is communicating changes to the whole team. As mentioned earlier in this thread, an appropriate path forward would be to start a topic on Discourse and ask if it is ok that PEP-687 implicitly can apply to issues like this, as well as the issues explicitly mentioned in the PEP. If there is no general consensus we can ping the SC. Either by asking them for advice, or by submitting another PEP for such issues. |
Thanks Erlend for the list but it would have been better if the PEP had explicitly mentioned this list.
This is what I was trying to say if there is no general agreement, we can ask the SC for decision/advice. |
Perhaps there is a misunderstanding about what "general agreement" means. If this group (those involved on this issue) reach an agreement, that is not a "general agreement". A general agreement can only be reached amongst all core devs. Thus, the next level is creating a topic on Discourse, as has been mentioned multiple times already. |
@erlend-aasland: I'll create a discourse topic to discuss to this soon. Also, I apologies incase any of my wording sounded a bit rude but it was not my intension to hurt anyone here. |
Hi,
For complex issues, we have the PEP process. It's not just for getting agreement, but also for untangling/explaining the complexity. And while this change is small enough that a full PEP might not be needed, IMO it does need a clear explanation of what's wrong and why heap types are the best solution. And perhaps we'll not get there with just a discussion thread. (FWIW, I also fail to keep track of these issues, as you can see from the fact that this was not covered in the PEP.) |
Discourse topic created here: https://discuss.python.org/t/extension-modules-with-static-types-and-python-embedding-and-subinterpreters/16971 |
In the topic you still say this is a problem for embedders and multiple interpreters aka subinterpreters which in embedding case will leak memory whenever Python is initialized and finalized one or more times in the same process, while I still think the extra memory gets allocated only once per process. Am I wrong? Could you show a case where there is an unlimited memory leak? |
Note that memory leak is much less in Python 3.12 and 3.11 as the runtime finalization even clears static types. from test import support
import sys
for i in range(1, 10):
support.run_in_subinterp("import multiprocessing")
print("refs:", sys.gettotalrefcount(), "blocks: ", sys.getallocatedblocks()) Even if we consider the case for multiple finalization is less important still python leaks memory at exit even if it is initialized only once. Let's continue the discussion on discourse. |
Output of the script:
For me, it's a real memory leak. For the specific case of PR #94336, I don't think that we have to involve the SC or write a PEP. I reviewed the PR. IMO if you make requested changes, the PR can be merged. |
Requested changes have been made and also news entry added. |
Kumar, could you please update the PR title and NEWS entry to more accurately describe the change? Apart from that, I'm fine with this. Thanks for raising the discussion! 🙂 (Sorry, this comment was intended for the PR) |
"port multiprocessing static types to heap types" seems fine but I am open to suggestions. Feel free to edit it as you wish :)
Thanks! :) |
Fixed in |
Nice! One less. |
By converting
_multiprocessing
to heap types, the module can be used in subinterpreters and does not leak memory when Python finalizes or when it is initialized multiple times. The converted heap types are not performance critical.Current Memory leak at exit:
[170 refs, 71 blocks]
See PR #94336
The text was updated successfully, but these errors were encountered: