-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
memoryview can set an exception in tp_clear #77894
Comments
The tp_clear handler of memoryview can set an exception when fail to release the buffer. An exception in tp_clear is not expected and caused a crash in the garbage collector. In the master branch it will cause just writing a traceback to stderr (see bpo-33622), but in any case it would be better to handle the failure locally in memoryview. I don't know what is the best solution: silencing an error, writing a traceback with more detailed information, or resurrecting the buffer object. |
Could you please show how tp_clear() can be called when self->exports > 0? It should not happen. bpo-33622 is a big issue with many commits. Would it be possible to extract the relevant part? |
I don't know how to reproduce a failure in tp_clear(). I just can't prove that it never fails. Maybe it is needed a bug in the implementation of the buffer protocol in third-party extension. If it should not happen then we can just add
or if (PyErr_Occurred()) {
PyErr_WriteUnraisable(NULL);
} It is better to crash in memoryview.c than in the garbage collector if this crash is caused by incorrect buffer protocol implementation. |
This looks the same as bpo-25525. I think it cannot happen, and no one has ever reported an actual issue for 6 years now. You *really* need to show a reproducer if you assert that something can crash. |
See the delete_garbage() function line 770 in Modules/gcmodule.c for changes in the master branch relevant to this issue. See Py_FatalError() in the collect() function at line 974 for a crash. |
Yes, but who calls tp_clear() if the memoryview is not being deallocated? |
The GC calls tp_clear() if the memoryview is a part of the reference loop. a = [memoryview(...)]
a.append(a)
del a The GC will call tp_clear() of the list or the memoryview. What will be called first is not specified. |
The point is that *no garbage collection is triggered* if self->exports > 0. It would be a major bug if it were and I suspect it would be reported within a week. Fortunately, no such bug has been reported in 6 years. |
Serhiy is right about the theoretical concern here. However, it's probably quite difficult to find a concrete situation where this occurs, because we're talking about mbuf_clear and the managerbuffer object can't really get involved in a reference cycle by itself (not in normal use anyway): the memoryview object does, but it's a different thing. By the way: PyBuffer_Release() returns void, so IMO it's a bug if it can return with an exception set. We should fix that rather than focus on mbuf_clear(). |
Well, the example would need exports: >>> a = [bytes()]
>>> a.append(memoryview(a[0]))
>>> a.append(memoryview(a[1]))
>>> a.append(a)
>>> a
[b'', <memory at 0x1ad21f8>, <memory at 0x1b52348>, [...]] The first memoryview has one export, so its refcount > 0. Do I fundamentally misunderstand tp_clear() and tp_clear() can be called on objects with refcount > 0? |
If the bug cannot occur, just add "assert(!PyErr_Occurred());" no? |
Well yes, I still want to understand tp_clear(). :) The docs are a bit vague. |
Yes, tp_clear can be called with refcount > 0. It's exactly why it's |
Okay that makes sense. :) I looked a bit at the gc code. A consumer object always has one So memory_clear() isn't called in that case, but mbuf_clear() is, which Indeed let's perhaps just add "if (self->exports > 0) return 0" to memory_clear() if those assumptions are too complex. |
Several posts in the thread mention this is a purely theoretical concern, but I keep running into a crash bug that could be caused by this. The error I keep getting:
When this happens, the interpreter hard crashes, but I can just rerun my integration test or unit test without changing anything and then it will work. So it's some kind of uncommon intermittent error which is crashing the interpreter. Unfortunately the error is intermittent and seems to happen very rarely. When it does happen, I can tell you that I am usually using the debugger to run code. I don't think it's happened to me when I'm not debugging. Also, the frequency of the error is greatly increased when using libraries which are actually wrappers to C or C++ code such as PyQT5. Most recently it happened to me during a subprocess.run() call to a binary executable on windows. I'm not sure if the underlying implementation of subprocess.run() has any non-python code that might be a source of errors as well. If I were to guess what the cause is, I imagine that there is an intermittent reference counting error in these foreign libraries the GC has no awareness of and this is somehow compounded by debugging because of additional variable references introduced by the debugger or something like that. If I find a reliable method to reproduce this error I will document it and try to produce a minimal example, but it seems to occur very randomly even with the preconditions of running through the debugger and interfacing with C/C++ libraries. |
Not sure if it helps but I'm getting 100% repro when starting the attached script using PyCharm debug. It never happens with debugger disconnected.
|
Have you solved this problem? I had the same problem |
…moryview Now a memoryview object can only be cleared if there are no buffers that refer it.
A reliable way to reproduce this issue using So now, when we have reproducers and can write tests, I created #123898 which implements the idea proposed above -- only release the memoryview object in |
…moryview Now a memoryview object can only be cleared if there are no buffers that refer it.
…ew (GH-123898) Now a memoryview object can only be cleared if there are no buffers that refer it.
…moryview (pythonGH-123898) Now a memoryview object can only be cleared if there are no buffers that refer it. (cherry picked from commit a1dbf2e) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
…ng a memoryview (pythonGH-123898) Now a memoryview object can only be cleared if there are no buffers that refer it. (cherry picked from commit a1dbf2e) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@serhiy-storchaka If you are still looking for reproducible test cases, check out https://youtrack.jetbrains.com/issue/PY-54447. PyCharm users running under the debugger with code using |
@serhiy-storchaka I can confirm that the change seems to have fixed the issue. I ran the test code in the linked isue in Python 3.10 and reproduced the bug, then again in 3.12.7 with your fix and was not able to reproduce it. Thanks for chasing this down! |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
Linked PRs
The text was updated successfully, but these errors were encountered: