ENH: avoid thread safety issues around uses of PySequence_Fast
#29394
Conversation
LGTM in general (I might do one more quick pass to double-check ref counts, but it all seemed good).
One thing that isn't clear to me: is it OK to do a goto from within the block?
So I would have to double-check the code, but it is not clear to me that you need any of this, with possibly one careful change: merge the success/fail labels into a single `finish:` label, so that you can write:
```c
res = NULL;
BEGIN_CRITICAL_SECTION;
if (failure) {
    Py_CLEAR(res);  /* ensure res is NULL if needed */
    goto finish;
}
finish:;  /* may need the semicolon, not sure when exactly */
/* cleanup */
END_CRITICAL_SECTION;
return res;
```
There is no return within the block, so all is good, I think. I suspect you could do a `goto fail` that does the `Py_CLEAR()`, but I am not sure if there are subtleties, and it is probably just more awkward.
Anyway, LGTM, with some small comments you can ignore. But I think it would be good to explore that pattern, because it would be nice to diverge from CPython as little as possible, and the single `finish:` isn't bad at all.
Not sure why the debug Python build crashed - will look at that closer.
It turns out doing this without the semicolon is allowed in C23.
Otherwise, it looks like your suggestion to merge the error and success paths into a cleanup goto does work. Thanks!
Spoke a little too soon.
One nitpick, otherwise looks good.
FWIW, I am tempted to just split the workspace macro into a separate definition now that we have a reason to do so, but I'm happy to not do it here either; the no-bracket version isn't terrible (even if it seems to me that avoiding it isn't that bad in the end).
Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net>
I think I'll do that in a followup.
See https://docs.python.org/3.14/howto/free-threading-extensions.html#general-api-guidelines for more details, but in short: on the free-threaded build it's not safe to use this API without some kind of locking if the accessed container can be modified while we're iterating over its contents.
To fix this, I added locking around all uses of the `PySequence_Fast` API. I also added a critical section around the object that is coerced to an ndarray in `PyArray_FromAny_int`. Taken together, these changes fix the possibility of thread safety issues from mutating the argument to `np.array()`, `np.broadcast()`, and `np.nditer()`, as well as a few other more minor spots.

This does not do anything to lock arbitrary nested array-likes; I only lock the outermost level of nesting. Doing anything more complicated is tricky without causing deadlocks. We could instead lock only whichever "leaf" list in a nested list of lists we are currently looking at, which avoids deadlocks, but that leaves the outermost list unlocked. IMO the outermost list is the most likely to be mutated in practice.
To aid all that, I added several new macros to `npy_pycompat.h` which wrap the public critical section API in the CPython C API: https://docs.python.org/3/c-api/init.html#python-critical-section-api. The versions in the C API include brackets, and if I wanted to use those, I would need to refactor the places I touched in this PR more substantially. I did use the versions with brackets in a few places where that makes sense.
I also added macros specifically for applying critical sections to sequences, following the private macros for this purpose in CPython. These macros add an extra check that skips applying a critical section for tuples. The private macros are defined in terms of the public C API, so I can just lift them out of CPython. If the private macros are ever made public in the future, we can switch to those versions.
It'd be nice to get tests for `einsum` and `__array_function__`, but it wasn't clear to me how to write a test that would trigger the thread safety issues. The three tests I added all fail on current `main` on the free-threaded build and pass on the GIL-enabled build.