concurrent.futures.ProcessPoolExecutor can deadlock when tcmalloc is used #90622
When Python is built and linked with tcmalloc, using ProcessPoolExecutor may deadlock. Here is a reproducible example:

$ cat t.py
from concurrent import futures
import sys

def work(iteration, item):
    sys.stdout.write(f'working: iteration={iteration}, item={item}\n')
    sys.stdout.flush()

for i in range(0, 10000):
    with futures.ProcessPoolExecutor(max_workers=2) as executor:
        executor.submit(work, i, 1)
        executor.submit(work, i, 2)

$ python t.py
working: iteration=0, item=1
working: iteration=0, item=2
working: iteration=1, item=1
working: iteration=1, item=2
...
working: iteration=3631, item=1
working: iteration=3631, item=2
<hang here>

A child process fails to finish and the run hangs. It is more likely to reproduce when the system is busy. With some internal bisecting, commit 1ac6e37 is what "triggered" the deadlock threshold with tcmalloc.
For context, the fundamental problem is that os.fork() is unsafe to use in any process that has threads, and concurrent/futures/process.py violates this. As long as its design involves a thread, it can never be guaranteed not to deadlock: POSIX forbids executing most code after fork() has been called, so returning to the CPython interpreter in the child after os.fork() is only safe when the parent process had no threads at fork time. The change that triggered this issue moved from doing all of the os.fork() calls for the workers up front, before the management thread was spawned, to doing them on demand after that thread was already running. The bugfix is simply a rollback to revert the bpo-39207 change. It was a performance enhancement, but it introduces a deadlock, in code that previously had none, when thread-tuned malloc implementations are used. The only way to safely launch worker processes on demand is to create a dedicated launcher process before any threads exist and keep it idle, its sole job being to spawn new worker processes for us. That sounds complicated, and it would be a feature. Let's go with the bugfix first.
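To make the fork-safety point concrete, here is a minimal sketch (my own illustration, not code from CPython or this issue) of why forking a multithreaded process is hazardous: any lock held by another thread at fork time is copied into the child in the locked state, with no thread left to release it. tcmalloc's internal allocator locks are affected in the same way, which is why the forked child can hang the first time it allocates memory. The example assumes a POSIX system where os.fork() is available.

```python
import os
import threading
import time

lock = threading.Lock()

def hold_lock():
    # Hold the lock long enough for the main thread to fork while it is held.
    with lock:
        time.sleep(2)

threading.Thread(target=hold_lock, daemon=True).start()
time.sleep(0.1)  # give the helper thread time to acquire the lock

pid = os.fork()
if pid == 0:
    # Child process: the lock was copied in the "held" state, but the thread
    # that held it does not exist here, so a blocking acquire() would hang
    # forever.  A timeout shows the problem without actually deadlocking.
    acquired = lock.acquire(timeout=1)
    print(f"child acquired the lock: {acquired}")  # expected: False
    os._exit(0)
else:
    os.waitpid(pid, 0)
```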
https://bugs.python.org/issue44733 for 3.11 attempts to build upon the dynamic spawning ability and will need reverting unless a non-thread+fork implementation is provided.
fork is not the only way to launch worker processes; we also have spawn, and spawn has been the default on macOS since Python 3.8. Simply reverting does not seem good for macOS users, since they would pay the cost of both pre-spawning and the spawn start method.
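For reference, here is a minimal sketch (my own illustration, with a hypothetical work function, not code from this thread) of how a caller can opt into the "spawn" start method explicitly today, so that workers are never forked from the multithreaded parent:

```python
import multiprocessing
from concurrent import futures

def work(iteration, item):
    # With the "spawn" start method this function must be importable from the
    # main module, hence the __main__ guard below.
    return (iteration, item)

if __name__ == "__main__":
    ctx = multiprocessing.get_context("spawn")
    with futures.ProcessPoolExecutor(max_workers=2, mp_context=ctx) as executor:
        print(executor.submit(work, 0, 1).result())
```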
It sounds like we need to introspect the mp_context= passed to ProcessPoolExecutor (and its default when None) and raise an error when max_tasks_per_child is incompatible with it.
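A minimal sketch of what such a check might look like (the parameter names mirror ProcessPoolExecutor's mp_context and max_tasks_per_child; this is illustrative only, not the actual CPython implementation):

```python
import multiprocessing as mp

def check_mp_context(mp_context, max_tasks_per_child):
    """Illustrative compatibility check, not CPython's real code."""
    if mp_context is None:
        # Fall back to the interpreter's default context, as the executor
        # does when no context is supplied.
        mp_context = mp.get_context()
    if (max_tasks_per_child is not None
            and mp_context.get_start_method() == "fork"):
        raise ValueError(
            "max_tasks_per_child is incompatible with the 'fork' start "
            "method; use 'spawn' or 'forkserver' instead"
        )
    return mp_context
```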
And similarly, the dynamic spawning could be kept when the mp_context is spawn (and possibly forkserver).
Indeed, it seems this should only be disabled when the "fork" method is used, especially as the optimization is mostly valuable when spawning a worker is expensive.
…spawn via fork. This avoids potential deadlocks in the child processes due to forking from a multithreaded process.
…1587) Prevent `max_tasks_per_child` use with a "fork" mp_context to avoid deadlocks. Also defaults to "spawn" when no mp_context is supplied for safe convenience.
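Based on the commit message above, usage after that change would look roughly like this (a hedged sketch; the exact default and error text should be checked against the 3.11+ documentation):

```python
import multiprocessing
from concurrent import futures

def square(x):
    return x * x

if __name__ == "__main__":
    # No mp_context supplied: with max_tasks_per_child the executor defaults
    # to the "spawn" start method rather than the platform default.
    with futures.ProcessPoolExecutor(max_tasks_per_child=10) as executor:
        print(executor.submit(square, 3).result())

    # Combining max_tasks_per_child with a "fork" context is rejected
    # (on POSIX platforms, where the "fork" start method exists).
    try:
        futures.ProcessPoolExecutor(
            max_tasks_per_child=10,
            mp_context=multiprocessing.get_context("fork"),
        )
    except ValueError as exc:
        print("rejected:", exc)
```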
Note that there is still one deadlock possibility, which I left as a comment in the code: if the "fork" start method is used and one of the worker processes dies, a new one will be forked after multiprocessing has already started its executor manager thread. The "forkserver" and "spawn" start methods avoid that.