Fix memory leak in Concurrent/ThreadSafeLocalContextProvider, using a Thread for JRuby 9 #8483
base: master
Uses the Cleaner API to run LocalContext.remove() when a Thread has terminated, but also eagerly calls LocalContext.remove() for all contexts in terminate().
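To make the mechanism concrete, here is a minimal sketch of the Cleaner-based approach, assuming Java 9+ `java.lang.ref.Cleaner`. `CleanerProviderSketch`, `RemoveAction`, `getContext()` and the `removed` flag are hypothetical names, and `LocalContext` here is a stand-in class rather than JRuby's real one; the PR's actual code may be structured differently.

```java
import java.lang.ref.Cleaner;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; not JRuby's actual LocalContextProvider code.
class CleanerProviderSketch {
    // Stand-in for JRuby's LocalContext: only remove() matters here.
    static final class LocalContext {
        volatile boolean removed;
        void remove() { removed = true; } // would release per-thread runtime state
    }

    // Cleanup action. It must not reference the AtomicReference it is
    // registered for, or that object could never become phantom reachable.
    static final class RemoveAction implements Runnable {
        private final LocalContext context;
        private final Set<Cleaner.Cleanable> registry;
        volatile Cleaner.Cleanable self;

        RemoveAction(LocalContext context, Set<Cleaner.Cleanable> registry) {
            this.context = context;
            this.registry = registry;
        }

        @Override public void run() {
            context.remove();
            Cleaner.Cleanable c = self;
            if (c != null) registry.remove(c); // forget it so it can be GCed too
        }
    }

    private final Cleaner cleaner = Cleaner.create(); // starts one daemon thread
    private final Set<Cleaner.Cleanable> cleanables = ConcurrentHashMap.newKeySet();
    private final ThreadLocal<AtomicReference<LocalContext>> local =
            ThreadLocal.withInitial(this::newHolder);

    private AtomicReference<LocalContext> newHolder() {
        LocalContext context = new LocalContext();
        AtomicReference<LocalContext> holder = new AtomicReference<>(context);
        RemoveAction action = new RemoveAction(context, cleanables);
        // Once the holder is only phantom reachable (its thread died and the
        // ThreadLocal entry is gone), the Cleaner thread runs the action.
        action.self = cleaner.register(holder, action);
        cleanables.add(action.self);
        return holder;
    }

    LocalContext getContext() { return local.get().get(); }

    // Eager path: clean everything now. Cleanable.clean() runs each action
    // at most once, so a later GC-triggered cleanup does nothing.
    void terminate() {
        for (Cleaner.Cleanable c : cleanables) c.clean();
        cleanables.clear();
    }
}
```

The one subtle rule is that the cleanup action may reference the `LocalContext` but never the `AtomicReference` it is registered for; otherwise the holder would stay reachable and the cleanup would never fire.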
So... oops, I'm planning to reimplement it directly on top of the …
@matthias-fratz-bsz Do the PR against the … We'll have to evaluate options for 9.4, since that must continue to support Java 8 for at least a year.
... using the Cleaner API. Bonus points if you want to do a separate PR for 9.4 that just emulates Cleaner with threads and references!
Cleaner is Java 9+, so it cannot be used on JRuby 9, which is meant to run on Java 8 and up. The disadvantage compared to the Cleaner API is that this will leak a Thread if `terminate()` is never called. At least it's a daemon thread, so it doesn't keep the JVM from exiting.
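On Java 8, the same idea can be emulated with a `ReferenceQueue` drained by a daemon thread. The following is a hedged sketch, not the PR's actual code: `MiniCleaner`, `register()` and `shutdown()` are hypothetical names. It mirrors the trade-off described above: `shutdown()` (standing in for the `terminate()` path) stops the thread, and if it is never called, the daemon thread stays alive.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical Java 8-compatible stand-in for java.lang.ref.Cleaner.
class MiniCleaner {
    // A phantom reference that carries its own cleanup action.
    static final class Cleanable extends PhantomReference<Object> {
        private final Set<Cleanable> pending;
        private final Runnable action;

        Cleanable(Object referent, ReferenceQueue<Object> queue,
                  Set<Cleanable> pending, Runnable action) {
            super(referent, queue);
            this.pending = pending;
            this.action = action;
        }

        // Runs the action at most once, whether called eagerly or by the thread.
        void clean() {
            if (pending.remove(this)) {
                clear();
                action.run();
            }
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Keeps each Cleanable strongly reachable until it has been cleaned;
    // an unreachable PhantomReference would never be enqueued.
    private final Set<Cleanable> pending = ConcurrentHashMap.newKeySet();
    private final Thread thread;

    MiniCleaner(String threadName) {
        thread = new Thread(this::drain, threadName);
        thread.setDaemon(true); // does not keep the JVM from exiting
        thread.start();
    }

    Cleanable register(Object referent, Runnable action) {
        Cleanable c = new Cleanable(referent, queue, pending, action);
        pending.add(c);
        return c;
    }

    private void drain() {
        try {
            while (true) {
                // Blocks until the GC enqueues a reference whose referent died.
                ((Cleanable) queue.remove()).clean();
            }
        } catch (InterruptedException e) {
            // shutdown() interrupted us: let the thread end.
        }
    }

    // Eager path: clean everything, then stop the thread.
    // If this is never called, the thread lives on.
    void shutdown() {
        for (Cleanable c : pending) c.clean();
        thread.interrupt();
    }
}
```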
@headius I ported the Cleaner API version to 10-dev as PR #8561 (if "git cherry-pick" can be considered porting), and added a simple implementation here that just uses a cleanup thread per … For simplicity, that replacement code isn't as sophisticated as …
@matthias-fratz-bsz Thanks for the extra work! I should have been clearer... JRuby 10 will depend on Java 21, so it can use the normal JDK … I'll have a chat with @enebo and decide how we want to approach this. It might be simplest to merge what you have here for 9.4.x, merge that forward to 10, and then do another PR at some point to move 10's finalizer logic (ALL of it) to the JDK …
@headius The way I understood you was to create a … The 9.4.x solution using a separate thread (i.e., commit 71260b8 here) is a bit of an afterthought. That obviously isn't needed for JRuby 10 (which is why it isn't in the other PR), but I made it anyway when I realized that it would be fairly easy. I'd feel better if that code didn't stay forever, though, because I'm not entirely sure that its cleanup thread is properly terminated in all situations (especially in, e.g., Tomcat). It's probably easiest to merge this PR into 9.4.x, merge it forward into 10-dev, and then …
OK, this is going to need more review and some discussion with @enebo and @kares because it introduces another "cleaner" thread. To clarify for anyone catching up: this PR is a 9.4 version of #8561, but with a small hand-made …
@matthias-fratz-bsz How critical is this fix for you? We are hoping to get another quick release out (9.4.11.0) with some critical fixes, but I still have some concerns about the lifecycle of this new cleaner thread. I'd like to postpone it until 9.4.12.0. I will look it over again today and see if I can assuage my concerns.
@headius Go ahead; this is not a time-critical issue for us. The memory leak in Cantaloupe is slow enough that daily restarts, while ugly, are a serviceable workaround. Also, cantaloupe-project/cantaloupe#715 (essentially: turning off Variable Sharing) should reduce the amount of memory leaked to the point where it is essentially irrelevant.
After review, I feel pretty good about this implementation. Here is a summary for myself and anyone else who might review it:
- LocalContexts are set up on a per-thread basis, mirroring ThreadContext in the deeper JRuby runtime.
- The AtomicReference holding the LocalContext is stored in a ThreadLocal as before, but is now also referenced by a PhantomReference.
- When the LocalContextProvider (Concurrent or ThreadSafe) is terminated, the cleaner cleans out all LocalContext references. This is the normal case.
- If terminate() is not called, but the thread terminates and is successfully GCed, then its internal ThreadLocal table should go away, dereferencing the AtomicReference. Once GC clears that, the PhantomReference will be enqueued, and a "Cleaner" daemon thread will eventually clear out the remaining references to the LocalContext.
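The GC path in the last bullet relies on a terminated thread dropping its `ThreadLocal` values. The hypothetical demo below (not JRuby code) checks that chain in isolation with a plain `PhantomReference` and `ReferenceQueue`. It assumes `System.gc()` actually triggers a collection, which the JVM does not guarantee (for example, `-XX:+DisableExplicitGC` turns it into a no-op), so it polls with a bounded timeout instead of asserting immediately.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo (not JRuby code) of the GC-driven cleanup path.
class ThreadLocalReachabilityDemo {
    static final ThreadLocal<AtomicReference<String>> LOCAL = new ThreadLocal<>();

    // Returns true if the holder's PhantomReference was enqueued after its thread died.
    static boolean enqueuedAfterThreadDeath() {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        // The PhantomReference itself must stay strongly reachable to be enqueued.
        PhantomReference<?>[] phantom = new PhantomReference<?>[1];

        Thread worker = new Thread(() -> {
            AtomicReference<String> holder = new AtomicReference<>("local context");
            LOCAL.set(holder); // after run() returns, only the thread's ThreadLocal map holds it
            phantom[0] = new PhantomReference<>(holder, queue);
        });
        try {
            worker.start();
            worker.join(); // a terminated thread drops its ThreadLocal map

            // GC timing is not guaranteed, so nudge it a bounded number of times.
            for (int i = 0; i < 50; i++) {
                System.gc();
                Reference<?> ref = queue.remove(100);
                if (ref != null) return ref == phantom[0];
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }
}
```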
As for the extra thread: we already spin up threads (in executors) for the JIT, asynchronous IO select operations, and fibers. These executors get terminated when the JRuby runtime is terminated, which happens as part of terminating a ScriptingContainer. ScriptingContainer termination also calls terminate() on the LocalContextProvider.
Possible concerns:
- ScriptingContainers that do not get terminated will not clean up the cleaner thread. Of course they will also not clean up the runtime, which will not clean up the executors, leaving potentially dozens of orphaned threads.
- There are few tests for this behavior to confirm that things are being cleaned up.
@matthias-fratz-bsz Perhaps you could write up a quick test of this embedding that, for example, spins up and terminates many threads and verifies they get cleaned up?
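One possible shape for such a test, sketched with `java.lang.ref.Cleaner` directly rather than through JRuby's embedding API: start many short-lived threads, let each register a per-thread holder, and wait until every cleanup action has run. `ManyThreadsCleanupCheck` and `allCleanedUp()` are hypothetical names, and like any GC-dependent test it can only poll with a timeout rather than assert deterministically.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical test shape (not the PR's actual test): many short-lived
// threads each register a per-thread holder with a Cleaner; once the
// threads are gone, every cleanup action should eventually run.
class ManyThreadsCleanupCheck {
    static final ThreadLocal<AtomicReference<Object>> LOCAL = new ThreadLocal<>();

    static boolean allCleanedUp(int threadCount) {
        Cleaner cleaner = Cleaner.create();
        CountDownLatch cleaned = new CountDownLatch(threadCount);
        Thread[] threads = new Thread[threadCount];

        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                AtomicReference<Object> holder = new AtomicReference<>(new Object());
                LOCAL.set(holder);
                // The action captures only the latch, never the holder itself.
                cleaner.register(holder, cleaned::countDown);
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) t.join();
            // GC timing is not guaranteed, so poll with a bounded budget (~5 s).
            for (int attempt = 0; attempt < 50 && cleaned.getCount() > 0; attempt++) {
                System.gc();
                cleaned.await(100, TimeUnit.MILLISECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return cleaned.getCount() == 0;
    }
}
```

A test against the real provider would replace the bare `Cleaner` registration with the provider's own per-thread setup and count calls to `LocalContext.remove()` instead of a latch.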
I would be OK merging this, since it's "just another thread" alongside the executors, it's specific to the embedding API, and users of that API must already be calling terminate() if they expect resources to get cleaned up.
@matthias-fratz-bsz OK, thanks for the feedback. I've submitted my review (with approval), so we plan to merge this for 9.4.12.0 (unless @enebo thinks it is critical enough to go into 9.4.11.0).
I also analyzed the thread's lifecycle a bit more closely: …
@headius I'll try to make such a test case and add it here. My current idea is to hold a …
@matthias-fratz-bsz That sounds like a good plan.
@headius So... third time's the charm, I guess. This test case now checks that: …
The test for …
@matthias-fratz-bsz Thank you for the test! It looks about as good as we can get, considering we're trying to test GC effects. We may need to do another "quick fix" 9.4.12.0 release that doesn't include this (trying to minimize changes again), but I think it's ready to merge into master after that. @enebo This could use some review, but I believe it is ready.
I'm not sure if this got rebased weirdly, but I've restarted the CI run to see if the failures go away. If they don't, we need to figure out what's wrong before merging.
This gets rid of the memory leak demonstrated by the test case in #8422.