Fix memory leak in Concurrent/ThreadSafeLocalContextProvider, using a Thread for JRuby 9 #8483

Open
wants to merge 3 commits into
base: master
from

matthias-fratz-bsz

Uses the Cleaner API to run LocalContext.remove() once a Thread has terminated, and also eagerly calls LocalContext.remove() on every remaining LocalContext in terminate().

This gets rid of the memory leak demonstrated by the test case in #8422 .
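
For readers unfamiliar with the approach, here is a minimal sketch of what Cleaner-based cleanup of a per-thread value can look like on Java 9+. It is not the code in this PR: `SketchContext` and `CleanerBackedProvider` are hypothetical stand-ins, and only `java.lang.ref.Cleaner` and `Cleaner.Cleanable` are real JDK API.

```java
import java.lang.ref.Cleaner;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a per-thread context; only remove() matters here.
class SketchContext {
    final AtomicInteger removeCalls = new AtomicInteger();
    void remove() { removeCalls.incrementAndGet(); }
}

// Hypothetical provider (an assumption, not the PR's class). Requires Java 9+.
class CleanerBackedProvider {
    private final Cleaner cleaner = Cleaner.create();
    // Live contexts and their cleanables, so terminate() can run them eagerly.
    private final Map<SketchContext, Cleaner.Cleanable> cleanables = new ConcurrentHashMap<>();
    private final ThreadLocal<AtomicReference<SketchContext>> holder =
            ThreadLocal.withInitial(this::newHolder);

    private AtomicReference<SketchContext> newHolder() {
        SketchContext context = new SketchContext();
        AtomicReference<SketchContext> ref = new AtomicReference<>(context);
        // The action captures the context but never ref itself; capturing ref
        // would keep it reachable and the action would never run.
        Cleaner.Cleanable cleanable = cleaner.register(ref, () -> {
            context.remove();
            cleanables.remove(context);
        });
        cleanables.put(context, cleanable);
        return ref;
    }

    // Returns the calling thread's context, creating it on first use.
    SketchContext get() { return holder.get().get(); }

    // Eager path: run every outstanding cleanup now. Cleanable.clean() runs
    // its action at most once, so a later GC-triggered run is a no-op.
    void terminate() {
        for (Cleaner.Cleanable cleanable : cleanables.values()) {
            cleanable.clean();
        }
    }
}
```

In this sketch the GC-triggered path only fires after a dead thread's thread-local holder has become unreachable, so its timing is up to the collector; `terminate()` is the deterministic path.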

@matthias-fratz-bsz
Author

So... oops, Cleaner is Java 9+. BUILDING.md says JDK 8 or up, so my PR should probably be targeting that.

I'm planning to reimplement it directly on top of the Reference API some time after the Christmas break. It will probably take a while, because the Cleaner API makes the boundary cases much easier to handle, but obviously that is of no help where it isn't available.

@headius
Member

headius commented Dec 17, 2024

@matthias-fratz-bsz Do the PR against the 10-dev branch for now... that's our JRuby 10 branch and it will require Java 21 minimum.

We'll have to evaluate options for 9.4 since that must continue to support 8 for at least a year.

@headius
Member

headius commented Dec 17, 2024

Do the PR against the 10-dev branch for now.

... using the Cleaner API.

Bonus points if you want to do a separate PR for 9.4 that just emulates Cleaner with threads and references!

Cleaner is Java 9+, so it cannot be used on JRuby 9 because that is meant
to run on Java 8 and up.

The disadvantage wrt the Cleaner API is that this will leak a Thread if
`terminate()` is never called. At least it's a Daemon Thread so it doesn't
keep the JVM from exiting.
@matthias-fratz-bsz
Author

@headius I ported the Cleaner API version to 10-dev as PR #8561 (if "git cherry-pick" can be considered porting), and added a simple implementation here that just uses a cleanup thread per LocalContextProvider.

For simplicity, that replacement code isn't as sophisticated as Cleaner: If you forget to call terminate(), the thread sticks around forever. Cleaner does some really neat trickery to terminate its thread once the Cleaner instance itself has been garbage collected... but I don't think this is necessary here. It's a daemon thread that doesn't keep the JVM from exiting, and terminate() does always seem to be called eventually.
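
As a rough illustration of the "cleanup thread per provider" idea on Java 8, the sketch below pairs a `ReferenceQueue` with a daemon thread. It is an assumption-laden stand-in, not the code from this PR: `ThreadBackedCleaner`, its `Entry` class, and the thread name are hypothetical. Like the implementation described above, it leaves the thread running if `terminate()` is never called.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical Java 8 substitute for java.lang.ref.Cleaner (not the PR's code).
class ThreadBackedCleaner {
    // Phantom reference that carries the action to run for its referent.
    private static final class Entry extends PhantomReference<Object> {
        final Runnable action;
        Entry(Object referent, ReferenceQueue<Object> queue, Runnable action) {
            super(referent, queue);
            this.action = action;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Strong references to the entries; an unreachable PhantomReference
    // would itself be collected and never enqueued.
    private final Set<Entry> pending = ConcurrentHashMap.newKeySet();
    private final Thread worker;

    ThreadBackedCleaner() {
        worker = new Thread(this::drain, "sketch-context-cleaner");
        worker.setDaemon(true); // does not keep the JVM from exiting
        worker.start();
    }

    // The action must not reference the referent, or it never becomes unreachable.
    void register(Object referent, Runnable action) {
        pending.add(new Entry(referent, queue, action));
    }

    private void clean(Entry entry) {
        // pending.remove() succeeds for exactly one caller, so the action runs
        // at most once whether the GC or terminate() triggers it.
        if (pending.remove(entry)) {
            entry.clear();
            entry.action.run();
        }
    }

    private void drain() {
        try {
            while (true) {
                Reference<?> ref = queue.remove(); // blocks until the GC enqueues an entry
                clean((Entry) ref);
            }
        } catch (InterruptedException e) {
            // terminate() interrupts the worker; let the thread end
        }
    }

    // Stops the worker and eagerly runs every action that has not run yet.
    void terminate() {
        worker.interrupt();
        for (Entry entry : pending) {
            clean(entry);
        }
    }
}
```

A provider built on this would register each thread's holder object together with an action that calls `LocalContext.remove()`, and call `terminate()` from its own `terminate()`.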

@matthias-fratz-bsz matthias-fratz-bsz changed the title Fix memory leak in Concurrent/ThreadSafeLocalContextProvider Fix memory leak in Concurrent/ThreadSafeLocalContextProvider, using a Thread for JRuby 9 Jan 8, 2025
@headius
Member

headius commented Jan 8, 2025

@matthias-fratz-bsz Thanks for the extra work! I should have been more clear... JRuby 10 will depend on Java 21, so it can use the normal JDK Cleaner. But in order to fix the issue in 9.4, you have written your own version (which is also good).

I'll have a chat with @enebo and decide how we want to approach this. It might be simplest to merge what you have here for 9.4.x, merge that forward to 10, and then do another PR at some point to move 10's finalizer logic (ALL of it) to the JDK Cleaner.

@headius headius added this to the JRuby 9.4.10.0 milestone Jan 8, 2025
@matthias-fratz-bsz
Author

@headius As I understood you, the goal was a Cleaner-based implementation for JRuby 10. There wasn't any extra work; my original implementation works just fine when rebased onto the 10-dev branch. That's in the other PR though, because I didn't find a way to change the base branch of this one. Sorry if that caused confusion.

The 9.4.x solution using a separate thread (i.e. commit 71260b8 here) is a bit of an afterthought. That obviously isn't needed for JRuby 10 (which is why it isn't in the other PR), but I made it anyway when I realized that it would be fairly easy. I'd feel better if that code didn't stay forever though, because I'm not entirely sure that its cleanup thread is properly terminated in all situations (especially in e.g. Tomcat).

It's probably indeed easiest to merge this PR here into 9.4.x, merge it forward into 10-dev, and then git revert 71260b8 on the 10-dev branch. The other PR is then pointless.

@headius headius changed the base branch from master to 10-dev January 15, 2025 18:55
@headius
Member

headius commented Jan 15, 2025

Ok this is going to need more review and some discussion with @enebo @kares because it introduces another "cleaner" thread.

To clarify for anyone catching up... this PR is a 9.4 version of #8561, but with a small hand-made Cleaner since we can't use the Java 9+ version. This may be fine to merge, but introducing another thread demands some extra scrutiny.

@headius
Member

headius commented Jan 28, 2025

@matthias-fratz-bsz How critical is this fix to you? We are hoping to get another quick release out (9.4.11.0) with some critical fixes, but I still have some concerns about the lifecycle of this new cleaning thread. I'd like to postpone until 9.4.12.0.

I will look it over again today and see if I can assuage my concerns.

@matthias-fratz-bsz
Author

@headius Go ahead; this is not a time-critical issue for us. The memory leak in Cantaloupe is slow enough that daily restarts, while ugly, are a serviceable workaround. Also, cantaloupe-project/cantaloupe#715 (essentially: turning off Variable Sharing) should reduce the amount of memory leaked to a point where it is essentially irrelevant.

Member

@headius headius left a comment


After review I feel pretty good about this impl. I summarize it here for myself and others who might review.

  • LocalContexts are set up on a per-thread basis, mirroring ThreadContext in the deeper JRuby runtime.
  • The atomic reference holding the LocalContext is held in a ThreadLocal as before, but also in a PhantomReference.
  • When the LocalContextProvider (Concurrent or ThreadSafe) is terminated, the cleaner cleans out all LocalContext references. This would be the normal situation.
  • If terminate is not called, but the thread terminates and is successfully GCed, then its internal ThreadLocal table should go away, dereferencing the AtomicReference. Once GC clears that, the PhantomReference will be enqueued and a "Cleaner" daemon thread will eventually clear out remaining references to the LocalContext.

As for the extra thread... we do still spin up threads (in executors) for JIT, asynchronous IO select operations, and fibers. These executors get terminated when the JRuby runtime is terminated, which happens along the way when a ScriptingContainer is terminated. ScriptingContainer termination also calls terminate on the LocalContextProvider.

Possible concerns:

  • ScriptingContainers that do not get terminated will not clean up the cleaner thread. Of course they will also not clean up the runtime, which will not clean up the executors, leaving potentially dozens of orphaned threads.
  • There are few tests for this behavior to confirm that things are being cleaned up.

@matthias-fratz-bsz Perhaps you could write up a quick test of this embedding that, for example, spins up and terminates many threads and verifies they get cleaned up?

I would be ok merging this, since it's "just another thread" alongside the executors, it's specific to the embedding API, and users of that API must already be calling terminate if they expect resources to get cleaned up.

@headius
Member

headius commented Jan 28, 2025

@matthias-fratz-bsz Ok thanks for the feedback. I provided my review (with approval) so we will plan to merge this in for 9.4.12.0 (unless @enebo thinks it is critical enough to go in .11).

@matthias-fratz-bsz
Author

I also analyzed the thread's lifecycle a bit more closely:

  • It gets started essentially from the constructor. This isn't exactly best practice. Maybe it would be better to delay its start to, say, the first time a LocalContext is actually requested? Might be worth the slightly more complicated code.
  • It is only terminated from LocalContextProvider.terminate(). So if the LocalContextProvider were to be garbage-collected without calling terminate(), the thread would absolutely stick around forever (well, until JVM termination). However, LocalContextProvider.terminate() is called from ScriptingContainer.finalize() (via ScriptingContainer.terminate()), so there should be no situations where LocalContextProvider gets garbage-collected without having terminate() called first.
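
The "delay the start" option from the first point could look roughly like the sketch below. This is only an illustration under assumptions: `LazilyStartedCleaner` and `onContextRequested()` are hypothetical names, and the worker body merely blocks on a `ReferenceQueue` where real cleanup actions would run.

```java
import java.lang.ref.ReferenceQueue;

// Hypothetical sketch of a cleanup thread that starts on first use
// instead of in the constructor (not the PR's code).
class LazilyStartedCleaner {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    private Thread worker;      // guarded by "this"; null until first use
    private boolean terminated; // guarded by "this"

    // Hypothetical hook: would be called wherever a LocalContext is handed out.
    synchronized void onContextRequested() {
        if (worker == null && !terminated) {
            worker = new Thread(this::drain, "sketch-lazy-cleaner");
            worker.setDaemon(true); // does not keep the JVM from exiting
            worker.start();
        }
    }

    synchronized boolean isRunning() {
        return worker != null && worker.isAlive();
    }

    // Interrupts the worker and waits briefly for it to exit; after this,
    // onContextRequested() no longer starts a new thread.
    synchronized void terminate() {
        terminated = true;
        if (worker == null) {
            return;
        }
        worker.interrupt();
        try {
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
        }
    }

    private void drain() {
        try {
            while (true) {
                queue.remove(); // cleanup actions would run here
            }
        } catch (InterruptedException e) {
            // interrupted by terminate(); let the thread end
        }
    }
}
```

The cost is the extra synchronization on every request; the benefit is that a provider that never hands out a context never owns a thread.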

@matthias-fratz-bsz
Author

@headius I'll try to make such a test case and add it here. Current idea is to hold a PhantomReference to the thread's LocalContext we want to monitor, then keep poking the GC until the reference gets enqueued proving that the LocalContext was cleaned. I haven't tried it but I think it'll work.
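
The "keep poking the GC until the reference gets enqueued" idea can be sketched as a small helper. This is an illustration only, not the test added to the PR: `GcProbe` and its methods are hypothetical names, and `System.gc()` is merely a hint, so a check like this can only report that collection was observed within a timeout.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;

// Hypothetical test helper (not the PR's test code).
class GcProbe {
    // Requests GC repeatedly and returns true once any reference shows up on
    // the queue, or false if nothing was enqueued within roughly timeoutMillis.
    static boolean awaitEnqueued(ReferenceQueue<?> queue, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        try {
            while (deadline - System.nanoTime() > 0) {
                System.gc(); // a hint only; the JVM may ignore it
                if (queue.remove(50) != null) {
                    return true;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
        return false;
    }

    // Creates the referent inside this method so that no local variable in
    // the caller keeps it alive. The caller must keep the returned reference
    // reachable, otherwise it is never enqueued.
    static PhantomReference<Object> trackGarbage(ReferenceQueue<Object> queue) {
        return new PhantomReference<>(new Object(), queue);
    }
}
```

In a real test the tracked object would be the thread's LocalContext rather than a fresh `Object`, and the timeout would need to be generous enough for slow CI machines.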

@headius
Member

headius commented Jan 28, 2025

@matthias-fratz-bsz that sounds like a good plan.

@matthias-fratz-bsz
Author

@headius So... third time's the charm, I guess. This testcase now checks that:

  • When the test thread quits, LocalContext.remove() is called and the LocalContext is garbage-collected.
  • When LocalContextProvider.terminate() is called, so is LocalContext.remove(). Once the test thread then terminates, its LocalContext is also garbage-collected.

The test for LocalContext being garbage-collected uses a PhantomReference as discussed. For checking that remove() was called, I found it easier to just override the method in a subclass. I'm checking both LocalContext.remove() and garbage-collection because LocalContext.remove() is presumably important to clean up resources, and garbage-collection avoids the memory leaks that got me here.

@headius
Member

headius commented Feb 6, 2025

@matthias-fratz-bsz Thank you for the test! Looks like about as good as we can get it considering we're trying to test GC effects.

We may need to do another "quick fix" 9.4.12.0 release that doesn't include this (trying to minimize changes again), but I think it's ready to merge into master after that.

@enebo This could use some review but I believe it is ready.

@headius headius changed the base branch from 10-dev to master February 11, 2025 02:14
@headius
Member

headius commented Feb 11, 2025

Not sure if this got rebased weirdly, but I've restarted the CI run to see if the failures go away. If they don't, we need to figure out what's wrong before merging.


2 participants