Description
Environment Information
- JRuby version range: at least 9.4.3.0 through commit 9d63c22
- Operating system: Linux 6.1.0-26-amd64 SMP PREEMPT_DYNAMIC Debian 6.1.112-1 (2024-09-30) x86_64 GNU/Linux
Test Case
InstanceVariableMemoryLeak.java
This code creates many Ruby objects, each of which contains several instance variables. No references to those Ruby objects are held anywhere in the example code. The code triggers a GC run to try to get rid of them, then writes a heap dump (because I like heap dumps).
Expected Behavior
Because no reference to the Ruby objects is held anywhere in user code, they should be garbage-collected during that GC run, and the heap dump should be fairly boring as a result. Note that when the Sharing Variables feature is explicitly disabled, that's exactly what happens.
Actual Behavior
(with Sharing Variables enabled, which is the default)
The JRuby objects are never garbage-collected, and neither are their instance variables. The heap dump shows that the BiVariableMap contains over 100k BiVariable objects.
(According to the docs, it should be possible to disable Sharing Variables with System.setProperty("org.jruby.embed.sharing.variables", "false"); or -Dorg.jruby.embed.sharing.variables=false, but neither seems to work with the test case. container.setAttribute(AttributeName.SHARING_VARIABLES, false); does work. That problem is somewhat unrelated, and maybe it's just a documentation bug and system properties aren't meant to become container attributes anyway, but I figured I should mention it.)
My analysis so far
All those Ruby instance variables are stored in a BiVariableMap object for access from Java (observable with a breakpoint or print() in BiVariableMap.update()). There is no mechanism for removing them once the Ruby object has been garbage-collected; at least I cannot find any in the code.
Each BiVariable object has a receiver member that references the Ruby object. Thus the Ruby object itself never becomes unreachable, and is therefore never garbage-collected either.
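The retention chain can be reproduced without JRuby at all. In this minimal sketch, RetentionModel, VarMapModel and Entry are hypothetical stand-ins for BiVariableMap and BiVariable, not JRuby classes; the model assumes only what is described above, namely that each entry holds a strong reference to its receiver and that nothing ever removes entries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the retention chain (map -> entry -> receiver); not JRuby code.
public class RetentionModel {

    // Stand-in for BiVariable: keeps a strong reference to its receiver (value omitted).
    public static final class Entry {
        public final Object receiver;
        public final String name;

        Entry(Object receiver, String name) {
            this.receiver = receiver;
            this.name = name;
        }
    }

    // Stand-in for BiVariableMap: entries are added but never removed.
    public static final class VarMapModel {
        public final List<Entry> entries = new ArrayList<>();

        void retrieve(Object receiver, String... ivarNames) {
            for (String name : ivarNames) {
                entries.add(new Entry(receiver, name));
            }
        }
    }

    public static VarMapModel simulate(int objects) {
        VarMapModel vars = new VarMapModel();
        for (int i = 0; i < objects; i++) {
            Object rubyObject = new Object(); // no user-code reference outlives this iteration
            vars.retrieve(rubyObject, "@a", "@b", "@c");
        }
        // Every rubyObject is still reachable through vars.entries.get(k).receiver,
        // so no GC run can reclaim any of them while vars itself is reachable.
        return vars;
    }

    public static void main(String[] args) {
        System.out.println(simulate(100_000).entries.size()); // 300000
    }
}
```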
The variable entries in BiVariableMap are created from EmbedEvalUnitImpl.run() if Sharing Variables is enabled:

    final BiVariableMap vars = container.getVarMap();
    final boolean sharing_variables = isSharingVariables(container);
    ...
    if (sharing_variables) {
        vars.retrieve(ret);
    }

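Why disabling Sharing Variables avoids the leak can be reduced to a toy model of that gate. This is a hypothetical sketch: RunGateModel and its run(Object, int) method are illustrative names, not JRuby API, and each retained receiver stands in for one BiVariable entry created by vars.retrieve(ret).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the sharing gate in EmbedEvalUnitImpl.run(); not JRuby code.
public class RunGateModel {
    public final List<Object> retainedReceivers = new ArrayList<>();
    private final boolean sharingVariables;

    public RunGateModel(boolean sharingVariables) {
        this.sharingVariables = sharingVariables;
    }

    // Stand-in for run(): 'ret' is the evaluation result, 'ivarCount' its number of instance variables.
    public Object run(Object ret, int ivarCount) {
        if (sharingVariables) {
            // Stand-in for vars.retrieve(ret): one retained entry per instance variable.
            for (int i = 0; i < ivarCount; i++) {
                retainedReceivers.add(ret);
            }
        }
        return ret;
    }

    public static void main(String[] args) {
        RunGateModel on = new RunGateModel(true);
        RunGateModel off = new RunGateModel(false);
        for (int i = 0; i < 1000; i++) {
            on.run(new Object(), 3);
            off.run(new Object(), 3);
        }
        System.out.println(on.retainedReceivers.size() + " vs " + off.retainedReceivers.size()); // 3000 vs 0
    }
}
```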
They are actually created in InstanceVariable.updateInstanceVar(), called via InstanceVariable.retrieve(), VariableInterceptor.retrieve() and BiVariableMap.retrieve() from EmbedEvalUnitImpl.run() (and others). InstanceVariable.updateInstanceVar() first tries to update an existing BiVariable in BiVariableMap, but creates a new InstanceVariable object when that variable doesn't exist yet.
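The update-or-create step can be sketched as follows, assuming the lookup matches an existing entry on both receiver and name. UpdateOrCreateModel and Var are hypothetical names; this is not the real InstanceVariable code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the update-or-create logic described for
// InstanceVariable.updateInstanceVar(); not the actual JRuby implementation.
public class UpdateOrCreateModel {

    public static final class Var {
        public final Object receiver;
        public final String name;
        public Object value;

        Var(Object receiver, String name, Object value) {
            this.receiver = receiver;
            this.name = name;
            this.value = value;
        }
    }

    public final List<Var> vars = new ArrayList<>();

    public void updateInstanceVar(Object receiver, String name, Object value) {
        for (Var v : vars) {
            // Assumed match: same receiver (by identity) and same variable name.
            if (v.receiver == receiver && v.name.equals(name)) {
                v.value = value; // existing entry: update in place
                return;
            }
        }
        // No match: a new entry is created, and it keeps 'receiver' reachable.
        vars.add(new Var(receiver, name, value));
    }
}
```

Under that assumption, evaluating repeatedly against the same object reuses its entries, while every freshly created Ruby object adds new ones, which is consistent with the growth seen in the heap dump.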
I strongly suspect that the receiver reference in BiVariable could be removed. Looking at BiVariableMap, the only ways to get hold of a BiVariable with receiver != getTopSelf() are:
- calling getVariable(RubyObject, String), which requires passing a reference to that Ruby object in the first place
- calling getVariables() (unfortunately)
All the Map interface methods effectively ignore the receiver object, and they do so inconsistently. For example, if there is an instance variable called foo, containsKey("foo") == true but get("foo") == null, because get checks the receiver object whereas containsKey doesn't.
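The mismatch is easy to reproduce in a toy map. This hypothetical LookupMismatchModel is not BiVariableMap; it assumes only what is described here, namely that containsKey compares names while get additionally requires the receiver to be the top-level self (inferred from the getTopSelf() remark above).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the containsKey/get mismatch; not BiVariableMap itself.
public class LookupMismatchModel {

    private static final class Var {
        final Object receiver;
        final String name;
        final Object value;

        Var(Object receiver, String name, Object value) {
            this.receiver = receiver;
            this.name = name;
            this.value = value;
        }
    }

    // Assumption: plain get(key) resolves variables against the top-level self.
    private final Object topSelf = new Object();
    private final List<Var> vars = new ArrayList<>();

    public void add(Object receiver, String name, Object value) {
        vars.add(new Var(receiver, name, value));
    }

    // Compares the name only; the receiver is ignored.
    public boolean containsKey(Object key) {
        for (Var v : vars) {
            if (v.name.equals(key)) {
                return true;
            }
        }
        return false;
    }

    // Compares the name and also requires receiver == topSelf.
    public Object get(Object key) {
        for (Var v : vars) {
            if (v.receiver == topSelf && v.name.equals(key)) {
                return v.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        LookupMismatchModel map = new LookupMismatchModel();
        map.add(new Object(), "foo", 42); // instance variable of some non-top-level object
        System.out.println(map.containsKey("foo") + " " + map.get("foo")); // true null
    }
}
```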
So it should be possible to store the values of instance variables in the RubyObject instead, where they would be garbage-collected along with that Ruby object. Alternatively, the InstanceVariable objects in BiVariableMap could be created lazily, only when they are actually set, in which case the caller can be expected to remove them after use.
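The key property of the first option (the map no longer keeps the Ruby object alive) can be approximated with a weakly keyed side table. This swaps in a different technique, java.util.WeakHashMap, purely for illustration; WeakInstanceVarStore and its methods are hypothetical, and this is not a proposed patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical sketch: per-receiver instance variables held in a weakly keyed map,
// so an entry may be cleared once its receiver is otherwise unreachable. Not JRuby API.
public class WeakInstanceVarStore {
    // Caveats: WeakHashMap compares keys with equals()/hashCode(), not identity, and
    // a value that references its own receiver (e.g. @me = self) keeps the key alive.
    private final Map<Object, Map<String, Object>> byReceiver = new WeakHashMap<>();

    public synchronized void put(Object receiver, String name, Object value) {
        byReceiver.computeIfAbsent(receiver, r -> new HashMap<>()).put(name, value);
    }

    public synchronized Object get(Object receiver, String name) {
        Map<String, Object> ivars = byReceiver.get(receiver);
        return ivars == null ? null : ivars.get(name);
    }

    public synchronized int liveReceivers() {
        return byReceiver.size(); // may shrink at any time as weak keys are cleared
    }
}
```

When entries actually disappear depends on when the JVM clears weak references, so the only guarantee is that this store does not prevent collection; a single System.gc() call is not guaranteed to empty it.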
The big issue with this approach, which I cannot figure out, is how to implement the getVariables() method. It is public, so there is no way of knowing whether any code out there relies on it to access instance variables. There are more specific accessors (get(Object, Object), getVariable(RubyObject, String), put(Object, String, Object), etc.), so there is no need to use getVariables() for that... but existing code might use it anyway :(