Description
First of all, thank you so much for developing such a powerful and interesting project.
We are a group working on libraries such as LIBSVM, LIBLINEAR, LibMultiLabel, and so on.
Recently, we have been integrating python-graphblas into our codebase to speed up sparse matrix operations.
However, during the integration, we encountered a memory issue with sparse matrix multiplication.
We would like to share our observations and findings, and we hope to get some feedback on potential improvements.
Our sparse matrices and predefined variables are as follows:

    weights shape: (135909, 826017)    # CSR, NNZ: 1,818,756,815
    instances shape: (153025, 135909)  # CSR, NNZ: 14,013,838
    batch_size = 256

A simplified version of our usage is shown below:

    import graphblas as gb

    # `instances` and `weights` are the gb.Matrix objects described above.

    def predict_values(A, B):
        C = gb.Matrix(float, A.shape[0], B.shape[1])
        C << A.mxm(B, op=gb.semiring.min_plus)
        return C.to_dense(fill_value=0)

    def main():
        for idx in range(0, instances.shape[0], batch_size):
            # Materialize the row slice as a Matrix before multiplying.
            batch_x = instances[idx : idx + batch_size, :].new()
            results = predict_values(batch_x, weights)

We initially assumed that once `predict_values` returns and the local variable `C` goes out of scope, its memory would be released, because no reference to `C` remains.
However, we observed that the memory occupied by `C` is not released.
Instead, memory usage grows linearly with each function call. It keeps growing until certain conditions are met (i.e., the threshold of Python's garbage collector is reached), and only then does Python release it.
This issue becomes more severe when `C` consumes a large amount of memory. In that case, the delayed release causes the system to consume all available memory (including swap), which in turn degrades the performance of matrix multiplication.
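The delayed release can be reproduced without GraphBLAS. The minimal sketch below assumes that a Python-level reference cycle is the only cause. `FakeMatrix` and `Accessor` are hypothetical stand-ins, not python-graphblas classes, and the 1 MB `bytearray` plays the role of a large result such as `C`. Automatic collection is disabled during the loop so the result is deterministic. `tracemalloc` then reports how much memory is still held.

```python
import gc
import tracemalloc


class Accessor:
    """Hypothetical stand-in for the `ss` helper: keeps a strong reference to its parent."""

    __slots__ = ("_parent",)

    def __init__(self, parent):
        self._parent = parent


class FakeMatrix:
    """Hypothetical stand-in for gb.Matrix; `payload` simulates a large result buffer."""

    def __init__(self, nbytes):
        self.payload = bytearray(nbytes)
        self.ss = Accessor(self)  # FakeMatrix -> Accessor -> FakeMatrix: a reference cycle


def predict_values(nbytes):
    C = FakeMatrix(nbytes)
    return len(C.payload)  # C goes out of scope here, but the cycle keeps it alive


def run(calls=10, nbytes=1_000_000):
    """Return (bytes still held after the calls, bytes still held after gc.collect())."""
    gc.disable()  # prevent automatic collection so the growth is deterministic
    tracemalloc.start()
    try:
        base, _ = tracemalloc.get_traced_memory()
        for _ in range(calls):
            predict_values(nbytes)
        grown, _ = tracemalloc.get_traced_memory()
        gc.collect()  # an explicit collection still runs while automatic GC is disabled
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
        gc.enable()
    return grown - base, after - base


grown, after = run()
print(f"held after calls: {grown} bytes, after gc.collect(): {after} bytes")
```

On CPython, `grown` should be roughly 10 MB, which corresponds to the ten unreleased payloads. `after` should drop close to zero once `gc.collect()` has broken the cycles.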
Upon investigation, we found that the memory issue is caused by a circular reference in python-graphblas.

    class Matrix(BaseType):
        # SKIP
        def __new__(cls, dtype=FP64, nrows=0, ncols=0, *, name=None):
            # SKIP
            if backend == "suitesparse":
                self.ss = ss(self)

(copied from graphblas/core/matrix.py)

    class ss:
        __slots__ = "_parent", "config"

        def __init__(self, parent):
            self._parent = parent
            self.config = MatrixConfig(parent)

(copied from graphblas/core/ss/matrix.py)
The simplified reference structure is as follows:

    Matrix M
    └── .ss (attribute of M)
        └── (references back to) M

This creates a circular reference between the `Matrix` object and its `.ss` attribute.
CPython frees most objects through reference counting. However, the reference count of an object that is part of a reference cycle never drops to zero, even after every local variable referring to it has gone out of scope (as in our simplified example).
As a result, such objects remain in memory until Python's cyclic garbage collector detects and frees them.
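The difference between reference counting and cycle collection can be observed directly. The sketch below uses hypothetical stand-in classes (`Accessor`, `CyclicMatrix`, `AcyclicMatrix`), not python-graphblas code. It uses `weakref.finalize` to record the moment an object is destroyed, and it keeps the cyclic collector disabled until `gc.collect()` is called explicitly.

```python
import gc
import weakref


class Accessor:
    """Hypothetical helper that keeps a strong reference to its parent (like `ss._parent`)."""

    __slots__ = ("_parent",)

    def __init__(self, parent):
        self._parent = parent


class CyclicMatrix:
    """Hypothetical stand-in whose `ss` attribute points back to it."""

    def __init__(self):
        self.ss = Accessor(self)


class AcyclicMatrix:
    """Hypothetical stand-in with the same attribute but no back-reference."""

    def __init__(self):
        self.ss = None


def release_timing(cls):
    """Return (freed_on_del, freed_after_collect) for a fresh instance of `cls`."""
    freed = []
    gc.disable()  # keep the cyclic collector out of the way until we call it explicitly
    try:
        obj = cls()
        weakref.finalize(obj, freed.append, True)  # runs when `obj` is actually destroyed
        del obj  # drop the only name bound to the object
        freed_on_del = bool(freed)
        gc.collect()
        freed_after_collect = bool(freed)
    finally:
        gc.enable()
    return freed_on_del, freed_after_collect


print(release_timing(AcyclicMatrix))  # (True, True): reference counting frees it at `del`
print(release_timing(CyclicMatrix))   # (False, True): only the cyclic collector frees it
```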
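However, the garbage collector does not run immediately.
Instead, it runs only when the number of (container) object allocations minus deallocations since the last collection exceeds a threshold (see `gc.get_threshold()`).
This threshold counts Python objects rather than bytes. A few `Matrix` objects that each own a large buffer allocated by the underlying C library can therefore exhaust all available memory long before a collection is triggered, which is consistent with our observations.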
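The sketch below illustrates the threshold behavior with a hypothetical self-referencing `Node` class. It registers a callback in `gc.callbacks` and counts how many collections CPython starts on its own while cyclic garbage accumulates. The exact thresholds and triggering rules differ between CPython versions, so the printed numbers will vary.

```python
import gc


class Node:
    """Hypothetical object that references itself, forming a trivial cycle."""

    def __init__(self):
        self.me = self


def count_automatic_collections(n_objects=10_000):
    """Create `n_objects` pieces of cyclic garbage; return the generations of collections started."""
    starts = []

    def on_gc(phase, info):
        if phase == "start":
            starts.append(info["generation"])

    gc.collect()  # start from a clean slate (not counted: the callback is not registered yet)
    gc.callbacks.append(on_gc)
    try:
        for _ in range(n_objects):
            Node()  # unreachable immediately, but its refcount never reaches zero
    finally:
        gc.callbacks.remove(on_gc)
    return starts


print("thresholds:", gc.get_threshold())
print("automatic collections:", len(count_automatic_collections()))
```

Collections are triggered often here because every `Node` is counted. From the collector's point of view, however, a `Matrix` holding gigabytes of C-allocated data weighs no more than one of these tiny objects.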
There are several ways to address this memory issue:
- Call `gc.collect()` frequently to force the garbage collector to clean up circular references. However, this can introduce significant overhead and degrade performance.
- Manually break the reference in the `.ss` attribute (e.g., `C.ss = None` in our example) after the matrix is no longer needed. However, this is not a general or scalable solution, as it requires explicit intervention.
- Replace the strong reference with a `weakref`, which avoids the reference cycle entirely.
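Here is a minimal sketch of the third option. It assumes the accessor only needs to read data from its parent. The back-reference is stored as a `weakref.ref` and dereferenced on each use. `WeakAccessor` and `WeakMatrix` are hypothetical names. The actual python-graphblas `ss` class also builds a `MatrixConfig`, which this sketch does not model.

```python
import gc
import weakref


class WeakAccessor:
    """Hypothetical `ss`-style helper that holds only a weak reference to its parent."""

    __slots__ = ("_parent_ref",)

    def __init__(self, parent):
        self._parent_ref = weakref.ref(parent)

    @property
    def _parent(self):
        parent = self._parent_ref()
        if parent is None:
            raise ReferenceError("the parent matrix has already been freed")
        return parent

    @property
    def shape(self):
        # Example of the accessor reading information from its parent (like nrows/ncols).
        return self._parent.shape


class WeakMatrix:
    """Hypothetical stand-in for gb.Matrix; `ss` does not create a reference cycle."""

    def __init__(self, nrows, ncols):
        self.shape = (nrows, ncols)
        self.ss = WeakAccessor(self)


def freed_without_gc():
    """Return True if a WeakMatrix is released by reference counting alone."""
    freed = []
    gc.disable()  # make sure the cyclic collector plays no part
    try:
        m = WeakMatrix(3, 4)
        assert m.ss.shape == (3, 4)
        weakref.finalize(m, freed.append, True)
        del m  # the accessor's weak reference does not keep `m` alive
        return bool(freed)
    finally:
        gc.enable()


print(freed_without_gc())  # True
```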
Would replacing the strong reference with a `weakref` be a safe and effective solution in this case?
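One consequence worth checking before switching to `weakref` is that the back-reference would no longer keep the matrix alive. The sketch below uses hypothetical classes (`StrongAccessor`, `WeakAccessor`, `ToyMatrix`) and a hypothetical `parent()` helper, not python-graphblas code. It keeps only the accessor of a temporary matrix and then tries to reach the matrix through it. We have not verified whether python-graphblas or its users ever keep a `.ss` object, or an object obtained from it, after dropping the matrix.

```python
import weakref


class StrongAccessor:
    """Hypothetical accessor with a strong back-reference (current design)."""

    __slots__ = ("_parent",)

    def __init__(self, parent):
        self._parent = parent

    def parent(self):
        return self._parent


class WeakAccessor:
    """Hypothetical accessor with a weak back-reference (proposed design)."""

    __slots__ = ("_parent_ref",)

    def __init__(self, parent):
        self._parent_ref = weakref.ref(parent)

    def parent(self):
        parent = self._parent_ref()
        if parent is None:
            raise ReferenceError("the parent matrix has already been freed")
        return parent


class ToyMatrix:
    """Hypothetical stand-in; `accessor_cls` selects the back-reference style used by `ss`."""

    def __init__(self, accessor_cls):
        self.ss = accessor_cls(self)


def accessor_outlives_matrix(accessor_cls):
    """Keep only the accessor of a temporary matrix, then try to reach the matrix through it."""
    acc = ToyMatrix(accessor_cls).ss  # no name is bound to the ToyMatrix itself
    try:
        acc.parent()
        return "parent still reachable"
    except ReferenceError:
        return "ReferenceError"


print(accessor_outlives_matrix(StrongAccessor))  # parent still reachable
print(accessor_outlives_matrix(WeakAccessor))    # ReferenceError
```

With the strong reference, holding the accessor silently keeps the matrix alive. With the weak reference, the same code fails loudly. A `weakref`-based design would therefore be safe only if no caller relies on `.ss` outliving its matrix.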
We were curious about the use of circular references in `python-graphblas`, so we investigated further.
From our reading of the code, some methods of the `Matrix` class depend on the `.ss` attribute. The `ss` object, in turn, needs to access information from the `Matrix` itself (e.g., `nrows` and `ncols`).
However, might there be alternative ways to support this interaction without a circular reference?
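One alternative is to create the accessor on demand through a `property` instead of storing it on the matrix. The sketch below assumes the accessor holds no state of its own. The accessor still references the matrix, but the matrix never references the accessor, so no cycle forms. `OnDemandAccessor` and `OnDemandMatrix` are hypothetical names, not python-graphblas classes.

```python
import gc
import weakref


class OnDemandAccessor:
    """Hypothetical accessor created per access; it references the parent, not vice versa."""

    __slots__ = ("_parent",)

    def __init__(self, parent):
        self._parent = parent

    @property
    def nrows(self):
        return self._parent.nrows

    @property
    def ncols(self):
        return self._parent.ncols


class OnDemandMatrix:
    """Hypothetical stand-in: `ss` is a property, so the matrix never stores the accessor."""

    def __init__(self, nrows, ncols):
        self.nrows = nrows
        self.ncols = ncols

    @property
    def ss(self):
        # A new, short-lived accessor on every access. Nothing on the matrix points to it,
        # so no reference cycle forms.
        return OnDemandAccessor(self)


def freed_without_gc():
    """Return the dimensions read through `ss` and whether `del` alone freed the matrix."""
    freed = []
    gc.disable()  # make sure only reference counting is at work
    try:
        m = OnDemandMatrix(5, 7)
        dims = (m.ss.nrows, m.ss.ncols)
        weakref.finalize(m, freed.append, True)
        del m
        return dims, bool(freed)
    finally:
        gc.enable()


print(freed_without_gc())  # ((5, 7), True)
```

This design has two trade-offs. Every `m.ss` access allocates a new small object, so `m.ss is m.ss` is `False`. Any state kept on the accessor, such as a `config` object, would also have to live on the matrix instead. Caching the accessor with `functools.cached_property` would not help, because it stores the accessor on the instance and recreates the cycle.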
That said, we may be missing some context. Is there a specific reason or design requirement for using this circular reference structure?
Would it be possible to refactor this to improve the memory behavior of `python-graphblas`?
Thanks