
Make matrix mod 2 conversion to numpy faster & some semantic fixes #39152


Status: Open. Wants to merge 12 commits into the develop base branch.


user202729 (Contributor) commented Dec 18, 2024

As in the title, plus a few minor changes where needed.

Reuses the numpy_util module from #38834 for the utility function, although in retrospect that module may have been placed in the wrong location (?).

📝 Checklist

  • The title is concise and informative.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation and checked the documentation preview.

user202729 force-pushed the faster-to-numpy branch 2 times, most recently from 97b8c50 to bfb90b9 on December 18, 2024 06:43
```diff
@@ -303,7 +304,7 @@ def process_block(block, src_in_lines, file_optional_tags, venv_explainer=''):
 got = re.sub(r'(doctest:warning).*^( *DeprecationWarning:)',
              r'\1...\n\2',
              got, 1, re.DOTALL | re.MULTILINE)
-got = got.splitlines() # got can't be the empty string
+got = textwrap.dedent(got).splitlines() # got can't be the empty string
```
user202729 (Contributor Author) commented

Previously, the fix applied by the script would look like this:

```
sage: b = numpy.array(a); b
array([[ 0,  1,  2,  3],
[ 4,  5,  6,  7],
[ 8,  9, 10, 11]])
```

because each line was stripped of leading whitespace (lstrip) individually. With the change, it becomes:

```
sage: b = numpy.array(a); b
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
```
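To make the difference concrete, here is a minimal, stdlib-only sketch contrasting per-line lstrip (the old behaviour as described above) with textwrap.dedent, which removes only the whitespace prefix shared by every line. The sample string is made up for illustration, and the real script does more than this.

```python
import textwrap

# Hypothetical captured output: numpy's repr, indented by 4 spaces as it
# might appear inside a doctest block.
numpy_repr = (
    "array([[ 0,  1,  2,  3],\n"
    "       [ 4,  5,  6,  7],\n"
    "       [ 8,  9, 10, 11]])"
)
got = textwrap.indent(numpy_repr, "    ")

# Old behaviour (approximation): strip every line independently,
# which destroys numpy's column alignment.
per_line = [line.lstrip() for line in got.splitlines()]

# New behaviour: remove only the common leading whitespace,
# so the relative indentation of the continuation lines survives.
dedented = textwrap.dedent(got).splitlines()

print("\n".join(per_line))
print("\n".join(dedented))
```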

github-actions bot commented Dec 18, 2024

Documentation preview for this PR (built with commit b4d4d08; changes) is ready! 🎉
This preview will update shortly after each push to this PR.

```python
    """
    if copy is not _MISSING:
        from sage.misc.superseded import deprecation
        deprecation(39152, "passing copy argument to numpy() is deprecated")
```
user202729 (Contributor Author) commented Dec 18, 2024

I decided to deprecate this feature because:

  • copy=False was never supported in the first place (it always copies).
  • Silently copying when copy=False is passed is incompatible with the NumPy 2.0 interface, where np.array(..., copy=False) raises ValueError if a copy would be needed.
  • The copy argument doesn't work for numpy-based matrices (even after #38683, "Fix matrix coercion with numpy 2.1").
  • Exposing the internal array seems dangerous: it would change whenever the original object is mutated, and it would force the implementation to keep exactly the same dtype, since otherwise the change would be visible to users.
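The `_MISSING` default in the snippet above acts as a sentinel: it lets numpy() tell "copy was not passed" apart from any explicit value, including None or False, so that every explicit use triggers the deprecation. A minimal sketch of that pattern follows; ToyMatrix is a hypothetical class, and the standard-library warnings.warn stands in for Sage's deprecation() helper.

```python
import warnings

import numpy as np

# Unique object that no caller can pass by accident, so "argument omitted"
# is distinguishable from copy=None, copy=False, copy=True, etc.
_MISSING = object()


class ToyMatrix:
    """Hypothetical stand-in for a matrix class with a numpy() method."""

    def __init__(self, rows):
        self._rows = [list(r) for r in rows]

    def numpy(self, dtype=None, copy=_MISSING):
        if copy is not _MISSING:
            # Stand-in for sage.misc.superseded.deprecation(...).
            warnings.warn("passing copy argument to numpy() is deprecated",
                          DeprecationWarning, stacklevel=2)
        # Whatever was passed for copy, always return a fresh array.
        return np.array(self._rows, dtype=dtype)
```

Calling `m.numpy()` stays silent, while `m.numpy(copy=False)` still works but emits a DeprecationWarning.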

```cython
    """
    from ..modules.numpy_util import mzd_matrix_to_numpy
    return mzd_matrix_to_numpy(<uintptr_t>self._entries, dtype)
```
user202729 (Contributor Author) commented

This method was the original plan: it overrides the numpy() method of the parent class to provide a fast path.
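The internals of mzd_matrix_to_numpy are not shown here. As a pure-NumPy illustration of the general idea only, assume an M4RI-style layout in which each matrix row is a sequence of 64-bit words and column j is stored in bit j % 64 (least significant bit first) of word j // 64. Under that assumption, the unpacking can be sketched as follows; packed_rows_to_numpy is a hypothetical helper, not Sage's function.

```python
import numpy as np


def packed_rows_to_numpy(words, ncols, dtype=np.uint8):
    """Unpack a (nrows, nwords) array of 64-bit words into a 0/1 array.

    Assumed layout (hypothetical, M4RI-like): column j is bit j % 64 of
    word j // 64, least significant bit first.
    """
    # Force little-endian byte order so byte k of a word holds bits 8k..8k+7.
    words = np.ascontiguousarray(words, dtype="<u8")
    nrows = words.shape[0]
    # Reinterpret each row of words as a row of bytes (no copy).
    as_bytes = words.view(np.uint8).reshape(nrows, -1)
    if ncols > as_bytes.shape[1] * 8:
        raise ValueError("ncols exceeds the number of stored bits")
    # Expand every byte into 8 bits, least significant bit first.
    bits = np.unpackbits(as_bytes, axis=1, bitorder="little")
    # Drop the padding bits in the last word of each row.
    return bits[:, :ncols].astype(dtype, copy=False)
```

The design choice is to avoid any per-entry Python loop: one reinterpreting view plus one vectorized unpackbits call does all the work.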

```diff
-    return self._matrix_numpy.copy()
-else:
-    return Matrix_dense.numpy(self, dtype=dtype)
+return np.array(self._matrix_numpy, dtype=dtype)
```
user202729 (Contributor Author) commented Dec 18, 2024

Simplification.

Also, the old code called the generic Matrix_dense.numpy(self, dtype=dtype) method, which is much slower than this.

In addition, the old code's documentation is incorrect: the __array__ method of the Matrix_numpy_dense class is not overridden, so this method is not called inside np.array(...). It still works, but only because of the generic (slow) implementation.
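For reference, np.array(x, dtype=dtype) copies by default, so a single call covers both branches of the old code: with dtype=None it returns a copy with the stored dtype, otherwise it converts during the same copy. The sketch below shows that, together with an __array__ method that would route np.array(obj) to the fast path. ToyNumpyDense is hypothetical and only loosely modelled on the _matrix_numpy attribute; the copy handling in __array__ follows the NumPy 2 convention of raising ValueError when copy=False cannot be honoured.

```python
import numpy as np


class ToyNumpyDense:
    """Hypothetical wrapper that stores its entries in a numpy array."""

    def __init__(self, data):
        self._matrix_numpy = np.array(data)

    def numpy(self, dtype=None):
        # np.array copies by default, so callers never get a view of the
        # internal buffer; dtype=None keeps the stored dtype.
        return np.array(self._matrix_numpy, dtype=dtype)

    def __array__(self, dtype=None, copy=None):
        # Makes np.array(obj) / np.asarray(obj) use the fast path above.
        # Under NumPy 2, copy=False means "never copy"; this method always
        # copies, so it refuses that request instead of copying silently.
        if copy is False:
            raise ValueError("ToyNumpyDense cannot be converted without a copy")
        return self.numpy(dtype)
```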

A Member commented

What's wrong with flintlib/flint#2027 that it "prevents testing"?

user202729 (Contributor Author) replied

Oh, nothing. I just mean that testing requires a version of FLINT that includes that pull request, and I haven't yet figured out how to install it from source (the latest version on conda-forge doesn't include that pull request).

A Member replied

I think Sage already has a workaround (some macro magic) for this FLINT issue, since FLINT still hasn't released a version with this fix merged.

user202729 changed the title from "Make matrix mod 2 conversion to numpy faster" to "Make matrix mod 2 conversion to numpy faster & some semantic fixes" on Dec 18, 2024
user202729 marked this pull request as draft on December 28, 2024 10:05
user202729 marked this pull request as ready for review on December 28, 2024 14:50

user202729 (Contributor Author) commented

I tried installing numpy 2.1.0 with pip (in a conda environment) and monkey-patching the FLINT headers to change the I.

The new tests pass, but running the tests with --all produces a bunch of errors like these:

    ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
    ImportError: numpy.core.multiarray failed to import
    AttributeError: _ARRAY_API not found

How do I install numpy 2.1.0 properly? (@antonio-rojas?)

It would be easier if we could just make this one of the CI checks, although that adds significant overhead, especially for pull requests that don't touch the numpy code.

dimpase (Member) commented Dec 29, 2024

To allow an unfixed FLINT, one can do a bit of C macro hackery (there is a branch by mkoeppe doing this floating around somewhere; I can't find it while AFK). Is that what you mean by monkey-patching?

user202729 (Contributor Author) commented Jan 1, 2025

@dimpase Do you know why fpylll doesn't work with numpy==2.0.0 (or what the expected time frame for supporting it is, or whether there's any way around it)?

dimpase (Member) commented Jan 1, 2025

No idea, but surely fpylll works with numpy 2.

vbraun pushed a commit to vbraun/sage that referenced this pull request Jan 1, 2025
sagemathgh-39219: Faster conversion from numpy array to matrix mod 2
    
As in the title. Estimated 50× speedup.


### 📝 Checklist

- [x] The title is concise and informative.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [x] I have created tests covering the changes.
- [x] I have updated the documentation and checked the documentation preview.

(Reuses the `numpy_util` module from sagemath#38834 for the utility function)


Reverse direction: sagemath#39152
    
URL: sagemath#39219
Reported by: user202729
Reviewer(s): Travis Scrimshaw
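For the reverse direction (numpy array to matrix mod 2), a pure-NumPy sketch of packing a 0/1 array into 64-bit words could look like this. It assumes an M4RI-style layout (column j stored in bit j % 64 of word j // 64, least significant bit first); numpy_to_packed_rows is a hypothetical helper, and the actual Sage change works in Cython against the real M4RI structures.

```python
import numpy as np


def numpy_to_packed_rows(a):
    """Pack a 2D array (entries reduced mod 2) into (nrows, nwords) uint64 words.

    Assumed layout (hypothetical, M4RI-like): column j -> bit j % 64 of
    word j // 64, least significant bit first.
    """
    a = np.asarray(a)
    nrows, ncols = a.shape
    bits = (a % 2).astype(np.uint8)  # reduce every entry mod 2
    nwords = (ncols + 63) // 64
    # Zero-pad each row to a whole number of 64-bit words.
    padded = np.zeros((nrows, nwords * 64), dtype=np.uint8)
    padded[:, :ncols] = bits
    # 8 bits per byte, least significant bit first -> (nrows, nwords * 8) bytes.
    as_bytes = np.packbits(padded, axis=1, bitorder="little")
    # Reinterpret each group of 8 little-endian bytes as one 64-bit word.
    return as_bytes.view("<u8")
```

As in the opposite direction, the speed comes from replacing a per-entry loop with a few vectorized calls.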
vbraun pushed a commit to vbraun/sage that referenced this pull request Jan 3, 2025
antonio-rojas (Contributor) commented

> The new tests pass, but when do a --all test there are a bunch of
>
>     ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
>     ImportError: numpy.core.multiarray failed to import
>     AttributeError: _ARRAY_API not found
>
> How do I install numpy 2.1.0 properly? ( @antonio-rojas ?)

I know nothing about conda, but upgrading numpy requires a rebuild of everything that uses its C API.

user202729 (Contributor Author) commented Jan 4, 2025

Nice, conda-forge rebuilt fpylll with numpy 2.0, so it should be fine to just use the conda-provided fpylll (https://github.com/conda-forge/fpylll-feedstock/pull/32).

user202729 (Contributor Author) commented

Alright, I tested this on numpy 2 and everything passes, apart from unrelated failures that are fixed by #39242.

dimpase (Member) commented May 18, 2025

Are the matrices mod 2 here the ones provided by m4ri, or some other mod 2 matrices?

user202729 (Contributor Author) commented

They are m4ri matrices; note that mzd_matrix_to_numpy is used.

dimpase (Member) commented May 21, 2025

By the way, I have been thinking about splitting the m4ri interface into a separate Python package, but then the question is how to set up efficient bit arrays. It seems to me that numpy uint8 arrays are good for this, so the Python side of the interface would use them, and Sage's matrices could then be bolted on top.
Do you think this can be done without loss of efficiency?
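As a rough illustration of using numpy uint8 arrays as bit storage (not a design for the proposed package): np.packbits stores eight GF(2) entries per byte, and adding one row to another mod 2 becomes a bytewise XOR on the packed rows. The names below are hypothetical, and M4RI itself operates on 64-bit words rather than bytes.

```python
import numpy as np

# Dense 0/1 matrix over GF(2): one byte per entry.
dense = np.array([[1, 0, 1, 1, 0, 0, 0, 0, 1],
                  [0, 1, 1, 0, 1, 0, 0, 0, 1]], dtype=np.uint8)

# Packed form: 8 entries per byte, each row zero-padded to a whole byte.
packed = np.packbits(dense, axis=1)


def add_row(packed, src, dst):
    """Hypothetical row operation dst += src over GF(2), done bytewise with XOR."""
    packed[dst] ^= packed[src]


add_row(packed, 0, 1)
# Unpack, keeping only the real columns (the padding bits are dropped).
result = np.unpackbits(packed, axis=1, count=dense.shape[1])
```

Here the packed matrix needs 4 bytes instead of 18, and the row operation touches 2 bytes per row instead of 9.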

user202729 (Contributor Author) commented May 21, 2025

I haven't looked much into whether that is possible. But numpy itself is a separate package anyway, right?

One problem I can foresee is that existing programs that use from sage.something cimport something (probably %%cython functions in user code, written for performance) will break. They could change it to from sage_m4ri cimport something etc., but that sounds rather clunky.

dimpase (Member) commented May 22, 2025

> I haven't looked much at whether that is possible. But then numpy itself is a separate package anyway, right?

Sure, but it's much smaller and trivial to install, so a Python package depending on numpy is unproblematic.

> A problem I can foresee is that existing programs that make use of from sage.something cimport something (probably %%cython functions in user code, for performance) will break. It might be able for them to change it to from sage_m4ri cimport something etc., but that sounds clunky enough.

This won't change. What would change in Sage is the code that reads and writes m4ri data: this data would become numpy arrays, and calls to m4ri functions would be wrapped differently.

3 participants