ENH: support rng for summary_plot #3945

Open
wants to merge 4 commits into
base: master

tylerjereddy

  • Some folks on my team are having issues writing tests downstream of shap because summary_plot mutates the NumPy legacy random machinery global state, which is hard to work around in a large testsuite. It would be great if a heavily-used project in the ML space like shap could start to adopt support for local random Generator objects per the community document at https://scientific-python.org/specs/spec-0007/. This PR is a lot cruder than that approach, but appears to allow preservation of the current global state behavior while allowing the option of using the modern non-global approach for downstream developers that need it.

  • This patch adds a regression test that fails when not using the optional rng argument, and passes when it is used to scope to a local random state. If there is a preference for using the full SPEC 7 approach (and that may very well be the best idea), I may ask one of my team members to help out a bit, since they will benefit from it.

  • The full testsuite passed locally via pytest --import-mode=append
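
The kind of regression test described above can be sketched as follows. This is only an illustration of the pattern, not the test from the patch: `shuffle_with_global_state`, `shuffle_with_local_rng`, and `global_state_changed` are hypothetical stand-ins, and no shap function is called.

```python
import numpy as np


def shuffle_with_global_state(n):
    """Hypothetical stand-in for a function that draws from the global RNG."""
    inds = np.arange(n)
    np.random.shuffle(inds)  # mutates NumPy's legacy global state
    return inds


def shuffle_with_local_rng(n, rng):
    """Hypothetical stand-in that draws only from the Generator passed in."""
    inds = np.arange(n)
    rng.shuffle(inds)  # touches only the local Generator
    return inds


def global_state_changed(func, *args):
    """Return True if calling func(*args) altered the legacy global RNG state."""
    before = np.random.get_state()
    func(*args)
    after = np.random.get_state()
    # Legacy state tuple: (name, key array, pos, has_gauss, cached_gaussian)
    unchanged = np.array_equal(before[1], after[1]) and before[2:] == after[2:]
    return not unchanged


# The global-state version leaks into the surrounding test session ...
leaks = global_state_changed(shuffle_with_global_state, 10)
# ... while the version scoped to a local Generator does not.
insulated = not global_state_changed(
    shuffle_with_local_rng, 10, np.random.default_rng(0)
)
```

Comparing the state before and after the call is what lets a downstream test suite detect the leak without depending on any particular random values.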

Maybe I'll cc @mdhaber @tupui -- not expecting a review, but because they also work in stats/ML space and were involved in the SPEC.

Checklist

  • All pre-commit checks pass.
  • Unit tests added (if fixing a bug or adding a new feature)

Collaborator

@CloseChoice CloseChoice left a comment

Thanks a lot for the PR, this looks sensible to me. I just need to think about whether we actually want to introduce a keyword that might become redundant once we implement a Generator version for the rng. On the other hand, this is just an optional parameter, but please give me some time to read into this and get back to you.

Also about the tests: these fail in all pipelines and are not caused by your changes.

@mdhaber

mdhaber commented Jan 14, 2025

Thanks for the ping. I suppose if I'm here, I'd suggest using one of the legacy names for the keyword unless it is going to follow the SPEC 7 plan. If this keyword is intended to be permanent, a different name might be helpful to signal different behavior from the ecosystem standard; if not (if the library wants to adopt the standard some day, but not yet), it's easier to replace the keyword than to change an existing keyword's behavior in a backward-compatible way.


codecov bot commented Jan 15, 2025

Codecov Report

Attention: Patch coverage is 66.66667% with 5 lines in your changes missing coverage. Please review.

Project coverage is 64.68%. Comparing base (a2bad17) to head (82ed3f3).

Files with missing lines | Patch % | Lines
shap/plots/_beeswarm.py | 66.66% | 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3945      +/-   ##
==========================================
+ Coverage   64.67%   64.68%   +0.01%     
==========================================
  Files          92       92              
  Lines       12862    12873      +11     
==========================================
+ Hits         8318     8327       +9     
- Misses       4544     4546       +2     


tylerjereddy and others added 3 commits January 27, 2025 12:52
* Ignore some `mypy` false positives in
  `test_summary_plot_seed_insulated`

* Switch from `rng` to `seed` argument name for `summary_legacy()`
based on reviewer feedback, to leave an easier route open to SPEC 7.
@tylerjereddy tylerjereddy force-pushed the treddy_summary_plot_rng_non_global branch from d46a1e8 to 5671ab2 Compare January 27, 2025 20:31
@tylerjereddy
Author

@CloseChoice @connortann I've revised in the following ways, let me know if you'd like any other changes:

  • I ignored two testsuite mypy complaints related to NumPy get_state(). These may have arisen because its return type varies between the modern and legacy modes, and mypy picked the incorrect one for whatever reason
  • Based on Matt's feedback, I switched away from the rng keyword to make it easier for the shap team to properly adopt SPEC 7 in the future if that ever becomes desirable. I went with seed instead, though it is admittedly slightly annoying to have an argument called seed that accepts only a Generator and not an integer. We could try to add support for integers as well, although scope creep perhaps...
  • I rebased on latest master and pytest --import-mode=append still seems happy locally at least
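
For context on the get_state() point above, here is a small sketch of the two return shapes. It assumes the default global bit generator (MT19937) and is not taken from the patch.

```python
import numpy as np

# legacy=True (the default) returns a tuple:
# ("MT19937", key_array, pos, has_gauss, cached_gaussian)
legacy_state = np.random.get_state()

# legacy=False returns a dict describing the same state.
dict_state = np.random.get_state(legacy=False)

# Both forms expose the same Mersenne Twister key and position,
# just through different access patterns.
same_key = np.array_equal(legacy_state[1], dict_state["state"]["key"])
same_pos = legacy_state[2] == dict_state["state"]["pos"]
```

Because the static return type covers both shapes, a type checker cannot always tell which one a given call site receives, which is one plausible source of the complaints mentioned.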

@connortann
Collaborator

connortann commented Jan 28, 2025

Thank you @tylerjereddy for the PR and for bringing SPEC 7 to our attention. It is fantastic to have you as a contributor.

I think it would make sense for shap to move towards adopting SPEC 7, and I'm happy to work on making that happen (related: #3980). The good news is that we don't have an existing seed argument to handle (in this case), so the transition should be a bit simpler than in the reference implementation in the SPEC.

I think we should introduce the new rng parameter immediately, rather than seed, which to me suggests an integer. I like your suggestion of maintaining backwards-compatibility for now. How is this as a transition plan:

Initially:

  • Add the new rng parameter, which if provided will be normalised as per SPEC 7.
  • If rng is not provided and the global seed has been set, emit a FutureWarning that the behaviour will change in a future version

Then in future:

  • Remove use of the global state, and just use rng = np.random.default_rng(rng).

The initial change could be implemented as:

import warnings

import numpy as np


def my_func(*, rng=None):

    if rng is not None:
        # If rng argument is provided, normalise as per SPEC 7
        rng = np.random.default_rng(rng)
    else:
        # Otherwise, maintain backwards compatibility for now, raising a warning if the global seed was set.
        # Note: this check relies on private NumPy attributes, which may change between versions.
        global_seed_set = np.random.mtrand._rand._bit_generator._seed_seq is None
        if global_seed_set:
            msg = (
                "The NumPy global RNG was seeded by calling `np.random.seed`. "
                "In a future version this function will no longer use the global RNG. "
                "Pass `rng` explicitly to opt-in to the new behaviour and silence this warning."
            )
            warnings.warn(msg, FutureWarning, stacklevel=2)

    # Example use
    inds = np.arange(10)
    if rng is None:
        # For now, maintain backwards compatibility.
        np.random.shuffle(inds)
    else:
        rng.shuffle(inds)

Then at some point in future we'd just have:

def my_func(*, rng=None):
    rng = np.random.default_rng(rng)
    
    inds = np.arange(10)
    rng.shuffle(inds)
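
As a rough illustration of why normalising with `np.random.default_rng` covers the integer, Generator, and None cases discussed in this thread, consider the sketch below. `normalise_rng` is a hypothetical helper name used only for this example.

```python
import numpy as np


def normalise_rng(rng=None):
    """Hypothetical helper: normalise None, an int seed, or a Generator."""
    return np.random.default_rng(rng)


# An existing Generator is returned unaltered, so callers keep control of the stream.
gen = np.random.default_rng(42)
same_object = normalise_rng(gen) is gen

# An integer seed gives the same stream on every call.
first = normalise_rng(123).permutation(10)
second = normalise_rng(123).permutation(10)
reproducible = np.array_equal(first, second)

# None draws fresh entropy and leaves the legacy global state alone.
before = np.random.get_state()
normalise_rng(None).permutation(10)
after = np.random.get_state()
global_untouched = np.array_equal(before[1], after[1]) and before[2] == after[2]
```

With this normalisation in place, an argument named rng can accept an integer without any extra branching in the function body.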

PS - @tylerjereddy you might be interested in the contributor discussion #3559
