ENH: support rng for summary_plot #3945

tylerjereddy · 2025-01-03T20:01:03Z

Some folks on my team are having issues writing tests downstream of `shap` because `summary_plot` mutates the NumPy legacy random machinery global state, which is hard to work around in a large testsuite. It would be great if a heavily-used project in the ML space like `shap` could start to adopt support for local random `Generator` objects per the community document at https://scientific-python.org/specs/spec-0007/. This PR is a lot cruder than that approach, but appears to allow preservation of the current global state behavior while allowing the option of using the modern non-global approach for downstream developers that need it.
This patch adds a regression test that fails when not using the optional `rng` argument, and passes when it is used to scope to a local random state. If there is a preference for using the full SPEC 7 approach (and that may very well be the best idea), I may ask one of my team members to help out a bit, since they will benefit from it.
The full testsuite passed locally via `pytest --import-mode=append`.
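A minimal sketch of the kind of regression test described above, assuming only NumPy. `jitter_points` and `global_state_unchanged` are hypothetical stand-ins, not `summary_legacy()` or the PR's actual `test_summary_plot_seed_insulated`. The idea is to snapshot the legacy global state with `np.random.get_state()` before and after a call and check whether it moved:

```python
import numpy as np


def jitter_points(values, rng=None):
    """Hypothetical stand-in for a plotting helper that adds random jitter.

    With ``rng=None`` it draws from NumPy's legacy global state (the current
    behaviour); with a ``Generator`` it draws only from that local object.
    """
    values = np.asarray(values, dtype=float)
    if rng is None:
        noise = np.random.uniform(-0.1, 0.1, size=values.shape)
    else:
        noise = rng.uniform(-0.1, 0.1, size=values.shape)
    return values + noise


def global_state_unchanged(func, *args, **kwargs):
    """Call ``func`` and report whether NumPy's global RNG state was left intact."""
    before = np.random.get_state()
    func(*args, **kwargs)
    after = np.random.get_state()
    # Legacy state tuple: (name, key array, pos, has_gauss, cached_gaussian)
    return (
        before[0] == after[0]
        and np.array_equal(before[1], after[1])
        and before[2:] == after[2:]
    )


# Drawing from the global state advances it ...
assert not global_state_unchanged(jitter_points, [1.0, 2.0, 3.0])
# ... while a local Generator leaves it untouched.
assert global_state_unchanged(jitter_points, [1.0, 2.0, 3.0], rng=np.random.default_rng(0))
```

Keeping the draws on a local `Generator` is what makes a large downstream testsuite insensitive to the order in which tests call into the plotting code.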

Maybe I'll cc @mdhaber @tupui -- not expecting a review, but because they also work in stats/ML space and were involved in the SPEC.

Checklist

All pre-commit checks pass.
Unit tests added (if fixing a bug or adding a new feature)

CloseChoice

Thanks a lot for the PR, this looks sensible to me. I just need to think about whether we actually want to introduce a keyword that might become redundant once we implement a `Generator`-based version of `rng`. On the other hand, this is just an optional parameter. Please give me some time to read into this and come back to you.

Also, about the tests: these failures occur in all pipelines and are not caused by your changes.

mdhaber · 2025-01-14T22:20:47Z

Thanks for the ping. I suppose, since I'm here, I'd suggest using one of the legacy names for the keyword unless it is going to follow the SPEC 7 plan. If this keyword is intended to be permanent, a different name might be helpful to signal different behavior from the ecosystem standard; if not (if the library wants to adopt the standard some day, but not yet), it's easier to replace the keyword than to change an existing keyword's behavior in a backward-compatible way.

codecov · 2025-01-15T19:18:33Z

Codecov Report

Attention: Patch coverage is 66.66667% with 5 lines in your changes missing coverage. Please review.

Project coverage is 64.68%. Comparing base (a2bad17) to head (82ed3f3).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| shap/plots/_beeswarm.py | 66.66% | 5 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #3945      +/-   ##
==========================================
+ Coverage   64.67%   64.68%   +0.01%     
==========================================
  Files          92       92              
  Lines       12862    12873      +11     
==========================================
+ Hits         8318     8327       +9     
- Misses       4544     4546       +2



tylerjereddy · 2025-01-27T20:38:41Z

@CloseChoice @connortann I've revised in the following ways, let me know if you'd like any other changes:

I ignored two mypy complaints in the testsuite related to NumPy `get_state()`. Its return type differs depending on whether the legacy or modern mode is requested, which may have caused the confusion: mypy picked the incorrect one for whatever reason.
Based on Matt's feedback, I switched away from the `rng` keyword to make it easier for the shap team to properly adopt SPEC 7 in the future, if that ever becomes desirable. I went with `seed` instead, though it is admittedly slightly annoying to have an argument called `seed` that accepts only a `Generator` and not an integer. We could try to add support for integers as well, although that is perhaps scope creep...
I rebased on the latest master, and `pytest --import-mode=append` still seems happy, locally at least.
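On the integer question: if the keyword were normalised with `np.random.default_rng`, integers would come for free, since that function accepts `None`, an integer, a `SeedSequence`, a `BitGenerator`, or an existing `Generator` (which it returns unaltered). A small sketch with a hypothetical `normalise_seed` helper, not code from this PR:

```python
import numpy as np


def normalise_seed(seed=None):
    """Hypothetical helper: accept None, an int, a SeedSequence or a Generator.

    ``np.random.default_rng`` already handles each of these cases, so
    supporting integers needs no extra branching.
    """
    return np.random.default_rng(seed)


existing = np.random.default_rng(2025)
assert normalise_seed(existing) is existing  # a Generator is passed straight through

for value in (None, 0, 2**40, np.random.SeedSequence(1)):
    assert isinstance(normalise_seed(value), np.random.Generator)
```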

connortann · 2025-01-28T12:27:15Z

Thank you @tylerjereddy for the PR and for bringing SPEC 7 to our attention. It is fantastic to have you as a contributor.

I think it would make sense for shap to move towards adopting SPEC 7, and I'm happy to work on making that happen (related: #3980). The good news is that we don't have an existing seed argument to handle (in this case), so the transition should be a bit simpler than in the reference implementation in the SPEC.

I think we should introduce the new `rng` parameter immediately, rather than `seed`, which to me suggests an integer. I like your suggestion of maintaining backwards compatibility for now. How is this as a transition plan:

Initially:

Add the new `rng` parameter, which, if provided, will be normalised as per SPEC 7.
If the global seed is set, raise a `FutureWarning` that the behaviour will change in future.

Then in future:

Remove use of the global state, and just use `rng = np.random.default_rng(rng)`.

The initial change could be implemented as:

```python
import warnings

import numpy as np


def my_func(*, rng=None):

    if rng is not None:
        # If rng argument is provided, normalise as per SPEC 7
        rng = np.random.default_rng(rng)
    else:
        # Otherwise, maintain backwards compatibility for now, raising a warning if the global seed was set.
        # Note: this inspects private NumPy internals (`np.random.seed` leaves `_seed_seq` as None).
        global_seed_set = np.random.mtrand._rand._bit_generator._seed_seq is None
        if global_seed_set:
            msg = (
                "The NumPy global RNG was seeded by calling `np.random.seed`. "
                "In a future version this function will no longer use the global RNG. "
                "Pass `rng` explicitly to opt-in to the new behaviour and silence this warning."
            )
            warnings.warn(msg, FutureWarning, stacklevel=2)

    # Example use
    inds = np.arange(10)
    if rng is None:
        # For now, maintain backwards compatibility.
        np.random.shuffle(inds)
    else:
        rng.shuffle(inds)
    return inds
```
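To see the transition behaviour end to end, here is a self-contained sketch. `shuffled_indices` and `emits_future_warning` are hypothetical names, and the seeding check relies on the same private NumPy attribute as the snippet above, so it may break in a future NumPy release. After `np.random.seed(0)`, the legacy path warns, while passing `rng` opts in quietly:

```python
import warnings

import numpy as np


def shuffled_indices(n, *, rng=None):
    """Hypothetical function following the transition plan above (not shap code)."""
    if rng is not None:
        # Opt-in path: normalise as per SPEC 7; the global RNG is never touched.
        rng = np.random.default_rng(rng)
    elif np.random.mtrand._rand._bit_generator._seed_seq is None:
        # Private NumPy detail: after np.random.seed() the global bit generator
        # has no SeedSequence attached. This may change between NumPy releases.
        warnings.warn(
            "The NumPy global RNG was seeded by calling `np.random.seed`. "
            "In a future version this function will no longer use the global RNG. "
            "Pass `rng` explicitly to opt-in to the new behaviour and silence this warning.",
            FutureWarning,
            stacklevel=2,
        )
    inds = np.arange(n)
    if rng is None:
        np.random.shuffle(inds)  # legacy path: advances the global state
    else:
        rng.shuffle(inds)
    return inds


def emits_future_warning(**kwargs):
    """Return True if shuffled_indices(10, **kwargs) emits a FutureWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        shuffled_indices(10, **kwargs)
    return any(issubclass(w.category, FutureWarning) for w in caught)


np.random.seed(0)  # e.g. a downstream testsuite seeding the global RNG
assert emits_future_warning()  # legacy path warns about the upcoming change
assert not emits_future_warning(rng=0)  # opting in silences the warning
```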

Then at some point in future we'd just have:

```python
import numpy as np


def my_func(*, rng=None):
    rng = np.random.default_rng(rng)

    inds = np.arange(10)
    rng.shuffle(inds)
    return inds
```
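A quick sketch of what callers would get from the final form, assuming a hypothetical `shuffled_indices` in place of `my_func`: an integer `rng` reproduces the same result, and a `Generator` passed in is consumed directly, so a whole sequence of calls is reproducible from one seed.

```python
import numpy as np


def shuffled_indices(n, *, rng=None):
    """Hypothetical function in the final SPEC 7 form (not shap code)."""
    rng = np.random.default_rng(rng)  # e.g. None, an int, a SeedSequence or a Generator
    inds = np.arange(n)
    rng.shuffle(inds)
    return inds


# The same integer seed reproduces the same permutation.
assert np.array_equal(shuffled_indices(10, rng=123), shuffled_indices(10, rng=123))

# A Generator passed in is used directly, so repeated calls continue one stream;
# two Generators built from the same seed replay that stream identically.
gen_a, gen_b = np.random.default_rng(7), np.random.default_rng(7)
run_a = [shuffled_indices(10, rng=gen_a) for _ in range(3)]
run_b = [shuffled_indices(10, rng=gen_b) for _ in range(3)]
assert all(np.array_equal(a, b) for a, b in zip(run_a, run_b))
```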

PS - @tylerjereddy you might be interested in the contributor discussion #3559

CloseChoice reviewed Jan 3, 2025


tylerjereddy and others added 3 commits January 27, 2025 12:52

[pre-commit.ci] auto fixes from pre-commit.com hooks

e7edffe

for more information, see https://pre-commit.ci

MAINT: PR 3945 revisions

5671ab2

* Ignore some `mypy` false positives in `test_summary_plot_seed_insulated` * Switch from `rng` to `seed` argument name for `summary_legacy()` based on reviewer feedback, to leave an easier route open to SPEC 7.

tylerjereddy force-pushed the treddy_summary_plot_rng_non_global branch from d46a1e8 to 5671ab2 January 27, 2025 20:31

[pre-commit.ci] auto fixes from pre-commit.com hooks

82ed3f3

for more information, see https://pre-commit.ci


